OpenMPI 5.0 Ready To Say "Goodbye" To 32-Bit Support

  • OpenMPI 5.0 Ready To Say "Goodbye" To 32-Bit Support

    Phoronix: OpenMPI 5.0 Ready To Say "Goodbye" To 32-Bit Support

    The OpenMPI message passing interface library is ready to completely abandon 32-bit software support with its forthcoming v5.0 release...


  • #2
    HPC was one of the earliest communities to jump on the x86-64 bandwagon as soon as hardware was available (helped, of course, by the fact that the new 64-bit Opterons were very competitive at the time against Intel's 32-bit offerings), so this is hardly surprising.

    I guess one (marginal) use case where 32-bit might still be useful is building toy clusters out of Raspberry Pis for educational purposes. But even there, the hardware has been 64-bit for quite a while and 64-bit distros exist, though IIRC the out-of-the-box Raspbian is still 32-bit.


    • #3
      Makes sense... Anyone with 32-bit hardware can use the previous version. It still works. That is how open-source upstreams should think: old versions do not disappear from existence, and old hardware can keep using them.


      • #4
        Originally posted by jabl View Post
        I guess one (marginal) usecase where 32-bit might still be useful is making toy clusters with Raspberry Pi's for educational purposes. But even there, the HW is 64-bit since quite a while and 64-bit distros exists, though IIRC the out of the box Raspbian is still 32-bit.
        Toy clusters? 256+ nodes of ARM11 RPi Model A and Model B (256 MB each) can provide tons of computing power and up to 64 GB of RAM. Probably at least 50 GB of that RAM would be available for computational tasks. That's massive memory bandwidth compared to the latest DDR5 https://www.techpowerup.com/304332/r...ry-performance


        • #5
          Originally posted by caligula View Post
          Toy clusters? 256+ nodes of ARM11 Rpi model A and 256MB model B can provide tons of computing power and up to 64 GB of RAM. Probably at least 50 GB of that RAM will be available for computational tasks. That's a massive memory bandwidth compared to latest DDR5 https://www.techpowerup.com/304332/r...ry-performance
          I'm having a hard time imagining 256 RPi 1 nodes being much better, if at all, than a single desktop CPU now. Also, that 64 GB of RAM would sit in 256 individual pools with only a 10 Mb/s Ethernet connection for each node to communicate with all the others. So no, it wouldn't have a lot of memory bandwidth.
          Last edited by Myownfriend; 03 February 2023, 05:01 PM.
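
          The aggregate numbers being argued here are easy to sanity-check. A quick back-of-envelope sketch (node count and per-node RAM come from the posts above; the 100 Mbit link speed is the RPi 1 Model B's on-board Fast Ethernet, corrected in a later reply):

          ```python
          # Back-of-envelope: aggregate resources of a 256-node RPi 1 cluster.
          # Figures from the thread: 256 MB RAM per node, 100 Mbit/s Ethernet per node.

          NODES = 256
          RAM_PER_NODE_MB = 256          # RPi 1 Model B
          LINK_MBIT = 100                # on-board Fast Ethernet

          total_ram_gb = NODES * RAM_PER_NODE_MB / 1024   # 64 GB total, but split into 256 pools
          link_mb_per_s = LINK_MBIT / 8                   # 12.5 MB/s per node, best case

          print(f"total RAM: {total_ram_gb:.0f} GB in {NODES} separate pools")
          print(f"per-node interconnect: {link_mb_per_s:.1f} MB/s")
          ```

          The 64 GB headline figure checks out, but the interconnect figure is the catch: "remote" RAM is only reachable at Ethernet speed, which is the crux of the disagreement in the following posts.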


          • #6
            32-bit computing still has niche use cases.

            32-bit HPC does not.

            Nice step forward for this project.


            • #7
              Originally posted by Myownfriend View Post
              with only a 10Mb/s ethernet connection for each to communicate with all the other ones. So no, it wouldn't have a lot of memory bandwidth.
              The RPi has 100 Mbit LAN. You can also plug in RTL8153 USB dongles to upgrade to 400 Mbit Ethernet.


              • #8
                Originally posted by caligula View Post
                RPI has 100 Mbit LAN. Also you can plug in RTL 8153 USB dongles to upgrade to 400 Mbit ethernet.
                That still doesn't make up for having that little memory split among so many low-compute nodes. Each node would have access to 256 MB of memory at only about 300 MB/s for reads and 140 MB/s for writes, plus an additional 65,280 MB spread across the other 255 nodes at 50 MB/s. It can't even treat that additional memory as slow local memory, though, because it needs to request data from potentially multiple nodes, those nodes need to respond, and the requesting node needs free space in its own RAM to store the incoming data. Because of that, the latency to that additional RAM would be measured in milliseconds and the bandwidth would be very low.

                As you mentioned, each node won't actually have access to all 256 MB of memory to begin with, and it would need to keep an additional amount free for data swapping. It might also help to set up a swap area on each node's SD card, but that can only be accessed at a theoretical max of 25 MB/s, which it won't reach in practice.

                Lastly, each of those nodes has only a single ARM11 core clocked at 1 GHz max. These are in-order CPUs that don't have instructions for things like division, and their SIMD instructions are just 32 bits wide.

                Compare that to an AMD 5950X. It has only 16 cores, but each core has 2 threads, is out-of-order, and has 4x more integer and floating-point execution units. Each core runs at 3.5-4.5 GHz (3.5-4.5x each Pi node) with significantly higher IPC, division instructions, and 256-bit SIMD instructions. The cores share 64 MB (two pools of 32 MB) of L3 cache with a bandwidth of about 1 TB/s and can access external memory (up to 128 GB) at up to around 47 GB/s, or about 156-1000x faster than an RPi node can access any part of its external memory, with latency measured in nanoseconds. L1 and L2 cache bandwidths are about 2.2 TB/s and 4.5 TB/s respectively.

                Even the swap disk on a 5950X system can be accessed at potentially 30 GB/s without spreading it across multiple drives, so even that would be 100x faster than a single Pi node's RAM.

                I could do this same comparison with much lower-end CPUs and they would still embarrass a 256-node RPi 1 cluster.
                You could maybe make a better case for a 256-node RPi 4 8 GB cluster, but that would cost over $16,000. At that point you could set up a system with a 96-core Epyc that would outperform it for less money while using less power and space.
                Last edited by Myownfriend; 05 February 2023, 11:14 AM.
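
                The bandwidth gap claimed in this post works out roughly as follows (a sketch using only the figures quoted above; exact DRAM and network numbers vary by configuration):

                ```python
                # Rough ratio of a 5950X's DRAM bandwidth to a single RPi 1 node's
                # memory access paths. Figures from the post: ~47 GB/s DRAM on the
                # 5950X, ~50 MB/s effective network bandwidth to remote cluster
                # memory, ~300 MB/s local SDRAM reads on the Pi.

                DESKTOP_DRAM_GBS = 47.0
                PI_NETWORK_MBS = 50.0          # per-node Ethernet, optimistic
                PI_LOCAL_READ_MBS = 300.0      # local SDRAM reads on the Pi

                remote_ratio = DESKTOP_DRAM_GBS * 1024 / PI_NETWORK_MBS    # ~963x
                local_ratio = DESKTOP_DRAM_GBS * 1024 / PI_LOCAL_READ_MBS  # ~160x

                print(f"vs. remote cluster memory: ~{remote_ratio:.0f}x faster")
                print(f"vs. a node's local RAM:    ~{local_ratio:.0f}x faster")
                ```

                That's where the "about 156-1000x" range in the post comes from: the low end compares against a Pi node's local reads, the high end against memory reached over the network.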
