Linux Parallel CPU Bring-Up Shows Great Impact Bringing Up 1,920 Sapphire Rapids Cores



    Phoronix: Linux Parallel CPU Bring-Up Shows Great Impact Bringing Up 1,920 Sapphire Rapids Cores

    After the prior kernel patches stalled in review, work on x86_64 parallel CPU bring-up for the Linux kernel was revived last week to help boot the kernel faster on desktops and servers with large core counts. The results have been promising, and over the past few days more test results have flowed in, along with other positive commentary that will hopefully lead to the work finally being upstreamed...
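    For context on what "parallel bring-up" means here: the x86 kernel has traditionally woken secondary CPUs one at a time, waiting for each to come online before starting the next, while the patch series overlaps much of that per-CPU waiting. The sketch below is only a toy userspace model of that idea, not the kernel code; the CPU count and per-CPU delay are made-up numbers, but it shows why overlapping the waits shrinks the total from roughly N delays down to about one.

    /*
     * Toy illustration only -- NOT the kernel patch series, just a userspace
     * model of serial vs. overlapped per-CPU wake-up latency.
     * NCPUS and WAKE_DELAY_US are made-up numbers.
     */
    #include <pthread.h>
    #include <stdio.h>
    #include <time.h>
    #include <unistd.h>

    #define NCPUS 32            /* hypothetical number of "application processors" */
    #define WAKE_DELAY_US 10000 /* stand-in for per-CPU INIT/SIPI + init latency */

    static void *fake_ap_init(void *arg)
    {
        (void)arg;
        usleep(WAKE_DELAY_US);  /* pretend the AP takes this long to come online */
        return NULL;
    }

    static double now_sec(void)
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec + ts.tv_nsec / 1e9;
    }

    int main(void)
    {
        pthread_t t[NCPUS];
        double start;
        int i;

        /* Serial model: start each "AP" and wait for it before starting the next. */
        start = now_sec();
        for (i = 0; i < NCPUS; i++) {
            pthread_create(&t[i], NULL, fake_ap_init, NULL);
            pthread_join(t[i], NULL);
        }
        printf("serial bring-up:   %.3f s\n", now_sec() - start);

        /* Overlapped model: kick all "APs" first, then wait for them collectively. */
        start = now_sec();
        for (i = 0; i < NCPUS; i++)
            pthread_create(&t[i], NULL, fake_ap_init, NULL);
        for (i = 0; i < NCPUS; i++)
            pthread_join(t[i], NULL);
        printf("parallel bring-up: %.3f s\n", now_sec() - start);

        return 0;
    }

    Built with cc -pthread, the serial loop takes roughly NCPUS times the per-CPU delay while the overlapped version takes about one delay, which is the intuition behind the reported boot-time savings on very high core counts.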


  • #2
    Wow. I wonder what the people with those ~100K core supercomputers are using? Have they been using a version of this patch all along? The power usage just to boot the things must be staggering. I'm glad my largest config has but 12 cores and 24 threads. Never thought I'd see it that way.

    I assume that these patches don't slow down the more nominal 4-16 core case where 99% of users live?



    • #3
      I'm curious what that 16-socket motherboard looks like, because if it can't fit in a 2U server rack then what's honestly the point? I'm sure it'd be more cost effective to just buy more boards with fewer sockets.



      • #4
        Michael, the comment says:
        This patchset reduces the full boot time by 57 seconds, a 25% reduction.
        It's reduced by 57 seconds, not to 57 seconds! It's impressive nonetheless!
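        Taking those figures at face value: a 57-second saving at 25% means the full boot was about 57 / 0.25 ≈ 228 seconds before the patches and roughly 171 seconds after.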



        • #5
          Originally posted by willmore View Post
          Wow. I wonder what the people with those ~100K core supercomputers are using? Have they been using a version of this patch all along? The power usage just to boot the things must be staggering. I'm glad my largest config has but 12 cores and 24 threads. Never thought I'd see it that way.
          Those types of supercomputers aren't a single computer; they're thousands (or more) of individual servers that communicate over high-speed, low-latency networking. When they boot up, it's not much different from thousands of single servers booting up.



          • #6
            Originally posted by willmore View Post
            Wow. I wonder what the people with those ~100K core supercomputers are using? Have they been using a version of this patch all along? The power usage just to boot the things must be staggering. I'm glad my largest config has but 12 cores and 24 threads. Never thought I'd see it that way.

            I assume that these patches don't slow down the more nominal 4-16 core case where 99% of users live?
            As mentioned, those 100k core supercomputers aren't single image systems (as in, you have a single kernel managing all of it). They are essentially clusters, albeit ones with high bandwidth low latency network fabrics. Often these fabrics support RDMA, but that doesn't make them single image systems.

            The biggest single-image systems I'm aware of hail from the line of supercomputers Silicon Graphics built. Originally the MIPS-based Origin line running IRIX, maxing out at something like 2k CPUs. These were followed by the Itanium-based Altix series running Linux, maxing out at 4k cores IIRC. Then came various x86-64 based Altix systems, still running Linux. Finally, SGI was bought out by HPE, and HPE is still developing the Altix series. Considering the message this article is about came from HPE, it seems likely the test platform was one of these Altix systems.



            • #7
              Originally posted by schmidtbag View Post
              I'm curious what that 16-socket motherboard looks like, because if it can't fit in a 2U server rack then what's honestly the point? I'm sure it'd be more cost effective to just buy more boards with fewer sockets.
              There are applications that really prefer to run in one large single system image. This is basically filling the same role as the big RISC/UNIX systems, and specifically it's the successor to the Itanium-based HP Superdome family. There are a few apps that work well on this kind of system, but the big marketable one is SAP HANA.

              Also, it's not one mainboard; it's several, with four sockets each I believe, talking over a low-latency interconnect that effectively provides a switched extension of the internal UPI for SMP. My knowledge of the Xeon-based Superdomes is a lot weaker than my knowledge of the Itanium predecessors, though.
              Last edited by Dawn; 06 February 2023, 11:08 AM.



              • #8
                Originally posted by jabl View Post

                As mentioned, those 100k core supercomputers aren't single image systems (as in, you have a single kernel managing all of it). They are essentially clusters, albeit ones with high bandwidth low latency network fabrics. Often these fabrics support RDMA, but that doesn't make them single image systems.

                The biggest single-image systems I'm aware of hail from the line of supercomputers Silicon Graphics built. Originally the MIPS-based Origin line running IRIX, maxing out at something like 2k CPUs. These were followed by the Itanium-based Altix series running Linux, maxing out at 4k cores IIRC. Then came various x86-64 based Altix systems, still running Linux. Finally, SGI was bought out by HPE, and HPE is still developing the Altix series. Considering the message this article is about came from HPE, it seems likely the test platform was one of these Altix systems.
                Close, but not quite.

                The Altix UV line is pretty much dead at this point. The big HPE boxes now are actually replacements for Superdome/Superdome2/SuperdomeX for commercial computing (not technical computing like the Altix), but with SGI interconnect IP integrated in a vaguely Frankenstein-esque way. That's what the article refers to. The term to look up is "Superdome Flex" (not Superdome Flex 280, which is a mostly-unrelated glueless 8-socket system, AFAIK).
                Last edited by Dawn; 06 February 2023, 11:02 AM.



                • #9
                  Originally posted by schmidtbag View Post
                  I'm curious what that 16-socket motherboard looks like, because if it can't fit in a 2U server rack then what's honestly the point? I'm sure it'd be more cost effective to just buy more boards with fewer sockets.
                  Some workloads both run more efficiently within a single computer than across interconnected computers and, nowadays, consume memory in the dozens of TBs. However, most 2S or 4S designs (and the latter might be getting harder and harder to find in a 2U form factor because of ever-increasing processor TDPs) cannot hold enough memory, and/or memory access speed takes a dive beyond 1 DIMM per channel. In that case, 8+S designs offer more memory bandwidth with less memory per core.
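                  To put rough, purely hypothetical numbers on it: with 8 memory channels per socket and 128 GB DIMMs at 1 DPC, a 2-socket box tops out around 2 TB of RAM, while a 16-socket system reaches roughly 16 TB with eight times the aggregate channel bandwidth.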

                  EDIT: Dawn beat me



                  • #10
                    Originally posted by schmidtbag View Post
                    I'm curious what that 16-socket motherboard looks like, because if it can't fit in a 2U server rack then what's honestly the point? I'm sure it'd be more cost effective to just buy more boards with fewer sockets.
                    If this is anything like the SGI Origin/Altix line of shared-memory supercomputers, which is likely considering HPE bought SGI, it's built of 2S boards that slot into a backplane carrying cache-coherent memory traffic. The chassis these boards plug into (a bit like a blade server chassis for general-purpose computing) then contains a switch for the memory fabric, and a boatload of fiber cables out the back for connecting a bunch of these chassis together into a folded Clos fabric for the memory traffic.

                    As for cost effectiveness, sure, a cluster with InfiniBand is going to beat this hands down. Organizations that buy these systems buy them because they have applications that haven't been ported to a distributed-memory architecture.

