Linux Kernel Prepares For Intel Xeon CPUs With On-Package HBM Memory


  • #11
    Originally posted by ms178 View Post
    CXL is a huge enabling technology, which makes me wonder why AMD won't bring it to AM5 at launch with PCIe 5, as they will stay at PCIe 4.
    AnandTech had an article on it. AMD is apparently concentrating on their Infinity Architecture for HPC systems. CXL doesn't solve GPU-to-GPU all-way direct connections with symmetric coherency; some discussion there mentioned it requiring six links per GPU. They are apparently boosting PCIe 4 up to 25 GHz as part of their solution. Some of the discussion was about thermal issues, but perhaps they are just focusing on that symmetric coherency solution for now.

    The article was

    https://www.anandtech.com/show/15596...-to-everything


    • #12
      Originally posted by jayN View Post

      CXL will enable mapping of the GPU's HBM into the CPU's memory space. The CPU would access it through the L3 cache. We'll probably get some explanation of this at the Hot Chips Sapphire Rapids and Ponte Vecchio presentations in August.
      CXL would be a huge bottleneck for HBM. A 32 or 64 GB/s link would be nothing in the context of 2 TB/s.
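
      For a rough sense of scale, a quick back-of-the-envelope comparison in C (the 128b/130b encoding overhead and the ~460 GB/s per HBM2E stack are assumptions on my part, not vendor figures):

      /* Rough comparison of a CXL-over-PCIe link vs. on-package HBM.
       * Assumptions: x16 link, 128b/130b encoding, four HBM2E stacks
       * at roughly 460 GB/s each. */
      #include <stdio.h>

      int main(void)
      {
          const double lanes = 16.0;
          const double enc   = 128.0 / 130.0;       /* 128b/130b line coding */
          double pcie4 = 16.0 * lanes * enc / 8.0;  /* GT/s -> GB/s, ~31.5 */
          double pcie5 = 32.0 * lanes * enc / 8.0;  /* ~63 GB/s */
          double hbm   = 4.0 * 460.0;               /* ~1.84 TB/s aggregate */

          printf("PCIe 4.0 x16: %6.1f GB/s\n", pcie4);
          printf("PCIe 5.0 x16: %6.1f GB/s\n", pcie5);
          printf("4x HBM2E:     %6.1f GB/s (~%.0fx a PCIe 5.0 x16 link)\n",
                 hbm, hbm / pcie5);
          return 0;
      }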



      • #13
        Originally posted by Snaipersky View Post

        CXL would be a huge bottleneck for HBM. A 32 or 64 GB/s link would be nothing in the context of 2 TB/s.
        The HBM would be directly attached to the XPU for full bandwidth, but yes, if the CPU must access it, it will be limited by the PCIe bus.
        However, if CXL moves on-chip, the bus widths can be much wider.



        • #14
          Originally posted by jayN View Post

          AnandTech had an article on it. AMD is apparently concentrating on their Infinity Architecture for HPC systems. CXL doesn't solve GPU-to-GPU all-way direct connections with symmetric coherency; some discussion there mentioned it requiring six links per GPU. They are apparently boosting PCIe 4 up to 25 GHz as part of their solution. Some of the discussion was about thermal issues, but perhaps they are just focusing on that symmetric coherency solution for now.

          The article was

          https://www.anandtech.com/show/15596...-to-everything


          Thanks for the link! Hmm, I got the impression that CXL also solves GPU-to-GPU coherency problems; here is a link discussing just that (albeit speculatively, as Intel wasn't talking explicitly about it at that time, but the technology should allow it): https://wccftech.com/intel-xe-coherent-multi-gpu-cxl/

          That article also talks about the implications of an Intel CPU + Xe GPU over CXL, which would give Intel a huge advantage. I'd like to see benchmarks of such a system in action.
          Last edited by ms178; 12 June 2021, 05:02 PM.



          • #15
            Originally posted by ms178 View Post

            Thanks for the link! Hmm, I got the impression that CXL also solves GPU-to-GPU coherency problems; here is a link discussing just that (albeit speculatively, as Intel wasn't talking explicitly about it at that time, but the technology should allow it): https://wccftech.com/intel-xe-coherent-multi-gpu-cxl/

            That article also talks about the implications of an Intel CPU + Xe GPU over CXL, which would give Intel a huge advantage. I'd like to see benchmarks of such a system in action.
            CXL uses a host (usually a CPU) to maintain coherency for some number of XPU/slave devices. It doesn't define any peer-to-peer protocol between XPUs, as is done with CCIX. However, an XPU can access another XPU's memory with CXL if it goes through the host CPU's L3.

            The best presentation I've seen for CXL 1.1 is here:

            The CXL interface adds both a memory and a caching protocol between a host CPU and a device. The Memory Protocol enables a device to expose memory region to ...


            The presentation shows caches on the host associated with CXL and connected to the L3 and the host's home agent.
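
            Purely as a toy sketch of that topology (not real CXL.cache/CXL.mem semantics, just the routing idea that peer accesses are resolved through the host's L3/home agent rather than device-to-device):

            /* Toy model of the CXL 1.1 topology described above: the host's
             * home agent owns coherency, so one XPU reaching into another
             * XPU's CXL.mem-exposed HBM is resolved through the host rather
             * than over a direct peer-to-peer path.  Illustration only. */
            #include <stdio.h>

            struct xpu {
                const char *name;
                double hbm_gib;   /* device-attached HBM exposed via CXL.mem */
            };

            /* Every coherent access is ordered by the host's home agent/L3. */
            static void host_resolve(const struct xpu *req, const struct xpu *owner)
            {
                printf("%s -> host L3/home agent -> %s HBM (%.0f GiB)\n",
                       req->name, owner->name, owner->hbm_gib);
            }

            int main(void)
            {
                struct xpu gpu0 = { "GPU0", 64.0 };
                struct xpu gpu1 = { "GPU1", 64.0 };

                host_resolve(&gpu0, &gpu1);  /* no direct GPU0<->GPU1 path */
                host_resolve(&gpu1, &gpu0);
                return 0;
            }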



            • #16
              Originally posted by kylew77 View Post
              Seems in line with what I read on this site: https://www.servethehome.com/server-...b-onboard-era/ The era of GBs of cache is supposedly coming.
              Been there, done that. Knights Landing Xeon Phi had 16 GB of MCDRAM in-package, which could be directly addressed or used as L3 cache. That launched in 2016. You could drop it in the same socket as Skylake-SP Xeons and supposedly boot a mainline Linux kernel on it.
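
              In flat mode the MCDRAM showed up as its own NUMA node, and the usual user-space way to target it was libmemkind's hbwmalloc interface; a minimal sketch along those lines (the buffer size and fallback logic are just illustrative, and presumably the same pattern would carry over to a flat-mode HBM Xeon):

              /* Minimal sketch: allocate a buffer from high-bandwidth memory
               * (e.g. flat-mode MCDRAM) via libmemkind's hbwmalloc interface,
               * falling back to regular DDR if no HBW NUMA node is present.
               * Build with: gcc hbw.c -lmemkind */
              #include <stdio.h>
              #include <stdlib.h>
              #include <hbwmalloc.h>

              int main(void)
              {
                  size_t n = 1UL << 28;                      /* 256 MiB */
                  int on_hbw = (hbw_check_available() == 0); /* HBW node present? */
                  double *buf = on_hbw ? hbw_malloc(n) : malloc(n);

                  if (!buf)
                      return 1;
                  printf("buffer lives in %s memory\n", on_hbw ? "HBW" : "DDR");

                  /* ... bandwidth-bound work on buf ... */

                  if (on_hbw)
                      hbw_free(buf);
                  else
                      free(buf);
                  return 0;
              }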



              • #17
                Originally posted by Snaipersky View Post
                I wonder who will make an iGPU-equipped CPU with HBM first... The colossal bandwidth afforded should mean that every thread in an 8c/16t part could receive 64 GB/s and still leave 1 TB/s available to a GPU (assuming 4x2048-bit @ 2 GHz).
                Stick the whole package on a motherboard, and you could have a full workstation in less space than a six-pack.
                I've been thinking about this for many years. Maybe since around when AMD launched Fury.

                I think it probably makes sense for some small, high-end laptops, but that's about it. For anything bigger, you're just better off with a dGPU. The added cost and capacity constraints of HBM just aren't worth it.

                The real kicker comes when you pair it with Optane DIMMs. You could configure those as swap, and then 16 GB really wouldn't feel restrictive. Plus, you'd get instant hibernate/wakeup.
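
                For what it's worth, the split in the quote above does add up if you take the 4x2048-bit @ 2 GHz figure at face value; a quick check:

                /* Quick check of the bandwidth split quoted above, taking
                 * 4 stacks x 2048 bits x 2 Gb/s per pin at face value. */
                #include <stdio.h>

                int main(void)
                {
                    double total   = 4.0 * 2048.0 / 8.0 * 2.0; /* 2048 GB/s */
                    double threads = 16.0;                     /* 8c/16t */
                    double cpu     = threads * 64.0;           /* 1024 GB/s */

                    printf("total %.0f GB/s, CPU %.0f GB/s, GPU %.0f GB/s\n",
                           total, cpu, total - cpu);
                    return 0;
                }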



                • #18
                  Originally posted by jayN View Post
                  Intel's Kaby Lake-G had their CPU, AMD's GPU, and some HBM in one package.
                  Yeah, but no. It's a wholly different animal. That was basically just a dGPU plus its HBM mounted on the same substrate as the CPU die, but they were otherwise separate. The CPU still had its own normal DDR4.

                  The only way they were more closely linked than if the CPU and GPU had been in separate packages was the load balancing of power utilization.



                  • #19
                    Originally posted by jayN View Post
                    CXL will enable mapping of the GPU's HBM into the CPU's memory space. The CPU would access it through the L3 cache.
                    It's going to be slower and more energy-intensive than accessing direct-attached RAM.

                    I think CXL-based memory modules will be used for special use cases, such as when you specifically want a memory pool shared between accelerators, or maybe as a tier in the storage hierarchy above flash. I don't foresee it replacing direct-attached DRAM.



                    • #20
                      Originally posted by ms178 View Post
                      CXL is a huge enabling technology, which makes me wonder why AMD won't bring it to AM5
                      AM5 is a mainstream/consumer-oriented socket. CXL is a server-oriented technology.

                      Originally posted by ms178 View Post
                      PCIe 5 as they will stay at PCIe 4.
                      PCIe 5 adds significant cost to motherboards and devices. Consumers have no need for it.

