AMD & Intel Team Up For UALink As Open Alternative To NVIDIA's NVLink

  • #11
    For anyone interested, The Next Platform, the sister site to The Register that covers news in the HPC, supercomputing and AI space, has a deep dive on the UALink news. It’s worth a read.

    The generative AI revolution is making strange bedfellows, as revolutions and emerging monopolies that capitalize on them, often do. The Ultra Ethernet


    • #12
      Originally posted by Jumbotron
      For anyone interested, The Next Platform, the sister site to The Register that covers news in the HPC, supercomputing and AI space, has a deep dive on the UALink news. It’s worth a read.
      Nice - thanks for that. From the article:
      No one is expecting to link GPUs from multiple vendors inside one chassis or maybe even one rack or one pod of multiple racks. But what the UALink consortium members do believe is that system makers will create machines that use UALink and allow accelerators from many players to be put into these machines as customers build out their pods. You could have one pod with AMD GPUs, one pod with Intel GPUs, and another pod with some custom accelerators from any number of other players. It allows commonality of server designs at the interconnect level, just like the Open Accelerator Module (OAM) spec put out by Meta Platforms and Microsoft allows commonality of accelerator sockets on system boards.

      Wherefore Art Thou CXL?
      We know what you are thinking: Were we not already promised this same kind of functionality with the Compute Express Link (CXL) protocol running atop PCI-Express fabrics? Doesn’t the CXL.mem subset already offer the sharing of memory between CPUs and GPUs? Yes, it does. But PCI-Express and CXL are much broader transports and protocols. Katti says that the memory domain for pods of AI accelerators is much larger than the memory domains for CPU clusters, which as we know scale from 2 to 4 to sometimes 8 to very rarely 16 compute engines. GPU pods for AI accelerators scale to hundreds of compute engines, and need to scale to thousands, many believe. And unlike CPU NUMA clustering, GPU clusters in general and those running AI workloads in particular are more forgiving when it comes to memory latency, Katti tells The Next Platform.


      • #13
        Originally posted by Michael

        No, it's unlikely to yield any cross-vendor / heterogeneous accelerator interoperability. It was covered in yesterday's briefing.
        Well, yes and no. Having, say, a single AI pod with a mix of AMD GPUs and Intel GPUs, or for that matter Nvidia GPUs, all lashed together with UALink? No.

        However, lashing together Pod 1 with 1,000 AMD GPUs and Pod 2 with 800 Intel GPUs? Then yes.

        It’s a little like what HP’s Gen-Z had in mind for disparate racks of CPUs before everyone gave their separate heterogeneous interconnect protocols over to Intel’s CXL consortium, so there could be a unified industry standard and not all these silly fiefdom wars, which only benefit Nvidia. You can look at the scenario above as bringing NUMA to GPUs, whereas CXL brings NUMA to everything else.


        • #14
          Originally posted by jeisom

          Other than connectors, I am not sure where the benefit is then. Each will probably need their own drivers regardless. Maybe some small help for software developers when they already understand the protocol. I am not too familiar with NVLink though, so maybe I am missing something.
          Emulation of NVLink at the provider level is shaky, and HGX boards are in short supply and insanely expensive. They also don't let partners pull a ton of information or change a ton on the HGX pex. AMD is a lot friendlier with their boards and fabric.


          • #15
            What a waste! They could have named it AILink! As in A(MD)I(ntel)Link.


            • #16
              Originally posted by qarium
              for so many times and years amd and intel not cooperating always resulted in Nvidia win...

              intel does OpenCL based OneAPI and AMD does ROCm/HIP result: Nvidia Wins...
              intel could write a ROCm backend... and AMD could support OneAPI...

              in many other areas intel and amd also do not cooperate and always Nvidia wins...

              another example intel does XeSS and amd does FSR and result is Nvidia wins.

              in the NPU space same intel has different NPU than AMD and result is Nvidia wins..

              Intel's ISA war in the X86_64 space is the same shit intel does not support AMD's AVX512 instead they do AVX10/AVX10.1 with different 256bit and 512bit pathways. but right now it looks like the ISA war between intel and AMD result in ARM wins big.

              so what exactly is the point of ISA war between intel and amd if in the end all people switch to ARM SVE2 ???

              and right now it looks like ARM ISA 9.2 with SVE2 beats intel and amd hard. Apple M4 use this ARM ISA 9.2 with SVE2...

              so whatever amd and intel does as soon as they do not cooperate Nvidia and ARM wins against them
              You are right. But please, write it a lot better next time. Thanks in advance.


              • #17
                Originally posted by Drago
                What a waste! They could have named it AILink! As in A(MD)I(ntel)Link.
                That’s actually pretty clever. This interlink from AMD, though superior, would not have the same support if Intel had not endorsed it, which makes what you said even more true, particularly since having Infinity Fabric used this way really does benefit AI.


                • #18
                  Originally posted by jeisom
                  Also, nvlink came out in 2018. So the rest of the industry is roughly 9 years behind??
                  I don't think NVLink is a cross-vendor industry standard, is it?

                  If not then a better comparison would be Infinity Fabric (2017 CPU, 2018 GPU) which in turn is based on Coherent HyperTransport (2001). Infinity Fabric is what we use in the MI300, for example, both between dies on the package (GMI) and between packages (XGMI).

                  I don't remember the timing for Intel's inter-chip protocols (QPI/UPI for CPU and XeLink for GPU) but I think they were similar.
                  Last edited by bridgman; 03 June 2024, 08:09 AM.


                  • #19
                    Originally posted by Jumbotron
                    AMD needs to rally around Intel’s CXL and OneAPI, leverage the industry support of Infinity Fabric for GPU communication and CEASE development of ROCm and pour those freed up resources and money into further making hardware that runs Intel’s physical and software frameworks better and cheaper than Intel itself. THAT’S….how you compete against Nvidia. Nvidia is the Apple Computer of the Compute Industrial Complex.
                    While I agree with this in theory, the devil is in the details. The first question is how would/does OneAPI exploit differences in hardware architecture? All the Nvidia stuff is tightly integrated with the Nvidia drivers, and you will see specific releases of CUDA and cuDNN coupled to specific driver versions. While not definitive, that's kind of a bad sign for a unified software development environment. Second, AMD has a lot invested in ROCm and likely a large number of customers using it. What do they do about these customers if they planned to switch to OneAPI? Third, I don't have enough experience to comment on the relative superiority of CUDA vs. ROCm vs. OneAPI - do we know that OneAPI is more than just the selling point of a single development environment for all HPC platforms? I have around a hundred CUDA users, maybe 2-3 ROCm users, and once someone asked me to install OneAPI. Standardization requires adoption. If AMD notices that no one is using OneAPI they have nearly zero motive to change anything.


                    • #20
                      Originally posted by bridgman
                      If not then a better comparison would be Infinity Fabric (2017 CPU, 2018 GPU)
                      UALink is based on Infinity Fabric, which AMD open sourced for this purpose.