Radeon "GFX90A" Added To LLVM As Next-Gen CDNA With Full-Rate FP64

    Phoronix: Radeon "GFX90A" Added To LLVM As Next-Gen CDNA With Full-Rate FP64

    It looks like the open-source driver support for the next-generation CDNA GPU / MI100 "Arcturus" successor is on the way. Hitting mainline AMDGPU LLVM is a new "GFX90A" target adding new interesting features for compute...

    http://www.phoronix.com/scan.php?pag...PU-LLVM-GFX90A

  • #2
    MFMA (matrix fused multiply-add) will probably be pretty useful for upcoming "3D" neural networks, and for NNs in general, I guess?!
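For readers wondering why MFMA matters for NNs: at the semantic level, a matrix FMA computes D = A·B + C over a small tile in one instruction, and a batched dense layer (inputs × weights + bias) has exactly that shape. Below is a plain-Python reference of the operation with illustrative values only. Real MFMA instructions fix specific M×N×K tile shapes and data types (for example FP16 inputs with FP32 accumulation); this sketch does not model those details.

```python
# Reference semantics of a matrix fused multiply-add: D = A @ B + C.
# Tile sizes and values are illustrative, not a real MFMA shape.

def matrix_fma(a, b, c):
    """Return A @ B + C for lists-of-lists matrices (M x K, K x N, M x N)."""
    m, k = len(a), len(a[0])
    n = len(b[0])
    assert len(b) == k and len(c) == m and len(c[0]) == n, "shape mismatch"
    return [
        [c[i][j] + sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
        for i in range(m)
    ]

# A batched dense layer y = x W + bias is one such operation.
x = [[1.0, 2.0], [3.0, 4.0]]      # batch of 2 inputs, 2 features each
w = [[0.5, -1.0], [0.25, 2.0]]    # 2x2 weight matrix
bias = [[0.1, 0.1], [0.1, 0.1]]   # bias replicated for each row of the batch

if __name__ == "__main__":
    print(matrix_fma(x, w, bias))
```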

    • #3
      I wonder how such an updated GFX9 card with 64 CUs and 8GB of HBM2e would perform on TSMC's 7nm process. Such a "Vega 64 v3" could still be a decent gaming and prosumer card.

      • #4
        Originally posted by ms178
        I wonder how such an updated GFX9 card with 64 CUs and 8GB of HBM2e would perform on TSMC's 7nm process. Such a "Vega 64 v3" could still be a decent gaming and prosumer card.
        Sadly, I guess the days when gaming cards and compute cards shared the same architecture are over.

        It would be cool if they released a ~$300-500 7nm CDNA card for "casual" academic users or people at home doing ML, mining, or other compute-intensive tasks.

        Otherwise we are stuck with RDNA/RDNA2 cards with second-class-citizen ROCm support.
        At least they are now committed to bringing RDNA support to ROCm, as far as I can tell from the GitHub issue tracker.

        RDNA won't have all the cool compute features (they are not required for gaming). This means the hand-written CDNA assembly kernels will never run on it, and there is probably not much interest on AMD's side in spending a lot of time optimizing kernels for RDNA.

        I wish they would just take some of the otherwise defective CDNA chips, disable three quarters of the cores, and release them as an entry-level CDNA card.
        A small 7nm CDNA chip (with around 20 CUs, for example) is probably too expensive to develop for such a small market segment.
        Maybe they could even leave all working CUs enabled but clock them down, or nerf the cards in some other way so enterprise customers don't want to buy them, like restricting the use of multiple such cards in one system.

        • #5
          Originally posted by Spacefish

          Sadly, I guess the days when gaming cards and compute cards shared the same architecture are over.

          It would be cool if they released a ~$300-500 7nm CDNA card for "casual" academic users or people at home doing ML, mining, or other compute-intensive tasks.

          Otherwise we are stuck with RDNA/RDNA2 cards with second-class-citizen ROCm support.
          At least they are now committed to bringing RDNA support to ROCm, as far as I can tell from the GitHub issue tracker.
          That's exactly in line with my thinking: a card you can work with during the day and play with at night. And if the compute side becomes more important for general-purpose tasks again (the writing is on the wall with CXL coming), such a divergence in architecture could hurt them, because it takes more effort to optimize for two architectures instead of one. But maybe we will see them converge again at that point.

          • #6
            Originally posted by phoronix
            Phoronix: Radeon "GFX90A" Added To LLVM As Next-Gen CDNA With Full-Rate FP64

            It looks like the open-source driver support for the next-generation CDNA GPU / MI100 "Arcturus" successor is on the way. Hitting mainline AMDGPU LLVM is a new "GFX90A" target adding new interesting features for compute...

            http://www.phoronix.com/scan.php?pag...PU-LLVM-GFX90A
            MI100 GPU peak throughput (TFLOPS):
            Matrix FP16   184.6
            Matrix BF16    92.3
            Matrix FP32    46.1
            Vector FP32    23.1
            Vector FP64    11.5
            https://www.techpowerup.com/gpu-spec...chitecture.pdf
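The figures in the table follow a simple pattern: each entry is a fixed multiple of vector FP32 throughput, which is CUs × 64 lanes × 2 FLOPs per FMA × clock. A minimal Python sketch, assuming MI100's published 120 CUs and 1502 MHz peak engine clock (taken from AMD's spec sheet, not from this thread) and inferring the per-format multipliers from the ratios in the table itself:

```python
# Sketch: reproduce MI100 peak TFLOPS as CUs x lanes x 2 FLOPs/FMA x clock.
# ASSUMPTIONS: 120 CUs and a 1502 MHz peak engine clock come from AMD's
# published MI100 specs; the per-format multipliers are inferred from the
# ratios in the table, not from any official per-instruction rate list.

CUS = 120            # compute units on MI100
LANES_PER_CU = 64    # FP32 vector lanes per CU (4 x SIMD16)
FLOPS_PER_FMA = 2    # a fused multiply-add counts as two FLOPs
CLOCK_GHZ = 1.502    # peak engine clock

def peak_tflops(rate_vs_fp32_vector: float) -> float:
    """Peak TFLOPS for a format running at a multiple of vector FP32."""
    fp32_vector_gflops = CUS * LANES_PER_CU * FLOPS_PER_FMA * CLOCK_GHZ
    return fp32_vector_gflops / 1000.0 * rate_vs_fp32_vector

RATES = {
    "Matrix FP16": 8.0,
    "Matrix BF16": 4.0,
    "Matrix FP32": 2.0,
    "Vector FP32": 1.0,
    "Vector FP64": 0.5,   # half-rate FP64 on MI100
}

if __name__ == "__main__":
    for name, rate in RATES.items():
        print(f"{name:12s} {peak_tflops(rate):6.1f}")
    # Hypothetical: same CU count and clock, but full-rate FP64
    print(f"{'FP64 (full)':12s} {peak_tflops(1.0):6.1f}")
```

The last printed line is purely hypothetical: it only illustrates what "full-rate FP64" would mean at MI100's CU count and clock, and says nothing about GFX90A's actual configuration.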

            • #7
              Originally posted by ms178

              That's exactly in line with my thinking: a card you can work with during the day and play with at night. And if the compute side becomes more important for general-purpose tasks again (the writing is on the wall with CXL coming), such a divergence in architecture could hurt them, because it takes more effort to optimize for two architectures instead of one. But maybe we will see them converge again at that point.
              There is absolutely no point in that when RDNA can run Vulkan and OpenCL compute, and HIP on Linux.

              The sole purpose of CDNA is to go after HPC density and TCO. And, contrary to what the article implies, CDNA is no more GCN than RDNA is.

              Anything you can run on CDNA is also going to run on RDNA, just not quite as fast, as long as it is portable code. Nobody should be writing assembly for GPUs at this point... unless you really need an HPC application to scale, in which case you already have access to that hardware.

              • #8
                Originally posted by Spacefish

                Sadly, I guess the days when gaming cards and compute cards shared the same architecture are over.
                It really makes no sense to share architectures when you can optimize a design for a specific use case.
                It would be cool if they released a ~$300-500 7nm CDNA card for "casual" academic users or people at home doing ML, mining, or other compute-intensive tasks.
                Who knows, it might happen. However, the market for these cards is very hungry for performance, so that is what AMD has to go after first.
                Otherwise we are stuck with RDNA/RDNA2 cards with second-class-citizen ROCm support.
                At least they are now committed to bringing RDNA support to ROCm, as far as I can tell from the GitHub issue tracker.
                Just remember that AMD is hiring talent right now and has been since they cleaned up their financial mess. It takes a while to do things right. As an RDNA card owner, I'm a bit disappointed that ROCm support never seems to arrive. On the other hand, I understand what they are trying to do with the resources they have.
                RDNA won't have all the cool compute features (they are not required for gaming). This means the hand-written CDNA assembly kernels will never run on it, and there is probably not much interest on AMD's side in spending a lot of time optimizing kernels for RDNA.

                I wish they would just take some of the otherwise defective CDNA chips, disable three quarters of the cores, and release them as an entry-level CDNA card.
                A small 7nm CDNA chip (with around 20 CUs, for example) is probably too expensive to develop for such a small market segment.
                To have enough defective chips for a cut-down CDNA product, they will need to reach a certain volume with the mainstream chip. I'm not yet convinced that CDNA is selling in high enough volume to produce a marketable quantity of lower-performance chips.
                Maybe they could even leave all working CUs enabled but clock them down, or nerf the cards in some other way so enterprise customers don't want to buy them, like restricting the use of multiple such cards in one system.
                The common practice with computational chips like this is to partition off the defective areas of the "big" chip and use it in lower-end cards. The problem here is volume: they need enough silicon to make such cards viable as a product. If the rumors of high yields at TSMC are true, that makes it even harder to come up with enough defective chips.
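The volume argument can be put into rough numbers with the textbook Poisson yield model, Y = exp(-A·D0), where A is die area and D0 is defect density. This is a minimal sketch, not AMD's or TSMC's actual process: the die area and defect densities are hypothetical values chosen only to show the trend, and every defective die is counted as a salvage candidate, which overstates the real pool.

```python
import math

# Hypothetical sketch: defective (possibly harvestable) dies produced while
# building a target number of fully working dies, using the Poisson yield
# model Y = exp(-A * D0). All numbers are made up; NOT AMD/TSMC data.

def poisson_yield(die_area_cm2: float, defects_per_cm2: float) -> float:
    """Fraction of dies with zero defects under a Poisson defect model."""
    return math.exp(-die_area_cm2 * defects_per_cm2)

def defective_dies(good_dies: int, die_area_cm2: float, defects_per_cm2: float) -> float:
    """Defective dies produced alongside `good_dies` fully working ones.

    Upper bound on salvage candidates: a defect in non-redundant logic
    makes a die unusable even for a cut-down part.
    """
    y = poisson_yield(die_area_cm2, defects_per_cm2)
    total_dies = good_dies / y
    return total_dies - good_dies

if __name__ == "__main__":
    AREA_CM2 = 7.5  # hypothetical large compute die (~750 mm^2)
    for d0 in (0.10, 0.05):  # hypothetical defect densities per cm^2
        print(f"D0={d0:.2f}: yield={poisson_yield(AREA_CM2, d0):.2f}, "
              f"defective per 10k good dies={defective_dies(10_000, AREA_CM2, d0):,.0f}")
```

In this toy model, halving the defect density cuts the pool of defective dies by more than half for the same number of good dies, which is the "high yields make harvesting harder" point in numeric form.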

                I understand what you want, and it might be in AMD's plans long term. It will only happen, though, when the economic conditions are right.

                I'm still dreaming of a dual-socket Threadripper implementation where the second socket is dedicated to a CDNA chip. Even here I would think AMD would want to push significant power through the chip to keep customers interested. I'm talking 150 to 200 watts, which would be a lot on a motherboard with NUMA access to memory.

                • #9
                  Originally posted by cb88
                  ...

                  Anything you can run on CDNA is also going to run on RDNA, just not quite as fast, as long as it is portable code. ...
                  That statement is true today but will likely not remain so in the future. Since this is a GPU/acceleration processor, AMD can easily add just about any sort of specialized instruction and hardware to accelerate advanced math. We are basically on the first release of CDNA, and it hasn't moved far from its GPU roots. I can see the day when a CDNA card is so far removed from the GPU world that you wouldn't even consider running your code on a mainstream GPU.

                  • #10
                    Originally posted by wizard69
                    I'm still dreaming of a dual-socket Threadripper implementation where the second socket is dedicated to a CDNA chip. Even here I would think AMD would want to push significant power through the chip to keep customers interested. I'm talking 150 to 200 watts, which would be a lot on a motherboard with NUMA access to memory.
                    Believe it or not, they are working on this right now, and it's even better than you think.

                    Zen 4 on 5nm + RDNA3 + Xilinx FPGA + HBM3 + Infinity Cache + SSD, all connected with xGMI.

                    But about the "dual socket": this will not have a socket at all. It will be soldered directly to the board.

                    Also, 200 watts? Wrong: such a mainboard will draw ~600 watts and be fully liquid-cooled.

                    But no water will be used; 3M Novec liquid will be used instead.
                    Phantom circuit Sequence Reducer Dyslexia
