Radeon "GFX90A" Added To LLVM As Next-Gen CDNA With Full-Rate FP64


  • #31
    Originally posted by Spacefish View Post
    LLVM does support RDNA as an ISA, but the code generated by LLVM will be orders of magnitude slower on RDNA than a hand-optimized kernel on CDNA.
    So, what do graphics shader compilers use? I thought the two backends in use were LLVM and ACO?

    Originally posted by Spacefish View Post
    With CUDA you can use a cheap (OK, currently not) consumer card from five years ago and it just works, accelerating your workload pretty significantly.
    Yes and no. To my point about AMD making Matrix Cores and BFloat16 hardware accessible to the unwashed masses, the analogous Nvidia features would be their tensor cores, which they only added to consumer products a couple of years ago. But Nvidia is better on this point: they enable nearly all of their GPUs with some amount of compute capability, so students can buy one GPU for both gaming and coursework.

    Originally posted by Spacefish View Post
    FPGA Accelerators: They might come to consumer hardware sooner or later, maybe embedded in a CPU or a GPU to accelerate AI workloads in games or image processing.
    I haven't seen benchmarks where they can beat Nvidia's tensor cores. So, it would have to be for something GPUs do poorly, like spiking neural networks. Those still aren't very common, and by the time they are, they're likely to be addressed by purpose-built AI engines.

    Originally posted by Spacefish View Post
    I guess we will see them as a replacement for HPC accelerators first, like what Xilinx is doing with their Alveo Accelerators.
    I'm not familiar with those, but where I see FPGAs making sense in general-purpose computing is in cases where you need the flexibility of a programmable solution with the low-latency of hard-wired logic. So, maybe for things like software-defined networking or high-frequency trading.

    Originally posted by Spacefish View Post
    Toolstack / software support will be key here.
    But we might see some C-to-RTL compilers then, so you can write against a library like ROCm or CUDA and have your code accelerated by an FPGA.
    They've had OpenCL support for like a decade. You can buy an Intel FPGA card and use it with oneAPI for AI acceleration, today.



    • #32
      Originally posted by coder View Post
      If they just wanted to build a fast mining chip, then there'd be no need for fp64 or their Matrix Cores. Leaving that stuff out could make it substantially cheaper. In fact, I wonder if you even need any floating-point for mining.
      Well, you can buy an MI100 today, but it's priced to compete with Nvidia's A100, which is to say a lot:
      https://www.dell.com/en-us/work/shop...ic-video-cards
      Yeah, it's a very big chip (750 mm2?), meaning that the potential for large price drops is probably limited to a small number of partially defective dies they might be able to salvage as low-spec products for academia and hobbyists.
      Thanks for the link. $12,425 sounds high, but what do you expect? The price is that high because the performance is the best.
      Over the last 40 years of computer history, you have always paid at least double the price when hardware performs the best.
      If you think salvaging defective dies is the only way they can offer a cheaper price, you are wrong.
      They do not need to make a 750 mm² chip; they could also make a 400 mm² chip.
      But you might not be happy with the 400 mm² chip either, because even if the $12,425 price dropped to $6,000, you would still not be happy.

      But honestly, building a smaller chip and/or dropping "fp64 or their Matrix Cores" does not make sense to me.

      This kind of hardware is built to be the fastest. Why do you think you can make "academia and hobbyists" happy with hardware that is not the fastest?

      They could run a developer/academia support program and hand out hardware to some developers and academics for free. That would be great PR.
      Phantom circuit Sequence Reducer Dyslexia



      • #33
        Originally posted by coder View Post
        You can't put GDDR memory on a DIMM. The timing and electrical specs are way too tight for that, which is how they've managed to squeeze out the extra performance from it. I doubt you can even use it from a socketed CPU.
        But IBM does exactly this: https://en.wikipedia.org/wiki/Cohere...ssor_Interface



        • #34
          Originally posted by Qaridarium View Post
          If you think salvaging defective dies is the only way they can offer a cheaper price, you are wrong.
          They do not need to make a 750 mm² chip; they could also make a 400 mm² chip.
          But you might not be happy with the 400 mm² chip either, because even if the $12,425 price dropped to $6,000, you would still not be happy.
          It's just one possibility.

          The Navi 21 die (RX 6800 & 6900) is reportedly about 520 mm2, and those cards have list prices ranging from $580 to $1000. Part of it is that cost increases disproportionately with area, but they also have to account for non-recurring engineering costs, market conditions, etc.
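          The "disproportionate" part can be sketched with a toy yield model. Defect density, wafer size, and wafer cost below are made-up round numbers for illustration, not real foundry figures:

```python
# Illustrative die-cost comparison using a simple Poisson yield model.
# All constants are assumed round numbers, not actual TSMC pricing.
import math

WAFER_COST = 10_000.0   # assumed cost of one wafer, USD
WAFER_DIAMETER = 300.0  # mm
DEFECT_DENSITY = 0.001  # assumed defects per mm^2 (0.1 per cm^2)

def dies_per_wafer(die_area_mm2: float) -> int:
    """Rough dies-per-wafer estimate, subtracting a term for edge loss."""
    r = WAFER_DIAMETER / 2
    return int(math.pi * r**2 / die_area_mm2
               - math.pi * WAFER_DIAMETER / math.sqrt(2 * die_area_mm2))

def yield_fraction(die_area_mm2: float) -> float:
    """Poisson yield: probability a die lands with zero defects."""
    return math.exp(-DEFECT_DENSITY * die_area_mm2)

def cost_per_good_die(die_area_mm2: float) -> float:
    """Wafer cost spread over the dies that actually work."""
    good_dies = dies_per_wafer(die_area_mm2) * yield_fraction(die_area_mm2)
    return WAFER_COST / good_dies

for area in (400, 520, 750):
    print(f"{area} mm^2: ~${cost_per_good_die(area):.0f} per good die")
```

          Under these assumed numbers the 750 mm² die comes out at roughly three times the cost per good die of the 400 mm² one, despite having less than twice the area, because bigger dies mean both fewer candidates per wafer and a lower fraction of defect-free ones.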

          Originally posted by Qaridarium View Post
          Why do you think you can make "academia and hobbyists" happy with hardware that is not the fastest?
          It still needs to out-perform other options on the market, but you could easily cut it down by more than half and still do that. Take another look at the stats on their matrix cores.

          The main point is that if the general public has no GPUs or other accelerators with CDNA features, then AMD shouldn't be surprised if virtually the only open-source software that uses them is what AMD writes itself. If there's a lesson AMD could take from Nvidia's runaway success in AI, it's to seed universities and the general public with hardware that can power the next wave of software innovations.

          Originally posted by Qaridarium View Post
          They could run a developer/academia support program and hand out hardware to some developers and academics for free. That would be great PR.
          Yes, that's along the lines of what I'm saying.



          • #35
            Originally posted by Qaridarium View Post
            but IBM does exactly this: https://en.wikipedia.org/wiki/Cohere...ssor_Interface
            My point was specifically about putting GDDR on DIMMs. Where does it say they do that?



            • #36
              Originally posted by coder View Post
              My point was specifically about putting GDDR on DIMMs. Where does it say they do that?
              The point with OpenCAPI is that you can put anything you want on a DIMM, even GDDR RAM.



              • #37
                Originally posted by coder View Post
                It's just one possibility.
                The Navi 21 die (RX 6800 & 6900) is reportedly about 520 mm2, and those cards have list prices ranging from $580 to $1000. Part of it is that cost increases disproportionately with area, but they also have to account for non-recurring engineering costs, market conditions, etc.
                I am sure that if they built a 400 mm² CDNA card, it would easily outperform a 520 mm² 6900,
                because the 6900 has all the 3D hardware you just don't need for compute.

                Originally posted by coder View Post
                It still needs to out-perform other options on the market, but you could easily cut it down by more than half and still do that. Take another look at the stats on their matrix cores.
                The main point is that if the general public has no GPUs or other accelerators with CDNA features, then AMD shouldn't be surprised if virtually the only opensource software that uses them is what AMD writes, itself. If there's a lesson AMD could take away from Nvidia's runaway success in AI, it's to seed the Universities and general public with hardware that can be used to power the next wave of software innovations.
                Yes, that's along the lines of what I'm saying.
                Right now it makes no sense for AMD to build a cheaper CDNA compute card, and the reason is simple:
                they are running the 5/7/12 nm fabs at 100% capacity.
                Believe it or not, it is simply not possible.

                If a 7 nm fab runs at 50% capacity, you can say, OK, let's make a cheap card.
                But if you are running at 100%, you would have to be crazy to make a cheap card.

                The only way to build a cheap CDNA card right now would be to backport it to 12 nm or 16 nm.

                A card like that would still be fast because of the very good architecture, but in the end you would not be happy, because it is not 5 nm.



                • #38
                  Originally posted by Qaridarium View Post
                  the point with OpenCAPI is that you can put anything you want on a DIMM, even GDDR RAM.
                  Okay, here's what you're talking about:

                  https://www.anandtech.com/show/14706...mory-interface

                  It appears to add a lot of cost, and yet normal DDR4 or DDR5 can already saturate the link. So, it's not a real way to cheat the constraints of GDDR memories, nor does it really add anything over DDR, in this case. However, it does add latency, which is already higher in GDDR memories than their regular DDR cousins.

                  Basically, OpenCAPI doesn't make sense for a desktop PC or most workstations. It's a server-oriented technology.
                  Last edited by coder; 24 February 2021, 12:42 AM.



                  • #39
                    Originally posted by coder View Post
                    Okay, here's what you're talking about:
                    https://www.anandtech.com/show/14706...mory-interface
                    It appears to add a lot of cost, and yet normal DDR4 or DDR5 can already saturate the link. So, it's not a real way to cheat the constraints of GDDR memories, nor does it really add anything over DDR, in this case. However, it does add latency, which is already higher in GDDR memories than their regular DDR cousins.
                    Basically, OpenCAPI doesn't make sense for a desktop PC or most workstations. It's a server-oriented technology.
                    Do you really think it is designed "to add a lot of cost"? No, it's not.
                    It is designed for flexibility. For example, if you have a task that does not need fast RAM but needs a lot of it, you can build an OpenCAPI SSD,
                    which is much cheaper per GB of RAM.
                    It is also designed to reduce obsolescence: if you have a server with OpenCAPI DDR4 RAM and later want to upgrade to DDR5 or GDDR6X, that's no problem.
                    In that sense it can reduce costs, because you do not need to buy a whole new system for DDR5; you just upgrade your old DDR4 system to DDR5. And imagine this: if super-cheap DDR6 RAM appears in the future, no problem, you upgrade to DDR6.

                    So it's not designed to add cost; it is designed to reduce costs in the long run.



                    • #40
                      Originally posted by Qaridarium View Post
                      Do you really think it is designed "to add a lot of cost"? No, it's not.
                      Not for servers, which already use registered memory and stand to benefit most from a more scalable memory architecture. But you really can't say it's not more expensive to put a memory controller on every DIMM, rather than just have one inside the CPU. There's a reason why they moved it from the motherboard (where it was previously part of what was once called a "Northbridge" chip) and into the CPU!

                      Originally posted by Qaridarium View Post
                      if you have a server with OpenCAPI DDR4 RAM and later want to upgrade to DDR5 or GDDR6X, that's no problem.
                      If your existing RAM is already close to maxing out the link speed, then it's also no benefit! Indeed, from what I see, the point is really about scaling capacity, rather than scaling speed. The way they get speed benefits is simply by decoupling the RAM from the CPU, so they can scale up the number of channels.
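                      The arithmetic behind "no benefit" is easy to sketch. Taking the commonly cited OMI figures of 8 lanes at 25.6 GT/s as an assumption (not a spec quote), a single DDR4-3200 channel already matches the link's peak per-direction bandwidth:

```python
# Back-of-envelope check: can ordinary DDR4 saturate an OpenCAPI/OMI link?
# Link width and signaling rate are assumed from commonly cited OMI figures.

def ddr_channel_gbps(mt_per_s: float, bus_bytes: int = 8) -> float:
    """Peak bandwidth of one DDR channel in GB/s (64-bit bus = 8 bytes)."""
    return mt_per_s * bus_bytes / 1000

def omi_link_gbps(lanes: int = 8, gt_per_s: float = 25.6) -> float:
    """Peak per-direction bandwidth of a serial OMI link in GB/s.
    Each lane carries one bit per transfer; encoding overhead ignored."""
    return lanes * gt_per_s / 8

ddr4 = ddr_channel_gbps(3200)   # DDR4-3200 channel
omi = omi_link_gbps()           # 8 lanes x 25.6 GT/s
print(f"DDR4-3200 channel: {ddr4:.1f} GB/s, OMI link: {omi:.1f} GB/s")
```

                      Both come out at 25.6 GB/s, so swapping faster DRAM behind the same link moves the bottleneck, not the ceiling; the wins come from adding more links.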

