AMD Publishes RDNA 3.5 ISA Documentation


  • AMD Publishes RDNA 3.5 ISA Documentation

    Phoronix: AMD Publishes RDNA 3.5 ISA Documentation

    AMD today made public their RDNA 3.5 instruction set architecture (ISA) programming guide for the updated RDNA3 graphics found within the new Ryzen AI 300 "Strix Point" APUs thus far...


  • #2
    Getting pretty excited for Strix Halo at this point.. Is it likely that Strix Halo will become a ROCm monster with that supposed quad channel 128Gb LPDDR5x? I'm guessing very interesting for LLM use cases.

    Some interesting pieces of the puzzle coming out lately.. Strix Halo being added to ROCm (article on VideoCardz recently), an AMD rep talking about Strix Halo as a 'mini supercomputer', the RDNA 3.5 ISA coming out, AMD saying they'll allow adjusting the "VRAM" size on APUs. 512mb current default up to what, 80% of the available system memory? As pieces of the puzzle, that suggests to me AMD want to make Strix Halo a beast.

    On the topic of VRAM and APUs though, FWIW it's possible now to adjust it up to 16Gb with the UniversalAMDFormBrowser, which I successfully did on my Ryzen 5700U. 2x32Gb DDR-3200 SODIMMs (which work fine for a total of 64Gb system RAM), set 16Gb for VRAM and it shows up in LACT and UMR, and when running ollama and whisper (compiled locally with ROCm 6.2), "VRAM" memory use increases as expected.
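(For anyone wanting to double-check those numbers outside of LACT or UMR: the amdgpu driver exposes its memory pool sizes through sysfs. A minimal sketch, assuming an amdgpu GPU enumerated as card0; the helper names are mine and the paths may differ per system:)

```python
from pathlib import Path
from typing import Optional

def bytes_to_gib(n: int) -> float:
    """Convert a raw byte count to GiB (2**30 bytes)."""
    return n / (1 << 30)

def read_pool(card: str, pool: str) -> Optional[float]:
    """Read an amdgpu memory-pool size (in GiB) from sysfs, if present."""
    p = Path(f"/sys/class/drm/{card}/device/mem_info_{pool}")
    if not p.exists():
        return None
    return bytes_to_gib(int(p.read_text()))

if __name__ == "__main__":
    # vram_* is the carved-out "VRAM"; gtt_* is the spillover pool in system RAM
    for pool in ("vram_total", "vram_used", "gtt_total", "gtt_used"):
        size = read_pool("card0", pool)
        if size is not None:
            print(f"{pool}: {size:.2f} GiB")
```

Watching vram_used while ollama loads a model should show the same growth LACT reports.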



    • #3
      Originally posted by lem79 View Post
      Getting pretty excited for Strix Halo at this point.. Is it likely that Strix Halo will become a ROCm monster with that supposed quad channel 128Gb LPDDR5x? I'm guessing very interesting for LLM use cases.
      I've heard this before. Will it have the performance necessary to exploit having large amounts of RAM for LLMs? If it ends up significantly better than other options available to consumers, that could drive up the price.

      But yeah, it should have quad-channel (256-bit) in the top variant (there's talk of a lower 128-bit variant that has more CUs than Strix Point, and some Infinity Cache). 128 GB was being tested. I'm not sure if that's the limit, but if it is, maybe we'll see 192-256 GB within 1-2 successor generations.

      Originally posted by lem79 View Post
      On the topic of VRAM and APUs though, FWIW it's possible now to adjust it up to 16Gb with the UniversalAMDFormBrowser, which I successfully did on my Ryzen 5700U. 2x32Gb DDR-3200 SODIMMs (which work fine for a total of 64Gb system RAM), set 16Gb for VRAM and it shows up in LACT and UMR, and when running ollama and whisper (compiled locally with ROCm 6.2), "VRAM" memory use increases as expected.
      I thought this feature was for games that would crash due to the system appearing to have low VRAM (512 MB), but more could be used dynamically.



      • #4
        Note that LPDDR5X transfers at around 12 Gbps per pin (different specs say 8.5 to 14 Gbps),
        whereas GDDR6 transfers around 21 Gbps per pin (some say up to 24), so approximately double that of LPDDR5X.

        For example: Nvidia's 4090 has 1.008 TB/s (terabytes per second) memory bandwidth: 384 bits @ 21 Gbps.
        On an APU with 4 channels of LPDDR5X, you can expect 4 * 64 bits = 256 bits @ 12 Gbps = 384 GB/s total bandwidth
        ... so approx 1/3 that of a GPU.

        That's respectable for an APU, but it's nowhere near a GPU.
        Last edited by pkese; 17 September 2024, 04:54 AM.
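The arithmetic above is easy to sanity-check. A minimal sketch (the function name is mine, and the per-pin rates are the post's figures rather than datasheet values):

```python
def bandwidth_gbs(bus_width_bits: int, gbps_per_pin: float) -> float:
    """Peak memory bandwidth in GB/s: bus width in bits times the
    per-pin rate, divided by 8 to convert bits to bytes."""
    return bus_width_bits * gbps_per_pin / 8

print(bandwidth_gbs(384, 21))  # RTX 4090: 1008.0 GB/s, i.e. ~1.008 TB/s
print(bandwidth_gbs(256, 12))  # quad-channel LPDDR5X APU: 384.0 GB/s
```

384 / 1008 is about 0.38, hence the "approx 1/3" in the post.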



        • #5
          Originally posted by jaxa View Post
          I've heard this before. Will it have the performance necessary to exploit having large amounts of RAM for LLMs? If it ends up significantly better than other options available to consumers, that could drive up the price.
          It sounds pretty decent. I mean, I can tolerate running LLMs on my 5700U which has a little Vega 8 iGPU in it. It's my impression that it's the volume of memory that's important in this case, i.e. loading 70B models and still being able to run them on the GPU, rather than not.

          Originally posted by jaxa View Post
          But yeah, it should have quad-channel (256-bit) in the top variant (there's talk of a lower 128-bit variant that has more CUs than Strix Point, and some Infinity Cache). 128 GB was being tested. I'm not sure if that's the limit, but if it is, maybe we'll see 192-256 GB within 1-2 successor generations.
          Yeah, I have no doubt there are people out there who could use a mid-range GPU with that much VRAM (which is essentially what Strix Halo is), not just for AI but for other tasks too.

          Originally posted by jaxa View Post
          I thought this feature was for games that would crash due to the system appearing to have low VRAM (512 MB), but more could be used dynamically.
          Oh, maybe .. I haven't tested it. My vague impression is that it's only fairly recently (kernel 6.10?) that the 512 MB VRAM limit stopped mattering for compute, with the system allocating as much memory as it needs from the GTT pool (system memory). I'm not sure, but my impression is also that having lots of addressable VRAM has its advantages.
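(One way to see what the kernel is actually allowing: amdgpu takes a gttsize module parameter, in MiB, with -1 meaning the driver default. A minimal sketch, assuming the amdgpu module is loaded; the helper name is mine:)

```python
from pathlib import Path

def describe_gttsize(mib: int) -> str:
    """Render amdgpu's gttsize module parameter (MiB; -1 = driver default)."""
    return "driver default" if mib == -1 else f"{mib} MiB"

param = Path("/sys/module/amdgpu/parameters/gttsize")
if param.exists():
    print("GTT limit:", describe_gttsize(int(param.read_text())))
```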



          • #6
            Originally posted by pkese View Post
            Note that LPDDR5X transfers at around 12 Gbps per pin (different specs say 8.5 to 14 Gbps),
            whereas GDDR6 transfers around 21 Gbps per pin (some say up to 24), so approximately double that of LPDDR5X.

            For example: Nvidia's 4090 has 1.008 TB/s (terabytes per second) memory bandwidth: 384 bits @ 21 Gbps.
            On an APU with 4 channels of LPDDR5X, you can expect 4 * 64 bits = 256 bits @ 12 Gbps = 384 GB/s total bandwidth
            ... so approx 1/3 that of a GPU.

            That's respectable for an APU, but it's nowhere near a GPU.
            True, but the 4090 only has 24Gb of that fast memory available. It seems there are use cases where 128-256Gb of VRAM, albeit slower, would be better than 24Gb of fast memory.
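For LLM inference specifically, that trade-off can be put into rough numbers: token generation is usually memory-bandwidth-bound, and each generated token has to stream roughly the whole set of weights through the memory bus once. A back-of-the-envelope sketch; the function name and the 40 GB figure for a 4-bit 70B model are illustrative assumptions, not measurements:

```python
def est_tokens_per_s(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Crude upper bound on decode speed: assume every generated token
    reads the full model weights from memory exactly once."""
    return bandwidth_gb_s / model_size_gb

MODEL_GB = 40  # a 70B model at ~4 bits per weight, roughly

print(est_tokens_per_s(384, MODEL_GB))   # quad-channel LPDDR5X: 9.6 tok/s
print(est_tokens_per_s(1008, MODEL_GB))  # 4090-class bandwidth: 25.2 tok/s
```

The point stands either way: a 40 GB model doesn't fit in 24 GB at all, so for that size class the slower-but-larger memory wins outright.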



            • #7
              Originally posted by pkese View Post
              That's respectable for an APU, but it's nowhere near a GPU.
              Heh, RTX 4090 is hardly a baseline! Nobody in their right mind would predict it to have that sort of performance. Even the M2 Ultra, with its 1024-bit LPDDR5, can't match the RTX 4090!

              Strix Halo looks like it could perform somewhere near an RX 6700 XT, which would be pretty awesome. That's already faster than the PlayStation 5's GPU!



              • #8
                Originally posted by lem79 View Post
                True, but the 4090 only has 24Gb of that fast memory available.
                No, it has 24 GB (or, more correctly, 24 GiB) of memory.

                Lower-case "b" means bits; use upper-case "B" for bytes.



                • #9
                  Originally posted by coder View Post
                  No, it has 24 GB (or, more correctly, 24 GiB) of memory.

                  Lower-case "b" means bits; use upper-case "B" for bytes.
                  Yeah, I know. Back in the day everything related to computer memory and storage was always assumed to be powers of two, so the SI prefixes kilo, mega, giga etc always referred to their power of two measurement. Dunno what happened after that, was it hard drive manufacturers who started using the SI prefixes "properly" and suddenly mega, giga etc started meaning 1000 instead of 1024? Then at some point kibi, mebi and gibi etc happened?
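The gap the two conventions create is easy to quantify; a quick sketch (the function names are mine):

```python
def si_gb(n_bytes: int) -> float:
    """Decimal (SI) gigabytes: 10**9 bytes."""
    return n_bytes / 10**9

def gib(n_bytes: int) -> float:
    """Binary gibibytes: 2**30 bytes."""
    return n_bytes / 2**30

vram = 24 * 2**30        # a "24 GB" GPU actually carries 24 GiB of DRAM
print(gib(vram))         # 24.0
print(si_gb(vram))       # 25.769803776 -- ~7.4% more at the giga scale
```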



                  • #10
                    Originally posted by lem79 View Post
                    Yeah, I know. Back in the day everything related to computer memory and storage was always assumed to be powers of two,
                    Okay, but the thing is that GPU memory is implemented as DRAM chips soldered onboard. Individual chips are rated in capacity using gigabits (or gibibits; see below). So, it can get a little confusing when someone is sloppy and uses lower-case b, especially when they go to the trouble of pairing it with an upper-case G.

                    Originally posted by lem79 View Post
                    so the SI prefixes kilo, mega, giga etc always referred to their power of two measurement. Dunno what happened after that, was it hard drive manufacturers who started using the SI prefixes "properly" and suddenly mega, giga etc started meaning 1000 instead of 1024? Then at some point kibi, mebi and gibi etc happened?
                    Yeah, that's a whole other thing and not really my main point. It's not just storage devices, BTW. Also, clock speeds and networking speeds use powers of 10.

                    I think the main reason why RAM kept using binary scales is that DRAM chips kept growing by powers of two. That's probably because once you implemented another address bit, it made sense to fill out that entire range of additional memory cells it could address. It's only a recent development that DRAM chips started growing in 1.5x capacities, probably due to the slowdown in semiconductor density scaling, yet demand for ever larger DRAM capacities remains robust.
                    Last edited by coder; 17 September 2024, 07:27 PM.
