AMD Publishes RDNA 3.5 ISA Documentation


  • LtdJorge
    replied
    Originally posted by coder View Post
    Okay, but the thing is that GPU memory is implemented as DRAM chips soldered onboard. Individual chips are rated in capacity using gigabits (or gibibits; see below). So, it can get a little confusing when someone is sloppy and uses lower-case b, especially when they go to the trouble of pairing it with an upper-case G.


    Yeah, that's a whole other thing and not really my main point. It's not just storage devices, BTW. Also, clock speeds and networking speeds use powers of 10.

    I think the main reason why RAM kept using binary scales is that DRAM chips kept growing by powers of two. That's probably because once you implemented another address bit, it made sense to fill out that entire range of additional memory cells it could address. It's only a recent development that DRAM chips started growing in 1.5x capacities, probably due to the slowdown in semiconductor density scaling, yet demand for ever larger DRAM capacities remains robust.
    Then we have our friend Microsoft using powers of 2 for the calculation and powers of 10 as the unit, and then people see 930GB usable out of a 1TB drive, and start to wonder.
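    A quick back-of-the-envelope check of that mismatch (a sketch; the 1 TB drive is just an illustrative example):

    ```python
    # A "1 TB" drive is marketed in SI units (powers of 10), but Windows
    # divides by powers of 2 while still labeling the result "GB".
    marketed_bytes = 1 * 10**12          # 1 TB as sold
    binary_gib = marketed_bytes / 2**30  # what the OS actually computes
    print(round(binary_gib))             # -> 931
    ```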



  • lem79
    replied
    Originally posted by coder View Post
    ...
    Points taken. Yes, you're right, clocks and network speeds have always been powers of 10 (and are often expressed in bits), and the confusion around RAM comes from referring to individual chip capacity as opposed to a module. So the answer is to do a proper job: either be precise, or be properly sloppy and use no case at all, e.g. kb



  • coder
    replied
    Originally posted by lem79 View Post
    Yeah, I know. Back in the day everything related to computer memory and storage was always assumed to be powers of two,
    Okay, but the thing is that GPU memory is implemented as DRAM chips soldered onboard. Individual chips are rated in capacity using gigabits (or gibibits; see below). So, it can get a little confusing when someone is sloppy and uses lower-case b, especially when they go to the trouble of pairing it with an upper-case G.

    Originally posted by lem79 View Post
    so the SI prefixes kilo, mega, giga etc always referred to their power of two measurement. Dunno what happened after that, was it hard drive manufacturers who started using the SI prefixes "properly" and suddenly mega, giga etc started meaning 1000 instead of 1024? Then at some point kibi, mebi and gibi etc happened?
    Yeah, that's a whole other thing and not really my main point. It's not just storage devices, BTW. Also, clock speeds and networking speeds use powers of 10.

    I think the main reason why RAM kept using binary scales is that DRAM chips kept growing by powers of two. That's probably because once you implemented another address bit, it made sense to fill out that entire range of additional memory cells it could address. It's only a recent development that DRAM chips started growing in 1.5x capacities, probably due to the slowdown in semiconductor density scaling, yet demand for ever larger DRAM capacities remains robust.
    Last edited by coder; 17 September 2024, 07:27 PM.



  • lem79
    replied
    Originally posted by coder View Post
    No, it has 24 GB (or, more correctly, 24 GiB) of memory.

    Lower case "b" -> bits; use upper case, for bytes.
    Yeah, I know. Back in the day everything related to computer memory and storage was always assumed to be powers of two, so the SI prefixes kilo, mega, giga etc always referred to their power of two measurement. Dunno what happened after that, was it hard drive manufacturers who started using the SI prefixes "properly" and suddenly mega, giga etc started meaning 1000 instead of 1024? Then at some point kibi, mebi and gibi etc happened?



  • coder
    replied
    Originally posted by lem79 View Post
    True, but the 4090 only has 24Gb of that fast memory available.
    No, it has 24 GB (or, more correctly, 24 GiB) of memory.

    Lower case "b" -> bits; use upper case, for bytes.
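    To make the difference concrete, here's a small sketch of how far apart the readings of a sloppy "24Gb" are (illustrative only):

    ```python
    # Two readings of "24Gb" versus what the card actually has:
    gibibytes = 24 * 2**30           # 24 GiB, the real capacity in bytes
    si_gigabytes = gibibytes / 10**9
    print(f"24 GiB = {si_gigabytes:.2f} GB")  # -> 25.77 GB
    print(f"24 GB  = {24 * 8} Gb")            # bytes vs bits: 192 Gb
    ```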



  • coder
    replied
    Originally posted by pkese View Post
    That's respectable for an APU, but it's nowhere near a GPU.
    Heh, the RTX 4090 is hardly a baseline! Nobody in their right mind would predict it to have that sort of performance. Even the M3 Ultra, with its 1024-bit LPDDR5X, can't match the RTX 4090!

    Strix Halo looks like it could perform somewhere near an RX 6700 XT, which would be pretty awesome. That's already faster than the PlayStation 5's GPU!



  • lem79
    replied
    Originally posted by pkese View Post
    Note that LPDDR5X transfers at around 12 Gbps per pin (different specs say 8.5 to 14 Gbps),
    whereas GDDR6 transfers around 21 Gbps per pin (some say up to 24), so approximately double that of LPDDR5X.

    For example: Nvidia's 4090 has 1.008 TB/s (terabytes per second) memory bandwidth: 384 bits @ 21 Gbps.
    On an APU with 4 channels of LPDDR5X, you can expect 4 * 64 bits = 256 bits @ 12 Gbps = 384 GB/s total bandwidth
    ... so approx 1/3 that of a GPU.

    That's respectable for an APU, but it's nowhere near a GPU.
    True, but the 4090 only has 24Gb of that fast memory available. It seems there are use cases where 128-256Gb of VRAM, albeit slower, would be better than 24Gb of fast memory.



  • lem79
    replied
    Originally posted by jaxa View Post
    I've heard this before. Will it have the performance necessary to exploit having large amounts of RAM for LLMs? If it ends up significantly better than other options available to consumers, that could drive up the price.
    It sounds pretty decent. I mean, I can tolerate running LLMs on my 5700U, which has a little Vega 8 iGPU in it. My impression is that the volume of memory is what matters in this case, i.e. being able to load 70B models and still run them on the GPU at all.
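    Rough weight-only footprint arithmetic for a 70B model (assuming common quantization sizes; KV cache and runtime overhead not included, and they vary by runtime):

    ```python
    # Approximate memory needed just for the weights of a 70B-parameter
    # model at common precisions.
    params = 70e9
    for name, bytes_per_param in [("fp16", 2.0), ("q8", 1.0), ("q4", 0.5)]:
        gb = params * bytes_per_param / 1e9
        print(f"{name}: ~{gb:.0f} GB")  # fp16 ~140, q8 ~70, q4 ~35
    ```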

    Originally posted by jaxa View Post
    But yeah, it should have quad-channel (256-bit) in the top variant (there's talk of a lower 128-bit variant that has more CUs than Strix Point, and some Infinity Cache). 128 GB was being tested. I'm not sure if that's the limit, but if it is, maybe we'll see 192-256 GB within 1-2 successor generations.
    Yeah, I have no doubt there are people out there who could use a mid-range GPU with that much VRAM (which is essentially what Strix Halo is), not just for AI but for other tasks too.

    Originally posted by jaxa View Post
    I thought this feature was for games that would crash due to the system appearing to have low VRAM (512 MB), but more could be used dynamically.
    Oh, maybe... I haven't tested it. My vague impression is that more recently (kernel 6.10?) the 512 MB VRAM carve-out stopped mattering for compute, and the system would allocate as much memory as it needed from the GTT pool (system memory). I'm not sure, but my impression is also that having lots of addressable VRAM has its advantages.



  • pkese
    replied
    Note that LPDDR5X transfers at around 12 Gbps per pin (different specs say 8.5 to 14 Gbps),
    whereas GDDR6 transfers around 21 Gbps per pin (some say up to 24), so approximately double that of LPDDR5X.

    For example: Nvidia's 4090 has 1.008 TB/s (terabytes per second) memory bandwidth: 384 bits @ 21 Gbps.
    On an APU with 4 channels of LPDDR5X, you can expect 4 * 64 bits = 256 bits @ 12 Gbps = 384 GB/s total bandwidth
    ... so approx 1/3 that of a GPU.

    That's respectable for an APU, but it's nowhere near a GPU.
    Last edited by pkese; 17 September 2024, 04:54 AM.
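    The arithmetic above can be sketched as (numbers taken from the post; actual per-pin rates vary by SKU and spec revision):

    ```python
    def bandwidth_gb_s(bus_width_bits: int, gbps_per_pin: float) -> float:
        """Peak memory bandwidth in GB/s: bus width x per-pin rate, / 8 bits per byte."""
        return bus_width_bits * gbps_per_pin / 8

    print(bandwidth_gb_s(384, 21))  # RTX 4090: 384-bit @ 21 Gbps -> 1008.0 GB/s
    print(bandwidth_gb_s(256, 12))  # 4-channel LPDDR5X @ 12 Gbps -> 384.0 GB/s
    ```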



  • jaxa
    replied
    Originally posted by lem79 View Post
    Getting pretty excited for Strix Halo at this point.. Is it likely that Strix Halo will become a ROCm monster with that supposed quad channel 128Gb LPDDR5x? I'm guessing very interesting for LLM use cases.
    I've heard this before. Will it have the performance necessary to exploit having large amounts of RAM for LLMs? If it ends up significantly better than other options available to consumers, that could drive up the price.

    But yeah, it should have quad-channel (256-bit) in the top variant (there's talk of a lower 128-bit variant that has more CUs than Strix Point, and some Infinity Cache). 128 GB was being tested. I'm not sure if that's the limit, but if it is, maybe we'll see 192-256 GB within 1-2 successor generations.

    Originally posted by lem79 View Post
    On the topic of VRAM and APUs though, FWIW it's possible now to adjust it up to 16Gb with the UniversalAMDFormBrowser, which I successfully did on my Ryzen 5700U. 2x32Gb DDR-3200 SODIMMs (which work fine for a total of 64Gb system RAM), set 16Gb for VRAM and it shows up in LACT and UMR, and when running ollama and whisper (compiled locally with ROCm 6.2), "VRAM" memory use increases as expected.
    I thought this feature was for games that would crash due to the system appearing to have low VRAM (512 MB), but more could be used dynamically.

