Originally posted by coder:
On interconnects, there (again) really isn't much of a difference between "training-oriented" and "inference-oriented". Very large models do need fast interconnects, but at that scale you're dealing with more than just NVLink, because the traffic crosses node boundaries.

For smaller models that could conceivably fit on one machine, I think most people would consider something like a 4090 more "training" than "inference", despite it not having NVLink at all! Even in prior generations, you could hook up a couple of consumer cards with an NVLink bridge. That won't scale to the large models big companies are developing now, but it lets you train something like BERT.

What does seem to distinguish GPUs explicitly sold for "inference", like the T4, is that they cut out all of the unnecessary display-related hardware and run at a much lower TDP (e.g. 75 W). That's a very different niche from what a flagship compute part like an A100 or a high-end gaming GPU is targeting.
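If you want to check what your own box actually has, here's a minimal sketch (assuming PyTorch with CUDA installed): it lists the visible GPUs and reports which pairs have peer-to-peer access, which is what an NVLink bridge between two consumer cards buys you. Keep in mind P2P can also work over plain PCIe, so `nvidia-smi nvlink --status` is the more direct way to confirm NVLink specifically.

```python
import torch

# Sketch: enumerate the GPUs in this machine and report which pairs can
# reach each other peer-to-peer. An NVLink bridge between two consumer
# cards shows up as P2P access (though P2P can also run over PCIe, so
# use `nvidia-smi nvlink --status` to confirm NVLink itself).
if torch.cuda.is_available():
    n = torch.cuda.device_count()
    for i in range(n):
        print(f"GPU {i}: {torch.cuda.get_device_name(i)}")
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            ok = torch.cuda.can_device_access_peer(i, j)
            print(f"  GPU {i} -> GPU {j}: P2P {'available' if ok else 'unavailable'}")
else:
    print("No CUDA devices visible.")
```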