AMD Announces Radeon RX 7900 XTX / RX 7900 XT Graphics Cards - Linux Driver Support Expectations


  • coder
    replied
    Originally posted by StillStuckOnSI View Post
    And not everyone is working out of a machine in a data centre. Also, I'd be careful with the claim that Nvidia wouldn't market consumer cards for deep learning. See this official page on RTX in gaming/productivity laptops: https://www.nvidia.com/en-us/geforce...s/stem-majors/.
    But that's a laptop. There are no such things as datacenter laptops. Businesses don't typically use laptops for training or deployment.

    If you look at the URI, they're obviously targeting college kids who are going to buy a gaming GPU no matter what.

    Originally posted by StillStuckOnSI View Post
    Notice how it says "TensorFlow/Resnet50 Training"? And these are both less powerful and less power hungry GPUs than a top-line desktop card, so by the logic in your next line the latter should be even less of a good fit for inference:
    Inference efficiency becomes a concern when you're looking at model deployment. When developing models, you only do enough inference to gather accuracy data, measure convergence, and check for over-fitting. That's not the kind of volume where efficiency is typically a significant concern, especially relative to the amount of compute being expended on training.
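    To put that concretely, in a typical development loop the only inference you run is a periodic validation pass, roughly like the PyTorch sketch below (just a sketch -- the model, data loaders, loss function, and optimizer are placeholders), and that pass only covers a validation split that's a small fraction of the data you train on.

    Code:
    # Minimal development-loop sketch (assumes PyTorch; `model`, `train_loader`,
    # `val_loader`, `loss_fn` and `opt` are placeholders, not anything specific).
    # The only inference here is the per-epoch validation pass.
    import torch

    def train_and_validate(model, train_loader, val_loader, loss_fn, opt, epochs=10, device="cuda"):
        model.to(device)
        for epoch in range(epochs):
            model.train()
            for x, y in train_loader:        # bulk of the compute: forward + backward + optimizer step
                x, y = x.to(device), y.to(device)
                opt.zero_grad()
                loss_fn(model(x), y).backward()
                opt.step()

            model.eval()
            correct = total = 0
            with torch.no_grad():            # the only inference: accuracy / convergence / over-fitting check
                for x, y in val_loader:
                    x, y = x.to(device), y.to(device)
                    correct += (model(x).argmax(dim=1) == y).sum().item()
                    total += y.numel()
            print(f"epoch {epoch}: val accuracy {correct / total:.3f}")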

    Originally posted by StillStuckOnSI View Post
    What we're arguing here is not that, but whether a 3090-level card makes more sense for inference than it does for training.
    The issue with using it for training is that it's more limited. Not just in memory capacity, bandwidth, and NVLink connectivity: Nvidia also artificially halved the throughput of tensor ops with fp32 accumulation, specifically to dissuade people from using GeForce cards for training.
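    If you want to see whether a given card is affected, a rough PyTorch micro-benchmark along these lines will show it (just a sketch -- the matrix size and iteration count are arbitrary, and whether toggling the reduced-precision-reduction flag actually changes the accumulation path is up to your cuBLAS version and its heuristics).

    Code:
    # Rough fp16 matmul throughput sketch (assumes PyTorch with a CUDA device).
    # On GeForce parts the tensor-core rate for fp16 inputs with fp32 accumulation
    # is documented at half the fp16-accumulate rate; on the pro/data-center parts
    # it runs at full rate.
    import torch

    def matmul_tflops(n=8192, iters=50, reduced_precision=True):
        # Allow (or disallow) fp16 accumulation inside cuBLAS reductions; whether
        # this actually switches the accumulate path depends on cuBLAS heuristics.
        torch.backends.cuda.matmul.allow_fp16_reduced_precision_reduction = reduced_precision
        a = torch.randn(n, n, device="cuda", dtype=torch.float16)
        b = torch.randn(n, n, device="cuda", dtype=torch.float16)
        for _ in range(5):                   # warm-up so kernel selection doesn't skew timing
            a @ b
        torch.cuda.synchronize()
        start = torch.cuda.Event(enable_timing=True)
        end = torch.cuda.Event(enable_timing=True)
        start.record()
        for _ in range(iters):
            a @ b
        end.record()
        torch.cuda.synchronize()
        ms = start.elapsed_time(end) / iters
        return 2 * n**3 / (ms * 1e-3) / 1e12  # FLOPs per matmul / seconds -> TFLOPS

    print("fp16 accumulate allowed :", round(matmul_tflops(reduced_precision=True), 1), "TFLOPS")
    print("fp32 accumulate forced  :", round(matmul_tflops(reduced_precision=False), 1), "TFLOPS")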


    Originally posted by StillStuckOnSI View Post
    Guess what they decide to focus on benchmarking when new 80/90 series cards drop? Not inference, that's what.
    Because these are people who can't afford anything better, or for whom spending even more on a GPU wouldn't be justified. If you're even shopping for a consumer GPU to do deep learning classes or research, then you're probably not developing the kinds of models that would require A100-level hardware to train. That's simply not where people start out, and by the time they reach the point of needing one or more A100s, they know it and probably no longer require that kind of hand-holding.

    Originally posted by StillStuckOnSI View Post
    For anyone still unsure what the right answer to this discussion is, here's a quick way to get yourself some closure. Find a handful of ML researchers/practitioners/engineers and ask them the following questions:
    1. What do you use for training your deep learning models?
    2. Would you consider the RTX 3090 a training oriented or inference oriented GPU?
    That proves nothing. That's like asking people in the construction business what truck they use at the job site, and then concluding that dump trucks aren't necessary because most of them simply drive pickup trucks.

    When you need a dump truck, it's usually the only viable option. And by the time you reach the point of doing construction jobs that require a dump truck, you typically know enough to figure out when one is needed and what size/type is required.



  • StillStuckOnSI
    replied
    Originally posted by coder View Post
    It's a distinction commonly used to describe deep learning ASICs.

    For instance: https://www.anandtech.com/show/14187...ators-for-2020
    Sure, many companies including Nvidia themselves create and market dedicated ASICs as "for inference" products. That has no connection with whether Nvidia's consumer GPU line is marketed the same way (it isn't) or whether people think it should be.

    Originally posted by coder View Post
    First off, Nvidia doesn't permit gaming cards to be used in data centers. So, they wouldn't even market the RTX 3090 for deep learning.
    And not everyone is working out of a machine in a data centre. Also, I'd be careful with the claim that Nvidia wouldn't market consumer cards for deep learning. See this official page on RTX in gaming/productivity laptops: https://www.nvidia.com/en-us/geforce...s/stem-majors/.

    Notice how it says "TensorFlow/Resnet50 Training"? And these are both less powerful and less power hungry GPUs than a top-line desktop card, so by the logic in your next line the latter should be even less of a good fit for inference:

    Originally posted by coder View Post
    Second, you should be looking at whether it's more cost-effective to use the A40 or the A100 for inference, and then tell me using the A40 for inference is a waste.
    If you have access to multiple GPUs in a datacentre, then you can use (and, importantly, pay for) as little or as much of their resources as you want for either training or inference. One of the reasons those cards are more expensive and more heavily market-segmented is that they can be sliced up like this.

    If you're a deep learning practitioner/researcher with just a couple of cards available in a local workstation, it makes sense to make the most use of the resources you have so that the fixed cost of the card is amortized. Because local machines are used primarily for prototyping, the vast majority of your workload will be training. Outside of production, it is vanishingly rare to have local inference requirements so onerous that they require a top-of-the-line consumer card just to keep up.

    Originally posted by coder View Post
    Because you're probably a student or hobbyist, and that's the best thing you can afford to train on. Moreover, a researcher is primarily focused on model development, not deployment at scale. When a model has been developed for commercial purposes, it needs to be deployed to achieve a return on the investment of developing it. That means putting a lot more data through it than would typically be used to train it. And that means you want hardware that's not overkill for the purpose, since you're probably using many instances and tying them up for long periods of time.
    See above. I take it this means you agree that these people developing models are both training models and buying consumer/prosumer cards over data centre ones?

    Originally posted by coder View Post
    The word "oriented" is key. Nobody is saying you couldn't use an A100 for inference, just that it's generally overkill for that task.
    I agree. What we're arguing here is not that, but whether a 3090-level card makes more sense for inference than it does for training. Just because something isn't loudly marketed for training doesn't mean it's automatically "inference-oriented". That's like saying anything that isn't hex colour #ff0000 must be blue.

    And it's not like we don't have a barometer on how people see these cards in the context of deep learning work either. For example, a ton of people in this space look at the benchmarks Lambda Labs does when new GPUs are released, because getting people access to hardware for deep learning is their MO. Guess what they decide to focus on benchmarking when new 80/90 series cards drop? Not inference, that's what.

    For anyone still unsure what the right answer to this discussion is, here's a quick way to get yourself some closure. Find a handful of ML researchers/practitioners/engineers and ask them the following questions:
    1. What do you use for training your deep learning models?
    2. Would you consider the RTX 3090 a training oriented or inference oriented GPU?
    My bet is that you'll get some variation on the following:
    1. Local workstations with GTX/Quadro cards for prototyping + clusters/cloud (which use data centre cards).
    2. Blank stares and confused expressions
    Or maybe you won't. Either way, you'll have a better idea of what the consensus on this discussion is than this particular forum can provide.

    Edit: well, I'm not sure what I expected from a response. For those unfortunate folks who come across this waste-of-bandwidth discussion in the future, I hope it was at least a nice showcase of how one can carry on an internet argument indefinitely with unsubstantiated claims, general statements, no concrete evidence, and without ever consulting people who might actually understand something about the topic they're writing about. If you or your company find yourself in the market for ML hardware, don't consult Phoronix forums.
    Last edited by StillStuckOnSI; 15 November 2022, 10:00 PM.



  • NeoMorpheus
    replied
    Originally posted by coder View Post
    The way I'd use an iGPU -- or perhaps other secondary GPUs, were I a game developer -- would be to find other compute tasks to dispatch to it. Perhaps physics, audio, or AI. This could unburden the faster GPU and CPU from handling such tasks. Furthermore, the secondary GPU shouldn't even need to be the same make as the primary.
    Everything you said is spot on, and I should've been clearer, since your usage scenario was exactly what I had in mind, or perhaps something like Intel's old Quick Sync.



  • coder
    replied
    Originally posted by NeoMorpheus View Post
    Talking about APUs, does anyone know if they (well, the GPU part) work in tandem when a dGPU (AMD, of course) is also present?
    I have no idea if AMD has announced anything like that. Intel was talking about it a couple of years ago -- I think particularly with their DG1, which was very much like the larger Xe iGPUs in their notebook CPUs.

    My own opinion on this is that it doesn't make much sense, unless you're pairing a low-powered dGPU with an APU having a relatively high-powered iGPU (and this is exactly the situation with a "G7" Tigerlake U + DG1, which are nearly twins). If they're too asymmetric in performance, then the iGPU isn't contributing enough to be worth the trouble and added overhead.

    The way I'd use an iGPU -- or perhaps other secondary GPUs, were I a game developer -- would be to find other compute tasks to dispatch to it. Perhaps physics, audio, or AI. This could unburden the faster GPU and CPU from handling such tasks. Furthermore, the secondary GPU shouldn't even need to be the same make as the primary.
    Last edited by coder; 11 November 2022, 12:31 AM.



  • NeoMorpheus
    replied
    Talking about APUs, does anyone know if they (well, the GPU part) work in tandem when a dGPU (AMD, of course) is also present?



  • coder
    replied
    Originally posted by StillStuckOnSI View Post
    The question is whether people (especially ML practitioners) use "training oriented" and "inference oriented" to describe particular models or product lines of GPUs. Outside of the Tesla T4/Jetson lineage, I have not seen anything vaguely resembling this terminology being thrown around and I've certainly not seen the exact wording.
    It's a distinction commonly used to describe deep learning ASICs.

    For instance: https://www.anandtech.com/show/14187...ators-for-2020

    "It's not being called an AI training accelerator, it's not being called a GPU, etc. It's only being pitched for AI inference – efficiently executing pre-trained neural networks."

    However, that is not an isolated example.

    Originally posted by StillStuckOnSI View Post
    I've definitely not seen it being used to distinguish between something like a 3090 vs an A100. Using the former just for inference would be a waste,
    First off, Nvidia doesn't permit gaming cards to be used in data centers. So, they wouldn't even market the RTX 3090 for deep learning.

    Second, you should be looking at whether it's more cost-effective to use the A40 or the A100 for inference, and then tell me using the A40 for inference is a waste.

    Originally posted by StillStuckOnSI View Post
    and when somebody buys one for ML they usually plan on training on it.
    Because you're probably a student or hobbyist, and that's the best thing you can afford to train on. Moreover, a researcher is primarily focused on model development, not deployment at scale. When a model has been developed for commercial purposes, it needs to be deployed to achieve a return on the investment of developing it. That means putting a lot more data through it than would typically be used to train it. And that means you want hardware that's not overkill for the purpose, since you're probably using many instances and tying them up for long periods of time.

    Originally posted by StillStuckOnSI View Post
    So separating them into "training" and "inference" categories is a false dichotomy.
    The word "oriented" is key. Nobody is saying you couldn't use an A100 for inference, just that it's generally overkill for that task.



  • StillStuckOnSI
    replied
    The question is whether people (especially ML practitioners) use "training oriented" and "inference oriented" to describe particular models or product lines of GPUs. Outside of the Tesla T4/Jetson lineage, I have not seen anything vaguely resembling this terminology being thrown around and I've certainly not seen the exact wording. Moreover, I've definitely not seen it being used to distinguish between something like a 3090 vs an A100. Using the former just for inference would be a waste, and when somebody buys one for ML they usually plan on training on it. On the other hand, for stuff like large language models you often need the latter for inference, because they won't fit on one V100/A100/H100. So separating them into "training" and "inference" categories is a false dichotomy.
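    To make the large-language-model point concrete: with the Hugging Face transformers + accelerate stack you can shard a big model across several data centre cards purely for inference, roughly like the sketch below (the model name is just an example of something whose fp16 weights won't fit on a single 40 GB card).

    Code:
    # Hedged sketch: multi-GPU *inference* for a large language model via
    # transformers + accelerate. GPT-NeoX-20B is roughly 40 GB of fp16 weights,
    # so it won't fit on one V100/A100-40GB; device_map="auto" spreads the
    # layers over whatever GPUs are visible.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "EleutherAI/gpt-neox-20b"   # example only
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        device_map="auto",    # requires `accelerate`; shards layers across available GPUs
        torch_dtype="auto",   # load in the checkpoint's native fp16
    )

    inputs = tokenizer("Training vs. inference GPUs:", return_tensors="pt").to(model.device)
    print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0]))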



  • coder
    replied
    Originally posted by StillStuckOnSI View Post
    It's quite the coincidence, but a search for "RDNA3 bf16" turns up a leak from today which seems to indicate they finally have support for the format: https://videocardz.com/newz/alleged-...gram-leaks-out. For the uninitiated, part of the reason Google's TPUs are so competitive for ML workloads is that their preferred input format is bf16. That doesn't mean any hardware which supports bf16 will automatically be faster,
    It has better numerical stability than fp16 and converts trivially to/from fp32. Furthermore, the silicon footprint for implementing fp multipliers scales roughly as the square of the mantissa width. That gives bf16 an advantage not only in density, but also energy-efficiency.
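    To make the "converts trivially" part concrete: bf16 is just the top 16 bits of an fp32 value (same sign bit and 8-bit exponent, mantissa truncated from 23 bits to 7), so the round-trip is a couple of bit operations, as in the NumPy sketch below. Real hardware typically rounds to nearest-even rather than truncating, but the idea is the same.

    Code:
    # bf16 <-> fp32 round-trip sketch: bf16 keeps fp32's sign and 8-bit exponent
    # and drops the low 16 mantissa bits, so conversion is a shift either way.
    # (Hardware usually rounds to nearest-even; plain truncation is shown here.)
    import numpy as np

    def fp32_to_bf16_bits(x: np.ndarray) -> np.ndarray:
        bits = x.astype(np.float32).view(np.uint32)
        return (bits >> 16).astype(np.uint16)                 # keep the top 16 bits

    def bf16_bits_to_fp32(b: np.ndarray) -> np.ndarray:
        return (b.astype(np.uint32) << 16).view(np.float32)   # zero-fill the low 16 bits

    x = np.array([3.14159265, 1e-20, 65504.0], dtype=np.float32)
    print(x)                                        # full fp32 values
    print(bf16_bits_to_fp32(fp32_to_bf16_bits(x)))  # same dynamic range, only ~3 significant digits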

    The downside is that it's not much good for a whole lot else, due to having so little precision.

    Originally posted by StillStuckOnSI View Post
    and it's not clear what rate it will run at on the new AMD cards (compared to fp32), but at least it'll make code more portable now that Intel/Nvidia/AMD/Google all support it in consumer-accessible hardware.
    Well, my guess is their AI units don't even support fp32, in which case it's probably a moot point.

    Originally posted by StillStuckOnSI View Post
    On interconnects, there (again) really isn't much of a difference between "training-oriented" and "inference-oriented".
    There is, because training typically involves a lot more data. Even the models are bigger, because they haven't yet been optimized.

    Originally posted by StillStuckOnSI View Post
    What does seem to distinguish GPUs explicitly sold for "inference" like the T4 is that they cut out all of the unnecessary display-related hardware and run at a much lower TDP (e.g. 75W). That's a very different niche than what a flagship compute part like an A100 or a high-end gaming GPU is targeting.
    The training GPUs don't have display hardware, either. And the lower clock speed has less to do with training vs. inference than with power-efficiency, durability, and density -- all things you want in server-oriented GPUs.



  • brucethemoose
    replied
    Originally posted by coder View Post
    Intel was trying, with its Iris models that featured 2x or 3x the normal amount of EUs and up to 128 MB of eDRAM.


    Because, even then, it wasn't terribly good. There were bottlenecks in the architecture that kept the GPU from scaling well. So, performance was good, but probably not enough to justify the added price or steer power users away from a dGPU option.


    But that was just weird. And the value-add compared with having a truly separate dGPU was tenuous, at best.


    According to whom? Didn't Valve contract with AMD specifically to make it for them? In those sorts of arrangements, Valve would retain ownership of the IP. At least, that's supposedly how it is with MS and Sony.
    No, Van Gogh was on leaked AMD roadmaps before Valve would have presumably ordered it for the Deck: https://videocardz.com/newz/amd-mobi...g-with-vangogh


    The more uncorroborated rumor is that OEMs other than Valve simply rejected it.
    Last edited by brucethemoose; 08 November 2022, 02:35 AM.



  • WannaBeOCer
    replied
    Originally posted by verude View Post

    https://forums.developer.nvidia.com/...ivers/198363/5 According to this thread, Linux doesn't support DSC; maybe it's down to your chroma subsampling?
    Definitely not chroma subsampling. I think I'd trust an Nvidia driver developer more than a forum moderator.

