AMD Announces Radeon RX 7900 XTX / RX 7900 XT Graphics Cards - Linux Driver Support Expectations


  • Originally posted by coder View Post
    The way I'd use an iGPU -- or perhaps other secondary GPUs, were I a game developer -- would be to find other compute tasks to dispatch to it. Perhaps physics, audio, or AI. This could unburden the faster GPU and CPU from handling such tasks. Furthermore, the secondary GPU shouldn't even need to be the same make as the primary.
    Everything you said is spot on, and I should have been clearer: your usage scenario was exactly what I had in mind, or perhaps something like Intel's old Quick Sync.
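    As a rough illustration of that kind of offloading, here's a minimal PyTorch sketch (my own, purely hypothetical; a real game engine would use Vulkan/D3D12 compute queues, and the secondary GPU would need to be visible to the same runtime). It just enumerates the visible CUDA devices and pushes a side computation to a secondary one while the primary GPU handles the main workload:

    ```python
    import torch

    # Enumerate whatever CUDA-capable devices are visible. Index 0 is usually
    # the primary (fastest) GPU; higher indices are secondary cards.
    devices = [torch.device(f"cuda:{i}") for i in range(torch.cuda.device_count())]
    primary = devices[0]
    secondary = devices[1] if len(devices) > 1 else primary  # fall back if only one GPU

    def side_task(x):
        # Stand-in for a physics/audio/AI step offloaded to the secondary GPU.
        return torch.tanh(x @ x.T)

    main_in = torch.randn(4096, 4096, device=primary)
    side_in = torch.randn(1024, 1024, device=secondary)

    # Kernel launches are asynchronous, so the two workloads can overlap:
    main_out = main_in @ main_in   # main workload stays on the primary GPU
    side_out = side_task(side_in)  # side workload runs on the secondary GPU

    for d in devices:
        torch.cuda.synchronize(d)  # wait for both devices to finish
    print(main_out.shape, side_out.shape)
    ```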



    • Originally posted by coder View Post
      It's a distinction commonly used to describe deep learning ASICs.

      For instance: https://www.anandtech.com/show/14187...ators-for-2020
      Sure, many companies, including Nvidia themselves, create and market dedicated ASICs as "for inference" products. That has no bearing on whether Nvidia's consumer GPU line is marketed the same way (it isn't) or whether people think it should be.

      Originally posted by coder View Post
      First off, Nvidia doesn't permit gaming cards to be used in data centers. So, they wouldn't even market the RTX 3090 for deep learning.
      And not everyone is working out of a machine in a data centre. Also, I'd be careful with the claim that Nvidia wouldn't market consumer cards for deep learning. See this official page on RTX in gaming/productivity laptops: https://www.nvidia.com/en-us/geforce...s/stem-majors/.

      Notice how it says "TensorFlow/Resnet50 Training"? And those laptop GPUs are both less powerful and less power-hungry than a top-of-the-line desktop card, so by the logic in your next line the latter should be even less of a good fit for inference:

      Originally posted by coder View Post
      Second, you should be looking at whether it's more cost-effective to use the A40 or the A100 for inference, and then tell me using the A40 for inference is a waste.
      If you have access to multiple GPUs in a datacentre, then you can use (and, importantly, pay for) as little or as much of their resources as you want for either training or inference. One of the reasons those cards are more expensive and market-segmented is that they can be sliced up like this.
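      On an A100 the actual mechanism for this is MIG, which carves the card into hardware-isolated slices. As a loose, software-only illustration of the same idea (my own sketch, not how MIG itself is configured), PyTorch lets you cap a process's share of a card's memory so a small inference job doesn't monopolise it:

      ```python
      import torch

      # Hypothetical illustration: restrict this process to roughly a quarter of
      # the memory on GPU 0. Allocations beyond that budget raise an out-of-memory
      # error instead of taking over the whole card. (MIG on an A100 goes further
      # and partitions compute as well, with hardware isolation between slices.)
      torch.cuda.set_per_process_memory_fraction(0.25, device=0)

      x = torch.randn(64, 3, 224, 224, device="cuda:0")
      # ... run a small inference workload within that budget ...
      ```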

      If you're a deep learning practitioner/researcher and have just a couple of cards available in a local workstation, it makes sense to make the most of the resources you have so that the fixed cost of the card is amortized. Because local machines are used primarily for prototyping, this means that the vast majority of your workload will be training. Outside of production, it is vanishingly rare to have local inference requirements so onerous that they require a top-of-the-line consumer card just to keep up.

      Originally posted by coder View Post
      Because you're probably a student or hobbyist, and that's the best thing you can afford to train on. Moreover, a researcher is primarily focused on model development, not deployment at scale. When a model has been developed for commercial purposes, it needs to be deployed to achieve a return on the investment of developing it. That means putting a lot more data through it than would typically be used to train it. And that means you want hardware that's not overkill for the purpose, since you're probably using many instances and tying them up for long periods of time.
      See above. I take it this means you agree that the people developing these models are both training them and buying consumer/prosumer cards over data centre ones?

      Originally posted by coder View Post
      The word "oriented" is key. Nobody is saying you couldn't use an A100 for inference, just that it's generally overkill for that task.
      I agree. What we're arguing here is not that, but whether a 3090-level card makes more sense for inference than it does for training. Just because something isn't loudly marketed for training doesn't mean it's automatically "inference-oriented". That's like saying anything that isn't hex colour #ff0000 must be blue.

      And it's not like we don't have a barometer on how people see these cards in the context of deep learning work either. For example, a ton of people in this space look at the benchmarks Lambda Labs does when new GPUs are released, because getting people access to hardware for deep learning is their MO. Guess what they decide to focus on benchmarking when new 80/90 series cards drop? Not inference, that's what.

      For anyone still unsure what the right answer to this discussion is, here's a quick way to get yourself some closure. Find a handful of ML researchers/practitioners/engineers and ask them the following questions:
      1. What do you use for training your deep learning models?
      2. Would you consider the RTX 3090 a training oriented or inference oriented GPU?
      My bet is that you'll get some variation on the following:
      1. Local workstations with GTX/Quadro cards for prototyping + clusters/cloud (which use data centre cards).
      2. Blank stares and confused expressions
      Or maybe you won't. Either way, you'll have a better idea of what the consensus on this discussion is than this particular forum can provide.

      Edit: well, I'm not sure what I expected from a response. For those unfortunate folks who come across this waste-of-bandwidth discussion in the future, I hope it was at least a nice showcase of how one can carry on an internet argument indefinitely with unsubstantiated claims, general statements, and no concrete evidence, without ever consulting people who might actually understand the topic they're writing about. If you or your company find yourself in the market for ML hardware, don't consult the Phoronix forums.
      Last edited by StillStuckOnSI; 15 November 2022, 10:00 PM.



      • Originally posted by StillStuckOnSI View Post
        And not everyone is working out of a machine in a data centre. Also, I'd be careful with the claim that Nvidia wouldn't market consumer cards for deep learning. See this official page on RTX in gaming/productivity laptops: https://www.nvidia.com/en-us/geforce...s/stem-majors/.
        But that's a laptop. There's no such thing as a datacenter laptop, and businesses don't typically use laptops for training or deployment.

        If you look at the URI, they're obviously targeting college kids who are going to buy a gaming GPU no matter what.

        Originally posted by StillStuckOnSI View Post
        Notice how it says "TensorFlow/Resnet50 Training​"? And these are both less powerful and less power hungry GPUs than a top-line desktop card, so by the logic in your next line the latter should be even less of a good fit for inference:
        Inference efficiency becomes a concern when you're looking at model deployment. When developing models, you only do enough inference to gather accuracy data, and to measure convergence and check for over-fitting. That's not the kind of volume where efficiency is typically a significant concern, especially relative to the amount of compute being expended on training.
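        To put a rough shape on that split, here's what a typical development loop looks like (my own sketch with a placeholder model and random data, nothing specific to anyone's setup): every training batch costs a forward pass, a backward pass, and an optimizer step, while inference is just an occasional forward pass over the validation set to check accuracy and over-fitting.

        ```python
        import torch
        import torch.nn as nn

        device = "cuda" if torch.cuda.is_available() else "cpu"
        model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10)).to(device)
        opt = torch.optim.SGD(model.parameters(), lr=1e-2)
        loss_fn = nn.CrossEntropyLoss()

        # Placeholder data standing in for real training / validation sets.
        train_x, train_y = torch.randn(50_000, 128), torch.randint(0, 10, (50_000,))
        val_x, val_y = torch.randn(5_000, 128), torch.randint(0, 10, (5_000,))

        for epoch in range(10):
            model.train()
            for i in range(0, len(train_x), 256):
                xb, yb = train_x[i:i + 256].to(device), train_y[i:i + 256].to(device)
                opt.zero_grad()
                loss_fn(model(xb), yb).backward()  # forward + backward on every batch
                opt.step()

            # Development-time inference: one forward pass over the validation set,
            # just enough to measure accuracy and watch for over-fitting.
            model.eval()
            with torch.no_grad():
                preds = model(val_x.to(device)).argmax(dim=1)
                acc = (preds == val_y.to(device)).float().mean().item()
            print(f"epoch {epoch}: val accuracy {acc:.3f}")
        ```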

        Originally posted by StillStuckOnSI View Post
        What we're arguing here is not that, but whether a 3090-level card makes more sense for inference than it does for training.
        The issue with using it for training is that it's more limited: not just in memory capacity, bandwidth, and NVLink connectivity, but Nvidia also artificially halved the throughput of tensor ops with FP32 accumulation, specifically to dissuade people from using GeForce cards for training.
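        For what it's worth, a rough way to see whether that cap bites on a given card is to time large half-precision matmuls and compare the achieved number against the spec sheet's FP16-with-FP32-accumulate figure. This is my own sketch; the matrix size and iteration counts are arbitrary, and cuBLAS picks the accumulation path, so treat the result only as a ballpark:

        ```python
        import torch

        def fp16_matmul_tflops(n=8192, iters=20):
            """Time large FP16 matmuls on the current GPU and report achieved TFLOPS."""
            a = torch.randn(n, n, device="cuda", dtype=torch.float16)
            b = torch.randn(n, n, device="cuda", dtype=torch.float16)

            for _ in range(5):      # warm up so clocks and cuBLAS heuristics settle
                a @ b
            torch.cuda.synchronize()

            start = torch.cuda.Event(enable_timing=True)
            end = torch.cuda.Event(enable_timing=True)
            start.record()
            for _ in range(iters):
                a @ b
            end.record()
            torch.cuda.synchronize()

            seconds = start.elapsed_time(end) / 1000.0   # elapsed_time() is in ms
            flops = 2 * n ** 3 * iters                   # count each multiply-add as 2 ops
            return flops / seconds / 1e12

        print(f"achieved ~{fp16_matmul_tflops():.1f} FP16 TFLOPS")
        ```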


        Originally posted by StillStuckOnSI View Post
        Guess what they decide to focus on benchmarking when new 80/90 series cards drop? Not inference, that's what.
        Because these are people who can't afford anything better, or for whom spending even more on a GPU wouldn't be justified. If you're even shopping for a consumer GPU for deep learning classes or research, then you're probably not developing the kinds of models that would require A100-level hardware to train. That's simply not where people start out, and by the time they reach the point of needing one or more A100s, they know it and probably no longer require that kind of hand-holding.

        Originally posted by StillStuckOnSI View Post
        For anyone still unsure what the right answer to this discussion is, here's a quick way to get yourself some closure. Find a handful of ML researchers/practitioners/engineers and ask them the following questions:
        1. What do you use for training your deep learning models?
        2. Would you consider the RTX 3090 a training oriented or inference oriented GPU?
        That proves nothing. That's like asking people in the construction business what truck they use at the job site, and then concluding that dump trucks aren't necessary because most of them simply drive pickup trucks.

        When you need a dump truck, it's usually the only viable option. And by the time you reach the point of doing construction jobs that require a dump truck, you typically know enough to figure out when one is needed and what size/type is required.
