Originally posted by coder:
On interconnects, there (again) really isn't much of a difference between "training-oriented" and "inference-oriented". Very large models do need fast interconnects, but at that scale you're dealing with more than just NVLink, because the traffic crosses node boundaries.

For smaller models that could conceivably fit on one machine, I think most people would consider something like a 4090 more "training" than "inference", despite it not having NVLink at all! Even in prior generations, you could hook up a couple of consumer cards with an NVLink bridge. That won't scale to the large models big companies are developing now, but it lets you train something like BERT.

What does seem to distinguish GPUs explicitly sold for "inference", like the T4, is that they cut out all of the unnecessary display-related hardware and run at a much lower TDP (e.g. 75 W). That's a very different niche from what a flagship compute part like an A100 or a high-end gaming GPU is targeting.
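If you want to check what your own box actually has, here's a minimal sketch (assuming PyTorch with CUDA installed): it lists the visible GPUs and reports which pairs have peer-to-peer access, which is what an NVLink bridge between two consumer cards buys you. Keep in mind P2P can also work over plain PCIe, so `nvidia-smi nvlink --status` is the more direct way to confirm NVLink specifically.

```python
import torch

# Sketch: enumerate the GPUs in this machine and report which pairs can
# reach each other peer-to-peer. An NVLink bridge between two consumer
# cards shows up as P2P access (though P2P can also run over PCIe, so
# use `nvidia-smi nvlink --status` to confirm NVLink itself).
if torch.cuda.is_available():
    n = torch.cuda.device_count()
    for i in range(n):
        print(f"GPU {i}: {torch.cuda.get_device_name(i)}")
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            ok = torch.cuda.can_device_access_peer(i, j)
            print(f"  GPU {i} -> GPU {j}: P2P {'available' if ok else 'unavailable'}")
else:
    print("No CUDA devices visible.")
```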