Originally posted by brucethemoose
The reason it matters is that cutting-edge deep learning models are huge, so you burn far more bandwidth reading the model weights than you do reading/writing the data propagating through them. Hence the optimization people do, called "batching": read a portion of the model and apply it to a whole batch of data samples (e.g. video frames or images) before fetching and applying the next part of the model.
This optimization does work with Infinity Cache, to some degree, and would deliver a benefit, since the Infinity Cache sits behind a higher-bandwidth link and costs less power to access. It's not as good as on-die SRAM, but still better than external DRAM.
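A minimal sketch of the batching idea, in PyTorch with a hypothetical toy model and shapes (not anything from the post): the same weights are fetched once per layer and applied to every sample in the batch, so the weight-read bandwidth is amortized across the batch instead of being paid per frame.

```python
import torch
import torch.nn as nn

# Stand-in for a large vision model; real models have far more weight data.
model = nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=1),
    nn.ReLU(),
    nn.Conv2d(64, 3, 3, padding=1),
)

frames = torch.randn(32, 3, 224, 224)  # e.g. 32 video frames

with torch.no_grad():
    # Unbatched: the weights get (re-)read for every single frame.
    outputs_one_by_one = [model(f.unsqueeze(0)) for f in frames]

    # Batched: one pass reads each layer's weights once and applies them
    # to all 32 frames, trading weight bandwidth for activation bandwidth.
    outputs_batched = model(frames)
```

The activations for the whole batch do have to live somewhere while a layer is applied, which is where a fast intermediate pool like the Infinity Cache (versus external DRAM) comes into play.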
Originally posted by brucethemoose
So, AMD was never truly competitive with Nvidia in AI. The CDNA chips are a different story, but they're still getting leapfrogged on AI and only truly competing in HPC.