Originally posted by sdack
As for big L3 caches, they're not going anywhere in general-purpose CPUs. AMD's most recent Threadripper and Epyc CPUs have as much as 256MB of L3 cache. While the A64FX is technically a general-purpose CPU, it's actually meant for the kind of compute jobs where you process large amounts of sequentially stored data. Those jobs can be done very well on GPUs, accelerator cards like the Xeon Phi, and various kinds of MAC ASICs such as the currently in-vogue neural net accelerators. That hardware doesn't use big SRAM caches anyway, so general-purpose CPUs adapted to perform the same jobs aren't going to have them either.
What's good for hardware that does heavy compute jobs like physical simulations, neural net training and inference, etc. isn't always going to be any good for hardware that does everyday computing. In the past, things like caches, instruction-level parallelism, branch prediction, TLBs, vector instruction units and so on would eventually filter through to consumer hardware and improve it. That's just not the case when compute hardware is improved by tailoring it for workloads that aren't representative of what consumer hardware spends its days doing.
I did my master's thesis in scientific computing and wrote my project in CUDA on GPUs, so I know how heavily tailored both the code and the hardware are to achieve the kinds of results that people demand for these jobs today. A GPU is terrible for the kinds of things a general-purpose CPU is used for, and to an increasing extent the same applies to CPUs adapted to perform the same jobs as GPUs.