Originally posted by L_A_G
DDR, as the name (double data rate) says, is about frequency and data rates: data is transferred on both edges of the clock over a comparatively narrow bus, so bandwidth is raised mainly by raising the clock. Hence you see high frequencies, and timings counted in clock cycles getting more stretched out. Access latency therefore barely improves, and bandwidth can only be increased by making each channel wider (e.g. 64-bit) and then running multiple channels in parallel. Even with a moderate channel width of 64 bits, DDR4 modules need 288 pins. For the 8 channels of AMD EPYC this adds up to 2304 module pins (8 × 288) and yields about 145 GB/s.
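To make the DDR arithmetic concrete, here is a rough sketch of pin count and theoretical peak bandwidth for 8 channels. The DDR4-2666 speed and one DIMM per channel are my assumptions, not figures from the post. The peak it prints (about 170 GB/s) is a theoretical maximum; sustained real-world numbers such as the ~145 GB/s above will be lower.

```python
# Sketch: module pin count and theoretical peak bandwidth for multi-channel DDR4.
# Assumption: DDR4-2666 (2666 MT/s) and one 288-pin DIMM per channel; these
# values are illustrative, not taken from a specific EPYC SKU.

DDR4_DIMM_PINS = 288      # pins on one DDR4 DIMM
CHANNEL_WIDTH_BITS = 64   # data width of one DDR4 channel (ECC bits excluded)

def ddr_peak_gbs(mega_transfers_per_s: float, channels: int,
                 width_bits: int = CHANNEL_WIDTH_BITS) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = width_bits // 8
    return mega_transfers_per_s * 1e6 * bytes_per_transfer * channels / 1e9

channels = 8
print("module pins:", channels * DDR4_DIMM_PINS)               # 2304
print("peak GB/s:", round(ddr_peak_gbs(2666, channels), 1))   # 170.6
```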
A single HBM stack, with a 1024-bit interface (eight 128-bit channels) running at as little as 500 MHz, moves 128 GB/s, because the bandwidth comes from the width of the interface rather than from a high clock. It uses a lower frequency, but it also uses double data rate and so transfers 1 Gbit/s per pin (one bit on each clock edge) across all 1024 data pins. The DRAM itself is not faster to access, so latency is broadly similar to DDR; what changes is how much data arrives per clock.
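The same width-times-rate arithmetic for one HBM stack, as a small sketch. It assumes first-generation HBM with 8 channels of 128 bits per stack at 500 MHz; the function name and defaults are just for illustration. It also shows why one 128-bit channel on its own only gives 16 GB/s.

```python
# Sketch: peak bandwidth of one HBM stack from interface width and clock.
# Assumption: first-generation HBM, 8 channels x 128 bits per stack,
# 500 MHz clock with double data rate (2 transfers per clock).

def hbm_stack_gbs(clock_mhz: float, channels: int = 8,
                  channel_bits: int = 128, transfers_per_clock: int = 2) -> float:
    """Peak bandwidth of one stack in GB/s (1 GB = 10**9 bytes)."""
    gbit_per_pin = clock_mhz * transfers_per_clock / 1000  # Gbit/s per data pin
    total_bits = channels * channel_bits                   # interface width in bits
    return gbit_per_pin * total_bits / 8                   # bits -> bytes

print(hbm_stack_gbs(500))              # 128.0 : full 1024-bit stack
print(hbm_stack_gbs(500, channels=1))  # 16.0  : a single 128-bit channel
```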
AMD has used 4 stacks of HBM2 on one of their graphics cards to move 1 TB/s, as does Fujitsu with their A64FX, which also uses 4 stacks (and about 160,000 of these CPUs are inside the Fugaku supercomputer). Cray has started a partnership with Fujitsu that also makes use of the chip. Nvidia uses HBM2 on their Tesla P100 cards. Intel is said to be releasing a new Xe-based GPU (Ponte Vecchio) this year, and it will use HBM memory.
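Where the 1 TB/s comes from, roughly: a sketch assuming each HBM2 stack keeps the 1024-bit interface and runs at 2 Gbit/s per pin. That per-pin rate is my assumption (it is what ~1 TB/s over four stacks implies); individual products may clock their HBM2 differently.

```python
# Sketch: aggregate peak bandwidth of several HBM2 stacks.
# Assumption: 1024-bit interface per stack and 2 Gbit/s per pin.

STACK_WIDTH_BITS = 1024

def aggregate_gbs(stacks: int, gbit_per_pin: float) -> float:
    """Combined peak bandwidth in GB/s (1 GB = 10**9 bytes)."""
    return stacks * STACK_WIDTH_BITS * gbit_per_pin / 8

print(aggregate_gbs(4, 2.0))  # 1024.0, i.e. about 1 TB/s
```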