**sdack** · 15 July 2020, 03:09 PM

Originally posted by L_A_G

That would be interesting to see, but considering the cost I'm not convinced it's actually worth it. Only CPU use that I've seen so far was that MCM from Intel with an Intel CPU die, a Polaris-based GPU die from AMD and a 4GB HBM2 die for that GPU to make up the low bandwidth of DDR4 memory.

General purpose CPUs, working out of considerably faster SRAM-based cache memory for the vast majority of the time, are much more responsive to memory latency rather than bandwidth and HBM (High Bandwidth Memory), as the name implies, is all about bandwidth.

Don't get me wrong though, we could start seeing a lot of CPUs with HBM pretty soon. Intel has been investing heavily in its own GPUs and is bringing a big set of new machine-learning-focused vector instructions and functionality into its new CPUs in the very near future. Both of these applications are highly bandwidth-dependent and would benefit from HBM to a very high degree. However, it could also be that DDR5 ends up making HBM, with all of the extra silicon, costs and additional constraints required to implement it, simply redundant.

This isn't a competition between technologies. It's about throwing an old one out. Samsung, Hynix, AMD, Nvidia, Intel, JEDEC, etc. they're all in on this.

DDR, as the name says, is about frequency and data rates. With DDR the information travels serially. Hence you see high frequencies and the clock timings getting more stretched out. This introduces latency, which can only be countered by having e.g. 64-bit channels and then using multiple channels to increase the bandwidth. Even with a moderate channel width such as 64 bits, DDR4 modules need 288 pins. In the case of the 8 channels for AMD EPYC this requires 2304 pins (8 x 288) and yields about 145 GB/sec.
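The pin and bandwidth arithmetic in this paragraph can be checked with a short sketch. The 288 pins, 8 channels and 64-bit width come from the post; the DDR4-2666 and DDR4-3200 speed grades are assumptions picked only for illustration, since the post does not say which speed it has in mind.

```python
def peak_bandwidth_gb_s(channels, bus_width_bits, mega_transfers_per_sec):
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return channels * bytes_per_transfer * mega_transfers_per_sec * 1e6 / 1e9

PINS_PER_DDR4_DIMM = 288   # from the post
EPYC_CHANNELS = 8          # from the post
CHANNEL_WIDTH_BITS = 64    # from the post

# One DIMM per channel: total module pins across all channels.
total_pins = EPYC_CHANNELS * PINS_PER_DDR4_DIMM
print("total pins:", total_pins)

# Assumed speed grades (not stated in the post).
for speed in (2666, 3200):
    peak = peak_bandwidth_gb_s(EPYC_CHANNELS, CHANNEL_WIDTH_BITS, speed)
    print(f"DDR4-{speed}: {peak:.1f} GB/s theoretical peak")
```

Under these assumptions the pin total is 2304, and the theoretical peaks come out around 170.6 GB/s and 204.8 GB/s. The "about 145 GB/sec" in the post is lower than both, so it may be a sustained figure or refer to a slower speed grade; the post does not say.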

A single HBM stack with a 128-bit interface and operating at as little as 500 MHz moves 128 GB/sec, because there is no more latency associated with the memory. It uses a lower frequency, but it also uses a double data rate and so transfers 1 Gbit/sec (one bit on each flank) over a 128-bit wide channel. So the only latency you have is that between the CPU frequency and memory frequency.
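As a rough check of these numbers, here is the same width-times-rate arithmetic for HBM. The 500 MHz clock, double data rate and 128-bit channel come from the post; the figure of 8 such channels per stack (1024 bits in total) is taken from the JEDEC HBM layout rather than from the post, and is labeled as such in the code.

```python
def pin_rate_gbit_s(clock_mhz, transfers_per_clock=2):
    """Per-pin data rate in Gbit/s; double data rate means 2 transfers per clock."""
    return clock_mhz * transfers_per_clock / 1000

def bandwidth_gb_s(width_bits, rate_gbit_s):
    """Bandwidth in GB/s for an interface of the given width."""
    return width_bits * rate_gbit_s / 8

CHANNEL_WIDTH_BITS = 128   # from the post
CHANNELS_PER_STACK = 8     # assumption from the JEDEC HBM layout, not from the post

rate = pin_rate_gbit_s(500)                        # 500 MHz, both clock edges
one_channel = bandwidth_gb_s(CHANNEL_WIDTH_BITS, rate)
full_stack = bandwidth_gb_s(CHANNEL_WIDTH_BITS * CHANNELS_PER_STACK, rate)

print("per pin:", rate, "Gbit/s")
print("one 128-bit channel:", one_channel, "GB/s")
print("full 1024-bit stack:", full_stack, "GB/s")
```

With these inputs a single 128-bit channel gives 16 GB/s, and the 128 GB/s figure corresponds to the full 1024-bit stack. Note that this only covers peak bandwidth; it says nothing about latency.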

AMD has used 4 stacks of HBM2 for one of their graphics cards and moves 1 TB/sec, just as Fujitsu does with their A64FX, also using 4 stacks (and 160,000 of these CPUs are inside the Fugaku supercomputer). Cray has started a partnership with Fujitsu, also making use of it. Nvidia uses HBM2 for their Tesla P100 cards. Intel is said to release their new GPU "Xe" this year (aka Ponte Vecchio) and it will use HBM memory.

**L_A_G** · 15 July 2020, 03:54 PM

Originally posted by sdack

...

Sure, there are always going to be superior technologies, but they often come with their own constraints and are often not very applicable in the most common use cases. General purpose CPU cores spend the vast majority of their time working out of registers and cache implemented in SRAM, which completely blows DRAM like HBM out of the water both in terms of latency and bandwidth. However, SRAM takes up much more die area and is thus considerably more expensive to make (a single bit without logic uses 4 transistors, while a bit of DRAM without logic uses a single transistor and a capacitor), so it's not used for everything.

As I said, there are use cases where HBM is absolutely worth it, the same way there are use cases where SRAM is absolutely worth it. However, most of those use cases are in compute tasks that are more and more being performed by hardware other than general purpose CPUs. Even general purpose CPUs made for those compute tasks are more and more just general purpose CPUs with more specialized compute hardware bolted onto them to actually perform the compute workload. These high bandwidth memories attached to CPUs are, however, not very useful outside of these specialized compute tasks. HBM was specifically created for reading large amounts of consecutively stored information from memory very rapidly, and the memory access patterns of general purpose CPUs performing general purpose tasks couldn't be more different, doing small reads back and forth all over system memory.

If cost were no issue, system memory would use SRAM instead of DRAM. However, as it definitely is, use of SRAM is generally limited to registers and on-die cache memories. HBM runs into a very similar issue when competing with more traditional DRAM memories as system memory. It's got better bandwidth and even latency, but it comes with significant downsides in terms of cost, capacity and complexity.

To put it simply: DRAM was always a cheaper type of memory used when doing it in SRAM was considered too expensive. An exotic and more expensive type of DRAM also has to justify its cost over more traditional DRAM, the same way SRAM has to.

**sdack** · 15 July 2020, 04:11 PM

Originally posted by L_A_G

General purpose CPU cores spend the vast majority of their time working out of registers and cache implemented in SRAM, which completely blows DRAM like HBM out of the water both in terms of latency and bandwidth.

No, you've got it backwards. CPUs need caches, because the memory cannot keep up. This is why you're seeing L1, L2 and L3 caches. The A64FX, which is just an Arm CPU, but one with 48 cores, no longer needs an L3 cache. Getting rid of the L3 is what allows the chip to have so many cores. The A64FX uses 10B transistors, which is about as many as the AMD Ryzen 9. Why bother with improving cache design when you can throw some of it out and be rid of it? So imagine a Ryzen 9 without needing an L3 cache and instead having more cores ...

**pal666** · 16 July 2020, 06:20 AM

Originally posted by wizard69

I know that DDR 5 ram has been sampling since almost the beginning of the year so maybe a real possibility of entering 2021 with much faster hardware.

ddr5 4800? you can have ddr4 4800 now, just with higher voltage. ddr5 will matter a year or two later with 6400 or more

**pal666** · 16 July 2020, 06:26 AM

Originally posted by artivision

People, ram price goes down half when the next model is out, example:

no, it actually goes up at first, and only with time it goes down

**pal666** · 16 July 2020, 06:30 AM

Originally posted by blueweb

Practically all Intel CPUs are "APUs" since nearly forever.

they don't have proper gpu to depend on fast memory

**pal666** · 16 July 2020, 06:41 AM

Originally posted by wizard69

Well supposedly DDR5 will be faster when it first hits the market.

why they made such standard frequencies then? all ddrx were slow when they hit the market

Originally posted by wizard69

Say what you will about Apple but they have the balls to drop old tech

that's basically what we are saying: apple will screw all its sheep for few cents

**pal666** · 16 July 2020, 07:00 AM

Originally posted by L_A_G

General purpose CPUs, working out of considerably faster SRAM-based cache memory for the vast majority of the time, are much more responsive to memory latency rather than bandwidth

i think it depends not on speed of cache, but on quality of prefetch and number of in-flight requests
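One way to make the in-flight-requests point concrete is Little's law: sustained bandwidth is roughly the number of outstanding requests times the request size, divided by latency. The sketch below uses purely hypothetical numbers (64-byte cache lines, 80 ns memory latency) that are not from the thread.

```python
def sustained_bandwidth_gb_s(in_flight, line_bytes, latency_ns):
    """Little's law estimate: bytes in flight / latency. Bytes per ns equals GB/s."""
    return in_flight * line_bytes / latency_ns

LINE_BYTES = 64     # hypothetical cache line size
LATENCY_NS = 80     # hypothetical memory latency

# More outstanding requests hide the same latency behind more data in flight.
for in_flight in (1, 10, 40):
    bw = sustained_bandwidth_gb_s(in_flight, LINE_BYTES, LATENCY_NS)
    print(f"{in_flight:>2} requests in flight: {bw:.1f} GB/s")
```

With a single outstanding request the core sees well under 1 GB/s no matter how wide the memory interface is, which is why prefetching and the number of in-flight requests matter as much as raw bandwidth in this simplified model.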

**pal666** · 16 July 2020, 07:12 AM

Originally posted by sdack

A single HBM stack with a 128-bit interface and operating at as little as 500 MHz moves 128 GB/sec, because there is no more latency associated with the memory. It uses a lower frequency, but it also uses a double data rate and so transfers 1 Gbit/sec (one bit on each flank) over a 128-bit wide channel. So the only latency you have is that between the CPU frequency and memory frequency.

that would make hbm latency lower than l3, which surely isn't the case

**torsionbar28** · 16 July 2020, 10:46 AM

Meh, my main rig is still DDR3 1866, and I'm happy with it. Maybe I'll upgrade in a few years.

JEDEC Publishes DDR5 Standard - Launching At 4.8 Gbps, Better Power Efficiency
