JEDEC Publishes DDR5 Standard - Launching At 4.8 Gbps, Better Power Efficiency
Originally posted by L_A_G:
I explained the clear engineering reasons why things are done the way they are ...
Originally posted by sdack:
No, what you gave me is the view of somebody who doesn't want to leave his comfort zone, but holds on to his beliefs. Don't worry, the change won't happen fast. You'll get enough time to adjust.
Originally posted by L_A_G:
I gave you a detailed technical explanation ...

Don't worry, the change won't happen fast. You'll get enough time to adjust.
"640KB of RAM is enough. RISC is too primitive. Flash memory is too slow." ... Every decade has people like you.

Last edited by sdack; 16 July 2020, 03:29 PM.
Originally posted by sdack:
No, you only refuse to believe it, because it would blow your mind. You cannot imagine, and therefore cannot accept, how the industry could abandon an established design, replace it with a much simpler one, and end up breaking down barriers. You're just stuck in your beliefs.
This is on the level of basic physics-, engineering- and even thermodynamics-defying crowdfunded crap like Fontus*, Waterseer** and Solar Roadways***.

*A "self-filling water bottle" which is just a portable dehumidifier.
**The same thing as a bigger fixed installation, which is literally an off-the-shelf dehumidifier with a filter; the water those things collect is so filthy that the manufacturer expressly recommends against drinking it.
***Probably best explained by Dave Jones.
Originally posted by L_A_G:
The reason why ...
Originally posted by sdack:
You have ever-growing caches on CPU dies on one side, which could be used for more cores, and external storage such as NVMe SSDs approaching speeds of several GB/sec on the other. Do give this a second thought. If you cannot see how that is a bad sign, and that it is time for a new main-memory design, then you've wasted the time you spent on your master's.
When talking about latency for CPUs, the metric used is cycles, and when operating out of registers latency is effectively zero. As you move into the caches you go from a few cycles to a couple of dozen cycles. Once you leave cache, things get slow in a hurry: even with TLBs speeding up virtual memory, access to DRAM is going to be in excess of 100 cycles and can be over 300 if you have a TLB miss. After that, once you move into non-volatile memory, things get very slow indeed even with fast flash. Out-of-order execution exists to fill the scheduling "bubbles" this causes, but even it relies on a very high cache hit-rate to function effectively.
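How hard average performance leans on the hit-rate can be made concrete with the standard average-memory-access-time formula. The cycle counts below are illustrative round numbers in the spirit of the figures above, not measurements from any specific CPU:

```python
def amat(hit_cycles, miss_rate, miss_penalty_cycles):
    """Average memory access time (in cycles): every access pays the hit
    latency, and a fraction `miss_rate` additionally pays the miss penalty."""
    return hit_cycles + miss_rate * miss_penalty_cycles

# Illustrative numbers: 4-cycle cache hit, ~200-cycle DRAM round trip.
print(amat(4, 0.02, 200))  # 98% hit-rate ->  8.0 cycles on average
print(amat(4, 0.10, 200))  # 90% hit-rate -> 24.0 cycles on average
```

Dropping the hit-rate from 98% to 90% triples the average access cost, which is why an out-of-order window alone cannot hide a poor cache setup.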
To put it bluntly: what matters most to a general-purpose CPU is how many cycles an access takes, and in that regard SRAM beats everything else hands down. DRAM is more or less an order of magnitude slower, while non-volatile memory like flash is, even at the best of times, another order of magnitude slower still. How many GB/s a type of memory can pull off consecutively stored data (that's what the advertising figures quote, as it's when the memory is at its absolute best) doesn't matter much when a CPU will mostly be pulling 64-byte cache lines' worth of data semi-randomly from all across memory.
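The sequential-versus-random gap described above is easy to demonstrate with a pointer-chasing sketch. The helper names (`make_chain`, `chase`) are hypothetical, and pure-Python interpreter overhead mutes the effect, but with a chain much larger than the last-level cache the shuffled traversal is typically several times slower on real hardware:

```python
import random
import time

def make_chain(n, shuffled):
    """Build a pointer chain: next_idx[i] is the next element to visit.
    Sequential order keeps accesses cache-friendly; a shuffled order forces
    semi-random jumps across the array, defeating caches and prefetchers."""
    order = list(range(n))
    if shuffled:
        random.shuffle(order)
    next_idx = [0] * n
    # Link each element to its successor, closing the loop into one cycle.
    for a, b in zip(order, order[1:] + order[:1]):
        next_idx[a] = b
    return next_idx

def chase(next_idx, steps):
    """Follow the chain for `steps` hops and return the final index."""
    i = 0
    for _ in range(steps):
        i = next_idx[i]
    return i

def time_chase(next_idx, steps):
    """Wall-clock time for one traversal."""
    t0 = time.perf_counter()
    chase(next_idx, steps)
    return time.perf_counter() - t0
```

Comparing `time_chase(make_chain(n, False), n)` against `time_chase(make_chain(n, True), n)` for a large `n` shows why advertised sequential GB/s figures say little about pointer-heavy workloads.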
Modern CPU design has heavily utilized performance modelling, with very accurate models and representative real-world workloads informing design decisions, ever since the research that led to the first RISC CPUs back in the 1980s. Cache setups, not just the sizes of the different levels but also level-specific arrangements (on Zen the L2 has a complex 8-way set-associative setup while the L3 is a much simpler "victim cache"), are very heavily informed by modelling that lets the manufacturer get the best performance for realistic workloads under the available silicon budget.
That may have been a bit bewildering to read, but I will point out that I have a degree in computer engineering, (again) wrote my thesis in scientific compute, and it's pretty much what I work with day-to-day.
Originally posted by pal666:
that would make hbm latency lower than l3, which surely isn't the case
Caches on the other hand need additional logic for their associativity and for hit/miss detection, which adds latency, and you don't have this with HBM, because it is your main memory. HBM also needs to sit very close to the CPU die on an interposer, and thereby alone becomes a very strong competitor to the on-die caches.

I would find it odd if they went for just 500MHz but then introduced latency. So while I wish I could give you exact latency numbers, an affirmative nod, or anything more specific, I currently do not believe that there is any extra latency to be found with HBM.

Last edited by sdack; 16 July 2020, 01:08 PM.
Originally posted by L_A_G:
AMD's most recent Threadripper and Epyc CPUs have as much as 256MB of L3 cache.

Last edited by sdack; 16 July 2020, 12:17 PM.
Originally posted by sdack:
No, you've got it backwards. CPUs need caches because the memory cannot keep up. This is why you're seeing L1, L2 and L3 caches. The A64FX, which is just an Arm CPU, but one with 48 cores, no longer needs an L3 cache. Getting rid of the L3 is what allows the chip to have so many cores. The A64FX uses about 10 billion transistors, which is about as many as the AMD Ryzen 9. Why bother with improving cache design when you can throw some of it out and be rid of it? So imagine a Ryzen 9 without an L3 cache and with more cores instead ...
As for big L3 caches, they're not going anywhere in general-purpose CPUs. AMD's most recent Threadripper and Epyc CPUs have as much as 256MB of L3 cache. While the A64FX is technically a general-purpose CPU, it's actually meant for the kinds of compute jobs where you process large amounts of sequentially stored data, and those can be done very well on GPUs, accelerator cards like Xeon Phis and various kinds of MAC ASICs like the currently in-vogue neural-net accelerators. That hardware doesn't use big SRAM caches anyway, so general-purpose CPUs adapted to perform the same jobs aren't going to have them either.

What's good for hardware that does heavy compute jobs, for things like physical simulations or neural-net training and inference, isn't always going to be any good for hardware that does everyday computing. Unlike in the past, when things like caches, instruction-level parallelism, branch prediction, TLBs, vector instruction units and so on would eventually filter through to consumer hardware and improve it, that's just not the case when compute hardware is improved by tailoring it for workloads that aren't representative of what consumer hardware spends its days doing.

I did my master's thesis in scientific compute and wrote my project in CUDA on GPUs, so I know how heavily tailored both the code and the hardware are to achieve the kinds of results people demand for these jobs today. A GPU is terrible at the kinds of things a general-purpose CPU is used for, and the same increasingly applies to CPUs adapted to perform the same jobs as GPUs.