AMD Ryzen 7 5800X3D Continues Showing Much Potential For 3D V-Cache In Technical Computing
-
Originally posted by atomsymbol View Post
Well, but you didn't account for the cost of increasing the number of pins on the CPU, which would be required by a quad-channel AM4 socket (all AM4 motherboards, all AM4 CPUs).
----
I don't know the purpose of the 387 extra pins on the AM5 socket (1718 pins) compared to AM4 (1331 pins), considering that AM5 is rumoured to support DDR5 only (no support for DDR4).
I don't think the pins for 2 extra channels would make that big of a difference in socket cost; they would cost less than V-cache, anyway. Bear in mind that for first-gen Threadrippers, those sockets were huge and nearly half of the socket was rendered useless. When looking at this diagram, I assume the dark blue pins and some of the pink ones are for RAM:
https://www.docdroid.net/6cDW11N/am4-pinout-diagram-pdf
Collectively, those seem to take up about 1/4 of the package. So, maybe about 150 pins per RAM slot. That's a lot, but nothing too crazy, and roughly half of that, I think, is just power delivery.
In any case, triple or quad channel is unlikely to happen for AM5. Even if you ignore the CPU, the iGPU could really use the bandwidth.
-
Originally posted by agd5f View Post
Also an increase in die size for the extra memory PHYs and data fabric routing and reworked packaging. AMD does make a quad memory channel CPU, it's called Threadripper. If you want to look at cost, using the Threadripper socket and package is probably the least costly because it already exists.
- Likes 1
-
Originally posted by schmidtbag View Post
https://www.docdroid.net/6cDW11N/am4-pinout-diagram-pdf
Collectively, those seem to take up about 1/4 of the package. So, maybe about 150 pins per RAM slot. That's a lot, but nothing too crazy, and roughly half of that, I think, is just power delivery.
In any case, triple or quad channel is unlikely to happen for AM5. Even if you ignore the CPU, the iGPU could really use the bandwidth.
wikichip:socket_am5#Pin_Description
AM5 seems to support 2 NVMe devices (2 * PCIe-x4), instead of 1 on AM4. GPU connectivity of AM5 seems to be the same as AM4.
-
Originally posted by piotrj3 View Post
Half yes half no.
Triple/quad-channel controllers help with bandwidth, so a triple/quad-channel memory controller could help a lot with something like LZ4, which gets no performance bump from 3D cache. It also helps spread the workload more evenly across more sticks.
However, triple/quad channel won't remove the latency bottleneck. Think of it this way: if you can clock the memory to the same speed/timings on triple/quad channel as on dual channel, that is at best a 50%/100% performance increase, and that is the most optimistic increase you can get. Meanwhile, in ZSTD you have a 177% performance increase.
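For reference, the best-case 50%/100% figure follows directly from theoretical peak bandwidth scaling linearly with channel count. Here is a minimal sketch of that arithmetic, assuming 64-bit channels running DDR4-3200 (the speed grade is only an illustrative assumption, not something stated in the post):

```python
# Theoretical peak DRAM bandwidth = channels * bytes per transfer * transfers/s.
# Assumption: 64-bit (8-byte) channels at DDR4-3200; substitute your own numbers.
BYTES_PER_TRANSFER = 8        # one 64-bit memory channel
TRANSFERS_PER_SEC = 3200e6    # DDR4-3200, chosen only as an example


def peak_bandwidth_gbs(channels):
    """Theoretical peak bandwidth in GB/s for the given channel count."""
    return channels * BYTES_PER_TRANSFER * TRANSFERS_PER_SEC / 1e9


def gain_over_dual(channels):
    """Best-case percentage gain relative to a dual-channel setup."""
    return (peak_bandwidth_gbs(channels) / peak_bandwidth_gbs(2) - 1) * 100


for n in (2, 3, 4):
    print(f"{n} channels: {peak_bandwidth_gbs(n):.1f} GB/s peak, "
          f"{gain_over_dual(n):+.0f}% vs dual channel")
```

This is an upper bound only: it says nothing about latency, and real workloads rarely saturate the theoretical peak.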
LZ4 likes core speed. It's not bound much by main memory bandwidth or L3 bandwidth. So L1 and IPC make LZ4 happy.
Triple/quad channel will lower *average* latency, but not pointer chasing single threaded latency. I'm not aware of a benchmark that really can tell the difference.
- Likes 1
-
Originally posted by willmore View Post
Triple/quad channel will lower *average* latency, but not pointer chasing single threaded latency. I'm not aware of a benchmark that really can tell the difference.
-
Originally posted by schmidtbag View Post
In any case, triple or quad channel is unlikely to happen for AM5. Even if you ignore the CPU, the iGPU could really use the bandwidth.
That being said, consumer desktops don't really need strong IGPs, as the advantages (lower power, VRAM size, faster transfer, more compact) don't matter much there.
What *does* need it is consumer + professional laptops. AMD/Intel should absolutely make a 4+ channel laptop platform.
Last edited by brucethemoose; 02 May 2022, 07:39 PM.
- Likes 1
-
Originally posted by atomsymbol View Post
If it includes synthetic benchmarks, I suppose one could create a fine-tuned benchmark that chases multiple pointers concurrently in a single thread. A CPU capable of executing 3+ loads per cycle would be required (I do not own such a CPU yet) to show a measurable difference with a 4-channel RAM. With AMD Zen 3, there is a small performance advantage when the AM4 motherboard has 4 memory modules installed instead of 2, although both of these configurations are still dual-channel DDR4 - I am not sure whether this applies to just single-threaded code, to just multi-threaded code, or to both.
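A rough sketch of what such a synthetic benchmark's access pattern could look like: several independent pointer chains advanced in lockstep within one thread. The helper names `make_cycle` and `chase` are hypothetical, and in Python the interpreter overhead swamps DRAM latency, so this only illustrates the structure. A real measurement would need the same loop in a compiled language with a working set much larger than L3.

```python
import random
import time


def make_cycle(n, seed=0):
    """Sattolo's algorithm: a random permutation forming one n-element cycle."""
    rng = random.Random(seed)
    nxt = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)          # j < i guarantees a single cycle
        nxt[i], nxt[j] = nxt[j], nxt[i]
    return nxt


def chase(nxt, chains, steps):
    """Advance `chains` independent pointers `steps` times each, interleaved."""
    n = len(nxt)
    pos = [(k * n) // chains for k in range(chains)]  # spread-out start points
    for _ in range(steps):
        # Each load depends only on the previous load of its own chain, so
        # the loads within one iteration are independent of each other.
        pos = [nxt[p] for p in pos]
    return pos


if __name__ == "__main__":
    table = make_cycle(1 << 18)
    steps = 100_000
    for chains in (1, 2, 4):
        t0 = time.perf_counter()
        chase(table, chains, steps)
        dt = time.perf_counter() - t0
        print(f"{chains} chain(s): {dt / (steps * chains) * 1e9:.0f} ns per load")
```

The single-cycle permutation matters because it keeps any chain from collapsing into a short loop that fits in cache. With one chain every load waits for the previous one, while with several chains the hardware is free to overlap the misses.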
Also, there are real benchmarks of the Intel 7820X running with 1, 2 and 4 sticks of RAM (it supports up to quad channel, but can work in single channel as well).
(Polish website, so you might want to auto-translate it: https://www.purepc.pl/test-pamieci-r...nnel?page=0,14 )
- Likes 2
-
Originally posted by piotrj3 View Post
(Polish website, so you might want to auto-translate it: https://www.purepc.pl/test-pamieci-r...nnel?page=0,14 )
- Likes 1
-
Originally posted by schmidtbag View Post
I've considered this issue, but it depends on your workload. I was thinking that maybe for command rates configured to T2, the pairs of channels could maybe even work asynchronously to reduce latency. I imagine that is very complicated, and could potentially interfere with threads that need more RAM than what a pair of channels has to offer.
In any case, more channels can improve performance for a minimal increase in cost. Bigger caches in a lot of cases cost a lot and sometimes yield no benefit.
That's assuming all boards include all 4 channels. I'm sure most ITX boards would either stick with 2 slots, or go with SO-DIMMs. Budget boards won't need 4 channels. I think paying 10 extra is well worth the performance gains, when you consider the cache costs a hell of a lot more than that.
https://www.techpowerup.com/forums/t...aida64.263929/
Realistically, the lowest latency (on Ryzens) you can get on a daily driver with a fairly aggressive XMP profile is around 60 ns, and if you want absolute stability, more like 80 ns. Meanwhile, L3 cache has around 7-10 times lower latency. And you can't really do anything about it: as you can see on the leaderboards, latency hasn't changed much since the DDR2(!) days. Every time the CPU fails to predict what needs to be in cache, it has to wait until it gets the data from RAM.
Let's imagine you make a simple function call to code that lives somewhere far away. The function does something very simple (let's say 8 ns of work). Now, if the CPU fails to predict what is coming next but has it in L3 cache, you wait an additional 8 ns for that cache, so every time an L3 access happens, performance of that function drops by half.
Now imagine the data wasn't in L3 cache and we need to go... to RAM. Now the fetch takes 64 ns, so we wait 8 times longer, and the call takes 9 times as long (8 ns of work plus 64 ns of waiting is 72 ns). This is the primary advantage of 3D cache: because the cache is larger, there is a much higher chance that what you need is in cache, and if it is, most of the time you don't pay such a big cost. This is why the average performance increase by itself isn't that great; most games in CPU-bound scenarios show maybe a 10% improvement over the 5800X. But if you look at worst-case scenarios (1% and 0.1% lows), the 5800X3D isn't 10% ahead, it is often more like 30% ahead. This is why it is often called a gamer CPU, because gamers care more about consistent gameplay than about a slightly higher average FPS. Meanwhile, the Polish benchmarking website I linked shows that on the 7820X, in most games quad channel doesn't provide big benefits over dual channel, either in averages or in lows. In a nutshell, cache does something quad channel can only wish to achieve.
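To make that arithmetic concrete, here is a small sketch of the same model: 8 ns of work per call, plus a penalty that depends on whether the miss is served from L3 or from RAM. The 8/8/64 ns figures come from the example above; the intermediate hit rates are hypothetical values chosen only to show how the average moves.

```python
# Figures from the example: 8 ns of work, ~8 ns extra for an L3 hit,
# ~64 ns extra when the data has to come from RAM.
WORK_NS = 8.0
L3_PENALTY_NS = 8.0
RAM_PENALTY_NS = 64.0


def call_time_ns(l3_hit_rate):
    """Average time per call, assuming every call misses the inner caches
    and a fraction `l3_hit_rate` of those misses is served by L3."""
    penalty = l3_hit_rate * L3_PENALTY_NS + (1 - l3_hit_rate) * RAM_PENALTY_NS
    return WORK_NS + penalty


def slowdown(l3_hit_rate):
    """Slowdown relative to the ideal 8 ns call with no miss at all."""
    return call_time_ns(l3_hit_rate) / WORK_NS


# 1.0 and 0.0 are the two cases from the example; 0.9 and 0.5 are
# hypothetical hit rates for illustration.
for rate in (1.0, 0.9, 0.5, 0.0):
    print(f"L3 hit rate {rate:.0%}: {call_time_ns(rate):.1f} ns per call, "
          f"{slowdown(rate):.1f}x the ideal time")
```

Under this simplified model, each extra percentage point of L3 hit rate removes 0.56 ns from the average call, which is one way to see why a larger cache helps the worst cases more than the averages.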
If we want to truly reduce CPU <-> RAM latency, we would have to move towards many smaller channels, with RAM soldered very close to the CPU and optimized for latency rather than speed.
Last edited by piotrj3; 02 May 2022, 09:18 PM.
- Likes 1