AMD Ryzen 9 3900X Linux Memory Scaling Performance
Originally posted by shmerl:
So, games can still benefit from using 3600 MHz / CAS 16 rather than 3200 MHz / CAS 14? Games are usually RAM-bandwidth sensitive.
Originally posted by existensil:
... The regression isn't related to increased memory latency above that speed; it is instead caused by the dramatic reduction in Infinity Fabric speed above DDR4-3600. From AnandTech:
The big reduction in performance with Apache Siege leads me to believe that benchmark involves a heavy amount of cross-core communication and is saturating the Infinity Fabric, especially when it's nearly halved in speed.
I believe you are spot on about the Apache benchmark. Given the way Apache spawns workers, it's quite possible that benchmark is both spilling threads onto the other CCD and thrashing the cache from the parent thread back on the other die.
The original source of the AMD RAM and Fabric upgrades is here: https://www.techpowerup.com/img/dF4sjxFh6HNk7GXn.jpg
Interestingly, @BeardedHardware on YouTube (https://www.youtube.com/channel/UCHc...zq231nAS5zFmsw) did a lot of DDR4 overclocking and fabric-speed testing on a Ryzen 7 3700X yesterday. At least on the MSI X570 board he used, there was a "disable gear down" option that let him overclock the Infinity Fabric to 1900 MHz, running 1:1 with DDR4-3800. But it sounds like stability falls off a cliff right around there, and depending on the silicon lottery some chips might only manage an 1800 or 1866 MHz fabric clock. With his overclocks he was seeing 64 ns RAM latency at DDR4-3800 CL15 and 3733 CL14.
Originally posted by shmerl:
So you used the same memory kit, running it at different frequencies? I don't think that's very informative, since different RAM kits can also have different latencies at different frequencies.
E.g., say you have a 3200 MHz dual-channel kit with CAS latency 14 and a 3600 MHz kit with CAS latency 16.
The latency in nanoseconds is usually calculated as CL / (memory clock in MHz) × 1000, where the memory clock is half the DDR transfer rate:
14 / 1600 × 1000 = 8.75 ns
16 / 1800 × 1000 ≈ 8.89 ns
So if I understand it correctly, the 3200 MHz kit with CAS 14 should perform slightly better than the 3600 MHz one with CAS 16. I've never tested that, though; it would be interesting to confirm.
One clock cycle at 1600 MHz is 0.625 ns, and at 1800 MHz it's 0.556 ns. The 16/1800 example would then catch up to the 14/1600 one if a second clock cycle is needed to satisfy whatever the CPU wanted from the memory.
I don't know how often the CPU is satisfied after just one clock cycle versus wanting more from the memory. One clock cycle means 16 bytes of data (two transfers of 8 bytes on a 64-bit channel), so that's already quite a bit.
About using the same memory kit at different speeds: the motherboard is likely scaling the timings from the XMP profile. You don't need different kits to experiment; the board's BIOS will already do exactly the calculation you did and turn the CL 14 timing at 3200 MHz into a CL 16 timing at 3600 MHz.

EDIT: I checked again, and my current motherboard is not doing what I describe here. It only scales tRFC and tREFI when I change the speed. I'm perhaps misremembering how the old motherboard behaved, which is where I got this idea about the board scaling the timings automatically.

Last edited by Ropid; 10 July 2019, 09:46 PM.
To get the full picture we need to know whether the cache line length is still 32 bytes or something else. If you use only 1 byte out of a fetch, the penalty is worse and latency matters more.
Maybe someone can write a strided-loop benchmark. While at it, it could be fun to check whether the larger L3 is the main reason for the real performance gain, or whether it's pure IPC gain.