The ECC DDR4 RAM Overclocking Potential With AMD Threadripper On Linux
Written by Max E in Hardware on 23 December 2018 at 02:16 PM EST.
In this guest post, Phoronix reader Max E shares his experience and Linux benchmarking results from overclocking ECC RAM on an AMD Threadripper box. The process ended up being surprisingly easy and his results are quite compelling. Thanks to Max for this guest post; we happily accept guest posts on Phoronix that cover interesting technical topics.

I recently treated myself to a new home development workstation, using a Threadripper 2950X, an ASUS Prime X399-A motherboard, and 64 GiB of ECC RAM.

For a 24/7 high-uptime workstation with that much RAM, ECC is definitely a good idea. But Threadripper really likes memory frequency (since the Infinity Fabric is always locked to the same clock speed as the RAM) and ECC tends to be sold at relatively low clock speeds.

Wendell from Level1 Techs has reported good results overclocking ECC RAM, so I decided to imitate his example and report my results.

The biggest difference between an ECC DIMM and a regular DIMM is that the ECC DIMM has more memory chips installed on it, to hold the parity bits. However, the actual memory chips aren't necessarily different from those used on high-speed enthusiast RAM; they are often the same SKU, made in the same factory on the same assembly line, and they are often specified to run at much faster clocks than they would in an ECC application. So it's possible to overclock ECC memory up to a more reasonable speed without exceeding the design limits of the RAM chips.

I can confirm Wendell's findings; overclocking ECC is extremely easy even for a novice overclocker like me.
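One way to confirm that an ECC overclock is genuinely stable, rather than silently leaning on error correction, is to watch the kernel's EDAC corrected-error counters while stress-testing. A minimal sketch, assuming the amd64_edac driver is loaded and exposes the usual sysfs layout (on a machine without EDAC the loop simply finds nothing and reports zero):

```shell
# Sum corrected-error counts across all memory controllers via EDAC sysfs.
# A count that keeps climbing under load means the overclock is marginal,
# even if the machine never visibly crashes.
total=0
for f in /sys/devices/system/edac/mc/mc*/ce_count; do
    [ -r "$f" ] || continue      # skip if EDAC isn't available here
    total=$((total + $(cat "$f")))
done
echo "corrected ECC errors: $total"
```

Running this before and after a long stress test and comparing the two totals is a quick sanity check that no corrections are occurring at the chosen clocks.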

Results file is here; more detail and the most interesting results below.

DIMM Selection

Ryzen seems to have the best compatibility with Samsung B-Die. Something about the timings of that particular RAM chip seems to suit Ryzen's memory controller. So I made sure to pick an ECC DIMM based around B-Die.

Wendell used 8 GiB single-rank DIMMs. I wanted 16 GiB DIMMs, to leave room for a possible future upgrade to 128 GiB, so I had to go with dual-rank DIMMs (meaning twice as many RAM chips are installed on each DIMM). This will make them harder to overclock.

The DIMM I selected is the Samsung M391A2K43BB1-CPB, which comes with a stock clock speed of 2133 MHz and timings of 15-15-15-36. Wendell's single-rank DIMMs are clocked at 2666 MHz, so I'm working at a disadvantage.

Memory Overclocking

I was able to easily achieve 2666 MHz and 2933 MHz just by picking those clock speeds in the BIOS. I left all memory timings to be automatically controlled by the BIOS. I noticed that once I went up to 2666 MHz, the primary timings were automatically loosened to 16-15-15-36. I did not monitor the secondary or tertiary timings, but I assume some of those changed too.

At 3066 and 3200 MHz, the machine would POST but would hang under load (although 3066 got a bit further than 3200 did). I believe that 3066 or even 3200 could be made stable with some actual memory tuning, but I didn't feel like it.

Beyond 3200, the machine wouldn't POST, even with a bit more DRAM voltage applied (1.35 V instead of 1.20 V). It got stuck in a memory-training reset loop, and I had to clear the CMOS to get it back. ASUS' USB settings backup feature is great for situations like this.

Key Findings

Memory Clocks Make a Difference

While some tests didn't care, and a small number of tests actually regressed, overall I observed measurable and repeatable performance gains from overclocking the memory. And unlike enthusiast-level memory overclocking, because ECC DIMMs are sold with so much headroom, I did it without fiddling with timings.

Developers Should Be Using NUMA

If you don't have access to the Windows Ryzen Master software, you can put the CPU in NUMA mode by setting "DRAM Striping" to "CHANNEL" in BIOS.

Traditional code compilation workloads don't consist of a single multi-threaded process. Instead, they consist of many single-threaded processes. Because each compiler process only ever touches its own memory, RAM striping is counterproductive and NUMA mode ends up being a perfect fit.
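On Linux you can verify that the BIOS change actually took effect by counting the NUMA nodes the kernel reports. A quick check (the sysfs path is the standard one; a machine left in distributed/UMA mode will report a single node):

```shell
# Count NUMA nodes exposed by the kernel. With "DRAM Striping" set to
# CHANNEL, a 2950X should show 2 nodes; in distributed/UMA mode, just 1.
nodes=$(ls -d /sys/devices/system/node/node[0-9]* 2>/dev/null | wc -l)
echo "NUMA nodes visible to the kernel: $nodes"
```

The same information is available from `numactl --hardware` if the numactl package is installed.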

I think rustc might be a single multi-threaded process, so perhaps Rust compilation won't take to NUMA as well. I didn't test this.

OpenMP Workloads Hate NUMA, but Tuning Helps

Every OpenMP test I ran regressed quite badly when turning on NUMA mode. However, I was able to make up some of the difference by setting the OMP_PROC_BIND environment variable to true. Some workloads recovered all the way, while others recovered only a little bit.
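For reference, thread binding can be enabled purely from the environment, without modifying the benchmark itself. A sketch, where `./bench` stands in for any OpenMP binary (a hypothetical name, not one from the test suite):

```shell
# Pin each OpenMP thread to the core where it started; with Linux's
# first-touch allocation policy, a pinned thread's data then stays on
# its local NUMA node instead of migrating across the Infinity Fabric.
export OMP_PROC_BIND=true
export OMP_PLACES=cores    # one place per physical core
echo "OMP_PROC_BIND=$OMP_PROC_BIND OMP_PLACES=$OMP_PLACES"
# ./bench                  # hypothetical OpenMP workload
```

Finer-grained policies (`close`, `spread`) are also accepted by OMP_PROC_BIND, but plain `true` was enough to recover much of the lost performance here.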

Honestly, I don't see why the GNU OpenMP runtime can't automatically detect if it's running on a NUMA CPU and take appropriate action automatically. The whole point of OpenMP is that you can achieve decent scaling without worrying about minutiae. Here we see an easily-avoidable failure of OpenMP to do its job.

Final Overclock

Once I had settled on 2933 MHz memory clocks and NUMA mode, I overclocked the CPU as well. I enabled Precision Boost Overdrive in the BIOS, leaving the limits at "Auto," applied a -81.25 mV vcore offset to help it clock higher, made some airflow and fan curve tweaks, set OMP_PROC_BIND to true again, and re-ran all the benchmarks. That is the "pbo" configuration you see in the results.

My current bottleneck is thermals, since the CPU will throttle at 67.8C. I could probably get higher sustained clocks by speeding up my fans even more, but I want the computer to stay relatively quiet even under full load. (Under the heaviest loads, it would thermal throttle even without the overclock, but the OC may help single-threaded workloads a bit, and the offset will help multi-threaded workloads.)

I tried two different methods of raising the thermal throttle point. First, I changed the "temperature control" setting in the BIOS to 100 (the default is 95, I think); this includes the 27.2 offset built into the CPU, so in theory it should move the throttling point from 67.8 to 72.8. While I could successfully use this setting to reduce the maximum temperature, indicating that it isn't completely broken, I wasn't able to use it to raise the limit; there appears to be some other cap elsewhere. The other thing I tried was reducing the SenseMI offset from 272 to 222, but that had no effect at all that I could see. I would like to see the BIOS updated to make this possible, as 67.8 seems like an awfully low temperature to throttle at, and I'd be perfectly happy with my load temps a little higher.

The CPU overclock/undervolt I ended up with does seem to help quite a bit in some workloads, including my intended use of compiling, but its gains don't completely dwarf those from memory overclocking, so the RAM tuning in this case was well worth doing.

Other Notes

For the record, ambient temps were between 17 and 21C when this was tested. Since some of the tests have a disk speed component, I should mention that they were all run off a 1TB Samsung 970 EVO M.2 drive. I also have a 280 GB Optane 300p swap disk, but I doubt any swapping occurred during these tests.

Thanks again to Max for sharing his research and results in this guest article.