Announcement

**numacross** · 21 March 2022, 03:18 PM

Originally posted by dragorth

At a guess, since the consumer version of this cache doesn't allow voltage manipulation, this generation of the new cache architecture has forced AMD to lock the voltages on these CPUs. So don't expect voltage control, and be pleasantly surprised if it somehow arrives.

I don't think EPYCs ever allowed direct voltage manipulation. It shouldn't be that hard to program the SMU to restore "normal" voltage settings if 3D V-cache is disabled, but we'll have to see.

**petko** · 21 March 2022, 04:37 PM

I'm curious: would 3D V-Cache increase data transfer rates to the GPU in GPU-enabled workloads?

**jaxa** · 21 March 2022, 06:17 PM

Originally posted by ddriver

Waiting for AMD to stack another 4-8 GB of L4 cache onto the I/O die as well.

Me too, but I was waiting for that before 3D V-Cache was announced. Now that they are bulking up L3, probably on more than one chip with the Zen 4 desktop lineup, I think they will neglect to add L4 to consumer chips.

But on a $5k+ Epyc CPU, slapping 8-24 GiB of HBM on the I/O die as L4 cache seems like a no-brainer, even with the 3D V-Cache.

Originally posted by petko

I'm curious: would 3D V-Cache increase data transfer rates to the GPU in GPU-enabled workloads?

It could help GPUs. AMD is rumored to be adding 3D Infinity Cache to RDNA3 GPUs, after all.

**PerformanceExpert** · 21 March 2022, 07:11 PM

Originally posted by Oppenheimer

Impressive numbers. My only question is how comparable the results of vanilla Milan and 'Milan-X with cache disabled' are when comparing otherwise equivalent CPUs.

Indeed. Besides the difference in base frequency, there is also an additional L3 latency of 3-4 cycles due to the much larger cache, so the results on the first few pages overestimate the actual gains.
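To put the latency point in numbers, here is a minimal average-memory-access-time sketch in Python. Every input (clock, base L3 latency, DRAM penalty, hit rates) is a hypothetical round number, not a measured Milan or Milan-X figure; only the 3-4 extra cycles come from the post.

```python
# Minimal average-memory-access-time (AMAT) sketch for accesses that reach L3.
# All inputs below are hypothetical, chosen only to show the trade-off.

def cycles_to_ns(cycles: float, clock_ghz: float) -> float:
    """Convert a latency in core cycles to nanoseconds."""
    return cycles / clock_ghz


def amat_ns(l3_hit_ns: float, l3_hit_rate: float, dram_penalty_ns: float) -> float:
    """Every access pays the L3 lookup; the misses also pay the DRAM penalty."""
    return l3_hit_ns + (1.0 - l3_hit_rate) * dram_penalty_ns


CLOCK_GHZ = 3.5          # hypothetical sustained clock
BASE_L3_CYCLES = 46      # hypothetical L3 hit latency without stacked cache
EXTRA_L3_CYCLES = 4      # upper end of the 3-4 cycle penalty mentioned above
DRAM_PENALTY_NS = 90.0   # hypothetical extra cost of going to DRAM

base_l3 = cycles_to_ns(BASE_L3_CYCLES, CLOCK_GHZ)
stacked_l3 = cycles_to_ns(BASE_L3_CYCLES + EXTRA_L3_CYCLES, CLOCK_GHZ)

# Working set that gains nothing from the bigger cache: same hit rate, slower hits.
baseline = amat_ns(base_l3, 0.60, DRAM_PENALTY_NS)
no_gain = amat_ns(stacked_l3, 0.60, DRAM_PENALTY_NS)
# Working set that now mostly fits: the higher hit rate outweighs the slower hits.
big_gain = amat_ns(stacked_l3, 0.80, DRAM_PENALTY_NS)

for label, value in [("baseline", baseline), ("no_gain", no_gain), ("big_gain", big_gain)]:
    print(f"{label:>8}: {value:.1f} ns")
```

The `no_gain` row is the handicap described above: slower L3 hits with no extra hits to pay for them. The `big_gain` row is what the benchmarks are hoping to capture.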

**msroadkill612** · 21 March 2022, 08:55 PM

Originally posted by petko

I'm curious: would 3D V-Cache increase data transfer rates to the GPU in GPU-enabled workloads?

Wouldn't PCIe be by far the bottleneck?
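A back-of-the-envelope check on that, using peak theoretical figures only: PCIe 4.0 signals at 16 GT/s per lane with 128b/130b encoding, and a Milan socket has eight DDR4-3200 channels. The sketch ignores packet headers and other protocol overhead, so real transfer rates are lower on both sides.

```python
# Peak theoretical bandwidth, ignoring protocol/packet overhead and DRAM refresh.

def pcie_gb_per_s(gt_per_s: float, lanes: int) -> float:
    """One direction of a PCIe 3.0+ link, which uses 128b/130b line encoding."""
    return gt_per_s * lanes * (128 / 130) / 8  # Gbit/s -> GB/s


def ddr_gb_per_s(mt_per_s: float, channels: int, bytes_per_transfer: int = 8) -> float:
    """DRAM with 64-bit (8-byte) channels."""
    return mt_per_s * bytes_per_transfer * channels / 1000  # MB/s -> GB/s


pcie4_x16 = pcie_gb_per_s(16, 16)    # PCIe 4.0 x16 slot
milan_dram = ddr_gb_per_s(3200, 8)   # 8 channels of DDR4-3200

print(f"PCIe 4.0 x16 : {pcie4_x16:.1f} GB/s per direction")
print(f"DDR4-3200 x8 : {milan_dram:.1f} GB/s")
print(f"ratio        : {milan_dram / pcie4_x16:.1f}x")
```

That is roughly a 6.5x gap, so for bulk host-to-GPU copies the x16 link, not the CPU's cache or DRAM, is the likely ceiling. Whether a bigger L3 helps a GPU workload probably depends more on the CPU-side work around the transfers.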

**coder** · 21 March 2022, 09:11 PM

AMD had publicly cited only a few select workloads from popular commercial applications. Now that it has been tested, it's great to see a fairly wide range of HPC workloads benefiting from this large L3 cache.

Michael, thanks for the benchmarks, but I really wish you could be more transparent about your selection criteria for the benchmarks featured in these sorts of articles. It would lend a lot of credibility to the results if we knew you weren't cherry-picking the tests you thought most likely to show strong wins.

**coder** · 21 March 2022, 09:23 PM

Originally posted by Oppenheimer

Impressive numbers. My only question is how comparable the results of vanilla Milan and 'Milan-X with cache disabled' are when comparing otherwise equivalent CPUs.

It would be an artificial experiment (which is not to say it's not worth doing), because the 3D cache models differ in base & turbo clocks. However, the latter half of the article does compare the 7773X against its nearest counterpart, the 7763, in both single- & dual-CPU configurations.

And note that in at least one case (Timed Kernel Compilation) where dual-7773X lost to dual-7763, the single-CPU configuration better leveraged the strengths of the 7773X to let it eke out a win.

Indeed, the single-CPU results make 3D cache even more of a slam dunk for 1P configurations.

**Linuxxx** · 22 March 2022, 12:15 AM

Really appreciate that Michael tested all CPUs with the performance governor, so that Linux's subpar default schedutil governor doesn't impair any of the results with less-than-stellar "clever" clock-speed decisions.
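For anyone wanting to replicate that setup, the governor is exposed per CPU through the standard cpufreq sysfs interface. A minimal Python sketch, assuming root and a cpufreq driver that offers `performance`; `set_governor` and its `sysfs_root` parameter are hypothetical names, the latter only there so the function can be tried against a scratch directory:

```python
import glob
import os
from typing import List


def set_governor(governor: str, sysfs_root: str = "/sys/devices/system/cpu") -> List[str]:
    """Write `governor` to every CPU's cpufreq/scaling_governor file.

    Needs root on a real system. Checks scaling_available_governors first so a
    typo fails loudly instead of half-applying. Returns the files written.
    """
    pattern = os.path.join(sysfs_root, "cpu[0-9]*", "cpufreq", "scaling_governor")
    paths = sorted(glob.glob(pattern))
    # Validate every CPU before writing anything.
    for path in paths:
        avail = os.path.join(os.path.dirname(path), "scaling_available_governors")
        if os.path.exists(avail):
            with open(avail) as f:
                if governor not in f.read().split():
                    raise ValueError(f"{governor!r} is not listed in {avail}")
    for path in paths:
        with open(path, "w") as f:
            f.write(governor)
    return paths
```

On most distros `cpupower frequency-set -g performance` does the same job. Either way the change does not survive a reboot.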

That's the spirit and how everyone should run Linux to get the most [...], well, performance out of their systems!

(I already pity those who buy these "arm&leg" expensive workstations, only to then achieve subpar ROI because of a poor governor choice...)

**arQon** · 22 March 2022, 02:09 AM

Originally posted by pentaprism

Considering nothing is actually optimized for this yet

No, that's not how it works. If you develop high-performance systems, either you're already paying attention to cache behavior or you aren't actually developing a high-performance system. Going out of L*1* is something you already optimize for, and although L3 is certainly still very important because after that you're going all the way to RAM, you don't, and can't sanely, "optimize for that".
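For context, the kind of L1-oriented optimization meant here is usually loop tiling (cache blocking). Below is a sketch of a blocked matrix transpose in Python, showing the loop structure only; Python lists of boxed floats won't show the speedup, which needs a compiled language and a flat array. The default `block=32` is a hypothetical choice: a 32x32 tile of 8-byte doubles is 8 KiB, so a source tile and a destination tile together fit comfortably in Zen 3's 32 KiB L1D.

```python
from typing import List

Matrix = List[List[float]]


def transpose_blocked(a: Matrix, block: int = 32) -> Matrix:
    """Return the transpose of a non-empty rows x cols matrix, one tile at a time.

    Within a tile, reads walk along rows of `a` and writes walk down columns of
    `out`, but both stay within `block` rows/columns, so in a compiled language
    the touched cache lines can stay resident in L1 until the tile is done.
    """
    rows, cols = len(a), len(a[0])
    out: Matrix = [[0.0] * rows for _ in range(cols)]
    for i0 in range(0, rows, block):
        for j0 in range(0, cols, block):
            for i in range(i0, min(i0 + block, rows)):
                src = a[i]
                for j in range(j0, min(j0 + block, cols)):
                    out[j][i] = src[j]
    return out
```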

I get the impression you think this is like AVX, where a -march flag will magically make things better if you're lucky. It just isn't. There are maybe a few pathological cases where you could optimize for a known-large L3 by keeping a very small amount of extra information in a data structure to save a dereference or some trivial math, but even then it would nearly always be the wrong choice *even if* you knew you would only ever run on this specific CPU: if there were room for that information in the cache line in the first place, it would already be there for the L1 case, which is the one that matters.
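The cache-line argument can be made concrete with `ctypes`, which lays structs out like the platform's C compiler. The two hash-table entry layouts below are hypothetical; `key_hash` stands in for the "small amount of extra information" that lets a lookup reject most mismatches without dereferencing into a separate key arena. The sketch assumes entries start on a 64-byte boundary.

```python
import ctypes

CACHE_LINE_BYTES = 64  # Zen 3, like most current x86 cores, uses 64-byte lines


class Entry(ctypes.Structure):
    """Hypothetical hash-table entry; the key bytes live in a separate arena."""
    _fields_ = [
        ("key_offset", ctypes.c_uint64),  # where the key starts in the arena
        ("key_len", ctypes.c_uint32),
        ("next", ctypes.c_uint32),        # index of the next entry in the chain
        ("value", ctypes.c_uint8 * 16),
    ]


class EntryWithHash(ctypes.Structure):
    """Same entry plus a cached hash, so most mismatches skip the arena lookup."""
    _fields_ = [
        ("key_offset", ctypes.c_uint64),
        ("key_len", ctypes.c_uint32),
        ("next", ctypes.c_uint32),
        ("key_hash", ctypes.c_uint64),
        ("value", ctypes.c_uint8 * 16),
    ]


def spare_bytes(struct_type, line: int = CACHE_LINE_BYTES) -> int:
    """Bytes left in one cache line, assuming the struct starts on a line boundary."""
    size = ctypes.sizeof(struct_type)
    if size > line:
        raise ValueError(f"{struct_type.__name__} is {size} bytes and spills past one line")
    return line - size


for t in (Entry, EntryWithHash):
    print(f"{t.__name__}: {ctypes.sizeof(t)} bytes, {spare_bytes(t)} spare")
```

If the cached hash fits in the spare bytes, it already pays off at L1, so the size of L3 isn't what justifies adding it; if it doesn't fit, the entry now spans two lines at every level.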

Don't get me wrong: this is great stuff even for the "common" case as it is. It's even better for workloads that thrash large sets of pages around with some degree of locality, like databases, statistics, and so on. But you have unrealistic expectations of what benefit it can provide beyond that, and of what fraction of software could even attempt to leverage it at the code level without just making things worse.
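One rough way to judge whether a workload is in that sweet spot is to compare its hot working set against the L3 of a single CCD, since L3 is not shared across CCDs. The per-CCD sizes are derived from AMD's published totals (256 MB for the 7763 and 768 MB for the 7773X, each across 8 CCDs); the `usable_fraction` allowance and the 48 MiB example are hypothetical.

```python
MIB = 1024 * 1024

# Per-CCD L3: the 8 cores on a CCD share it; it is not shared with other CCDs.
L3_PER_CCD_MIB = {
    "EPYC 7763": 32,    # Milan: 8 CCDs x 32 MiB = 256 MiB total
    "EPYC 7773X": 96,   # Milan-X: 32 MiB + 64 MiB stacked per CCD = 768 MiB total
}


def fits_in_ccd_l3(hot_bytes: int, cpu: str, usable_fraction: float = 0.75) -> bool:
    """True if a per-CCD hot working set fits in that CCD's L3.

    usable_fraction is a hypothetical allowance for the OS, other threads and
    conflict misses; real hit rates degrade gradually rather than at a cliff.
    """
    return hot_bytes <= L3_PER_CCD_MIB[cpu] * MIB * usable_fraction


hot_index = 48 * MIB  # hypothetical hot portion of a database index, per CCD
for cpu in L3_PER_CCD_MIB:
    print(cpu, fits_in_ccd_l3(hot_index, cpu))
```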

**arQon** · 22 March 2022, 02:24 AM

Originally posted by Linuxxx

subpar default schedutil governor

I didn't realise that was the default these days. Given that schedutil has *always* sucked, that seems a bit odd. Are you sure it's not just "my distro has schedutil as the default because it's made by magpies"?
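On the "is it really the default" question: the kernel's built-in default governor is a build-time choice (`CONFIG_CPU_FREQ_DEFAULT_GOV_*`), so it depends on how the distro configured its kernel, and userspace can still switch it after boot. A small Python sketch to check both on a given box; it assumes the distro ships `/boot/config-<release>`, which not all do, and the function names are hypothetical.

```python
import os
import re
from typing import Optional

_DEFAULT_GOV = re.compile(r"^CONFIG_CPU_FREQ_DEFAULT_GOV_([A-Z]+)=y$", re.MULTILINE)


def default_governor_from_config(config_text: str) -> Optional[str]:
    """Governor picked at kernel build time via CONFIG_CPU_FREQ_DEFAULT_GOV_*."""
    m = _DEFAULT_GOV.search(config_text)
    return m.group(1).lower() if m else None


def running_kernel_config() -> Optional[str]:
    """Text of /boot/config-<release>, if the distro ships it."""
    path = f"/boot/config-{os.uname().release}"
    try:
        with open(path) as f:
            return f.read()
    except OSError:
        return None


def current_governor(cpu: int = 0) -> Optional[str]:
    """Governor actually in effect now, which userspace may have changed."""
    path = f"/sys/devices/system/cpu/cpu{cpu}/cpufreq/scaling_governor"
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return None
```

Comparing `default_governor_from_config(running_kernel_config() or "")` with `current_governor()` shows whether the kernel or something in userspace picked the governor you are running.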

> (I already pity those who buy these "arm&leg" expensive workstations, only to then achieve subpar ROI because of a poor governor choice...)

Meh. Half the people who buy that sort of kit do so for vanity, not performance. The ones who buy workstations because they DO need them will also have an IT team, a VAR (value-added reseller), or the knowledge themselves to make sure they run properly.

AMD EPYC 7773X "Milan-X" Benchmarks Show Very Strong HPC Performance Upgrade
