Benchmarking AMD Zen 3 With Predictive Store Forwarding Disabled


  • #11
    Originally posted by numacross View Post
    Xeons were available in boxed versions since their beginning

    And even their newest versions share the same design, which I find neat:
    It is a striking visual design, isn't it? Those rays or beams emanating from within the chip are eye-catching. They appear to represent the number of hardware security vulnerabilities baked into Xeon's silicon.

    Comment


    • #12
      Originally posted by muncrief View Post
      Well, if PSF is that ineffective it's currently a waste of silicon space, and AMD should either improve it or remove it. I can't help but wonder if something wasn't set up correctly for these tests, or if there's some other unintended anomaly, but if it's really this bad PSF doesn't seem to add any real value to a processor.
      Another question that comes to mind: if PSF is only exercised by very specific workloads, is it really a target for exploits?

      Comment


      • #13
        I kinda wonder if the silicon spent on this could be spent on fattening some traces to increase stability at higher frequencies.

        Comment


        • #14
          Originally posted by zeb_ View Post
          One can reverse the question: what is the interest of enabling PSF if it does not provide an advantage?
          Yeah, Michael, which benchmarks showed the greatest benefit, and by how much?

          I also wonder whether it's a scenario more likely to occur in poorly-optimized (or unoptimized) code, since it seems to describe a situation where you store something and then read it back before it even hits L1 cache. Normally, an optimizing compiler would cache such values in registers if they're needed again that soon. Maybe it would also help with optimized code, if you're spilling registers and the core's write buffers are full.
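
          For what it's worth, here's a hypothetical C illustration of that difference (the function names are mine, not anything from the article). A plain local never touches memory at -O2, whereas a store through one pointer followed by a load through another pointer that might alias it forces exactly the store-then-reload pair that store forwarding -- predictive or not -- is there to hide:

          Code:
          #include <stdint.h>

          // A plain local stays in a register at -O2: no store/load pair is
          // emitted, so there's nothing for store forwarding to speed up.
          uint64_t kept_in_register(uint64_t x)
          {
              uint64_t tmp = x * 3;
              return tmp + tmp;
          }

          // With two pointers that might alias, the compiler has to emit the
          // store and the following load. When a == b at run time, the load
          // hits the still in-flight store -- the store-then-reload situation
          // described above.
          uint64_t maybe_aliased(uint64_t *a, uint64_t *b)
          {
              *a = 42;
              return *b;
          }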

          All of this has me wondering something else. When you get a cache miss, a cacheline has to be evicted before the new one can be fetched. So, I wonder if modern CPUs use idle cycles on the memory bus to pre-emptively write back the cachelines most likely to be victimized. That would at least lower the penalty of a cache miss, somewhat.
          Last edited by coder; 05 April 2021, 04:50 AM.

          Comment


          • #15
            Originally posted by smitty3268 View Post
            It seems like it's likely a feature that becomes more effective in longer running processes. A short benchmark might not be affected nearly as much as a long-running server process.
            It's not a bad question, but the second of the quotes you included confirms what I'd assumed -- that the profiling data is actually very short-lived!

            All sorts of internal CPU state like this get implicitly or explicitly replaced once a context switch happens (most notably things like branch prediction, which is very consequential). Context switches can occur anywhere from a few times per second (per core) to hundreds of thousands (see https://openbenchmarking.org/test/pts/stress-ng-1.3.1 and note that it's measuring context switches across all cores), with the latter happening if the thread is actively doing things like blocking I/O or synchronization.
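
            If anyone wants to see those numbers for a particular process, the kernel exposes per-process counters in procfs. A quick, Linux-only sketch (the field names are real /proc/<pid>/status fields; the program itself is just an illustration):

            Code:
            #include <stdio.h>
            #include <string.h>

            int main(void)
            {
                // /proc/self/status includes voluntary_ctxt_switches and
                // nonvoluntary_ctxt_switches counters for the calling process.
                FILE *f = fopen("/proc/self/status", "r");
                if (!f) {
                    perror("fopen");
                    return 1;
                }

                char line[256];
                while (fgets(line, sizeof line, f)) {
                    if (strstr(line, "ctxt_switches"))
                        fputs(line, stdout);
                }
                fclose(f);
                return 0;
            }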

            However, the limiting factor on its window of applicability is more typically going to be the relatively small number of addresses it has the storage to cache (and the ability to look up). It could be as few as a couple dozen, but it's almost certainly not enough to help outside of a small-to-medium sized loop.
            Last edited by coder; 05 April 2021, 04:54 AM.

            Comment


            • #16
              Originally posted by muncrief View Post
              I can't help but wonder if something wasn't set up correctly for these tests, or if there's some other unintended anomaly, but if it's really this bad PSF doesn't seem to add any real value to a processor.
              A way to know for sure would be to hand-code a test in asm and benchmark it with/without the feature enabled.
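
              Something close to that can even be sketched in plain C, leaning on volatile to keep the store/load pair in the generated code. A hypothetical sketch, not the article's methodology -- PSF itself would be toggled outside the program, via whatever kernel control was used for the testing:

              Code:
              #include <stdint.h>
              #include <stdio.h>
              #include <x86intrin.h>   // __rdtsc()

              #define ITERS 100000000ULL

              int main(void)
              {
                  // volatile forces a real store and a real load every iteration,
                  // which is the dependency pattern PSF is meant to accelerate.
                  volatile uint64_t slot = 0;
                  uint64_t acc = 0;

                  uint64_t start = __rdtsc();
                  for (uint64_t i = 0; i < ITERS; i++) {
                      slot = i;        // store ...
                      acc += slot;     // ... immediately reloaded from the same address
                  }
                  uint64_t end = __rdtsc();

                  printf("acc=%llu, ~%.2f cycles/iter\n",
                         (unsigned long long)acc, (double)(end - start) / ITERS);
                  return 0;
              }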

              Comment


              • #17
                Originally posted by zeb_ View Post
                One can reverse the question: what is the interest of enabling PSF if it does not provide an advantage?
                Marketing?
                Don't expect much and you're seldom disappointed.

                Comment


                • #18
                  Originally posted by milkylainen View Post
                  Phew. Dodged that one by a hair.
                  Nice to see that performance wasn't shot to bits.
                  Heh, just like Intel. Uh wait...

                  Comment


                  • #19

                    Originally posted by muncrief View Post
                    Well, if PSF is that ineffective it's currently a waste of silicon space, and AMD should either improve it or remove it. I can't help but wonder if something wasn't set up correctly for these tests, or if there's some other unintended anomaly, but if it's really this bad PSF doesn't seem to add any real value to a processor.

                    This feature mainly helps speed up multimedia and matrix operations/instructions.

                    AMD Developers Looking At GNU C Library Platform Optimizations For Zen
                    on 25 March 2020
                    Stemming from Glibc semantics that effectively "cripple AMD" in just checking for Intel CPUs while AMD CPUs with Glibc are not even taking advantage of Haswell era CPU features,

                    Under a "request for comments" flag, patches tentatively posted add AMD Zen and AVX/AVX2 platform support and refactor the platform support within the CPU features detection. This would at run-time allow CPU features like AVX2, FMA, BMI2, POPCNT, and other instructions to be enabled when detected to be running on an AMD Zen based processor.
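
                    For anyone curious what that kind of run-time detection looks like outside of glibc, here's a hypothetical stand-alone sketch using GCC's built-ins (not glibc's actual code path -- just the same idea of checking features at run time instead of assuming a vendor):

                    Code:
                    #include <stdio.h>

                    int main(void)
                    {
                        // Populate the CPU feature data used by the built-ins below.
                        __builtin_cpu_init();

                        printf("AVX2:   %s\n", __builtin_cpu_supports("avx2")   ? "yes" : "no");
                        printf("FMA:    %s\n", __builtin_cpu_supports("fma")    ? "yes" : "no");
                        printf("BMI2:   %s\n", __builtin_cpu_supports("bmi2")   ? "yes" : "no");
                        printf("POPCNT: %s\n", __builtin_cpu_supports("popcnt") ? "yes" : "no");
                        return 0;
                    }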



                    Disabling it "just" removes a Zen 3 speedup of roughly the kind seen in the MATLAB/MKL case:

                    Nov 18th, 2019 10:53
                    MATLAB is a popular math computing environment in use by engineering firms, universities, and other research institutes. Some of its operations can be made to leverage Intel MKL (Math Kernel Library), which is poorly optimized for, and notoriously slow on AMD Ryzen processors. Reddit user Nedflanders1976 devised a way to restore anywhere between 20 to 300 percent performance on Ryzen and Ryzen Threadripper processors, by forcing MATLAB to use advanced instruction-sets such as AVX2. By default, MKL queries your processor's vendor ID string, and if it sees anything other than "GenuineIntel...," it falls back to SSE, posing a significant performance disadvantage to "AuthenticAMD" Ryzen processors that have a full IA SSE4, AVX, and AVX2 implementation


                    That support is coming with glibc 2.33 and 2.34 -- as of 2.34, AMD-specific optimized code will be added:

                    GNU C Library 2.33 Released With HWCAPS To Load Optimized Libraries For Modern CPUs
                    on 1 February 2021

                    Comment


                    • #20
                      Originally posted by BingoNightly View Post
                      Marketing?
                      If it was, I'd say that was a really bad call. Who had even heard of this feature, before now? I read a fair amount of the Zen3 launch coverage and don't remember even a mention of it!

                      I really doubt AMD is wasting time and money on low-level CPU features only for the sake of marketing. That's more like something Intel would do. Although, I'd say both companies do a certain amount of that, at the chipset & Windows driver-level.

                      Comment
