Announcement

**milkylainen** · 04 April 2021, 12:26 PM

Phew. Dodged that one by a hair.
Nice to see that performance wasn't shot to bits.

**zeb_** · 04 April 2021, 12:51 PM

One can reverse the question: what is the point of enabling PSF if it does not provide an advantage?

**Adarion** · 04 April 2021, 01:01 PM

Yeah, was wondering that, too. Either I'm making a mistake in my thinking or PSF wouldn't really help all that much in most scenarios. Or it wasn't really being toggled between operating and non-operating. Or the kernel parameter did something different. Maybe we need something that definitely calls SECCOMP and forces the kernel to run this code in safe modes?

**tildearrow** · 04 April 2021, 01:12 PM

AMD: where the mitigation comes for free.

Also, a box for Epyc? Never before server processors had fancy boxes..

**numacross** · 04 April 2021, 01:18 PM

Originally posted by tildearrow

Also, a box for Epyc? Never before server processors had fancy boxes..

Xeons have been available in boxed versions since the beginning.

And even their newest versions share the same design, which I find neat:

**angrypie** · 04 April 2021, 01:33 PM

Cue birdie crying in the fetal position after seeing those benchmarks.

**muncrief** · 04 April 2021, 02:05 PM

Well, if PSF is that ineffective it's currently a waste of silicon space, and AMD should either improve it or remove it. I can't help but wonder if something wasn't set up correctly for these tests, or if there's some other unintended anomaly, but if it's really this bad PSF doesn't seem to add any real value to a processor.

**birdie** · 04 April 2021, 02:20 PM

Originally posted by muncrief

Well, if PSF is that ineffective it's currently a waste of silicon space, and AMD should either improve it or remove it. I can't help but wonder if something wasn't set up correctly for these tests, or if there's some other unintended anomaly, but if it's really this bad PSF doesn't seem to add any real value to a processor.

Maybe it requires GCC optimizations that aren't enabled yet?

**smitty3268** · 04 April 2021, 02:47 PM

It seems like it's likely a feature that becomes more effective in longer running processes. A short benchmark might not be affected nearly as much as a long-running server process.

PREDICTIVE STORE FORWARDING

It is common for a CPU to execute a load instruction to an address that was recently written by a store. Many modern processors implement a technique known as Store-To-Load-Forwarding (STLF) to improve performance in such cases. With STLF, data from the store is forwarded directly to the load without having to wait for it to be written to memory. In a typical CPU, STLF occurs after the address of both the load and store are calculated and determined to match.

PSF expands on this by speculating on the relationship between loads and stores without waiting for the address calculation to complete. With PSF, the CPU learns over time the relationship between loads and stores. If STLF typically occurs between a particular store and load, the CPU will remember this. When the CPU sees the store/load pair again, it may predict that STLF will occur and speculatively forward the data from the store to the load. This is done before confirming that the store and load are in fact to the same address.

The PSF is limited to training about store/load dependencies within the same context. A context is defined by the current values of CPL, ASID, PCID, CR3, and SMM status. Training only occurs if both the store and load execute in the same context. Any time that any piece of the context state changes (e.g. system call) existing training information is flushed. In particular, this flushing occurs on all far control transfers which includes all CPL changes, system call and return, interrupt/exceptions, SMM entry/exit, and VM entry/exit. Note that the PSF predictor is partitioned amongst SMT threads so the activity of one SMT thread does not influence the PSF predictions of the sibling thread. Finally, the store and load used to train the PSF must be relatively close together in the instruction stream and there cannot be any pipeline flushes (such as due to a mis-predicted branch) between the store and the load.
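To make the quoted description a bit more concrete, here is a toy Python model of the behaviour it describes. It is only a sketch and not how the hardware is implemented: the `ToyPSF` and `Context` names, the training threshold of 2, and the instruction addresses are all hypothetical. The model covers only what the text states: training on a store/load pair that keeps forwarding, predicting before the addresses are compared, and flushing the training whenever any piece of the context changes.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Context:
    """The state the quoted text says defines a PSF context."""
    cpl: int
    asid: int
    pcid: int
    cr3: int
    smm: bool


@dataclass
class ToyPSF:
    # Hypothetical: number of observed forwards before the pair is predicted.
    threshold: int = 2
    context: Optional[Context] = None
    counts: Dict[Tuple[int, int], int] = field(default_factory=dict)

    def switch_context(self, ctx: Context) -> None:
        """Flush all training if any piece of the context changed."""
        if ctx != self.context:
            self.counts.clear()
            self.context = ctx

    def predicts_forward(self, store_pc: int, load_pc: int) -> bool:
        """Prediction is keyed on the instruction pair, not on addresses."""
        return self.counts.get((store_pc, load_pc), 0) >= self.threshold

    def observe(self, store_pc: int, store_addr: int,
                load_pc: int, load_addr: int) -> str:
        """Run one store/load pair and report what happened."""
        key = (store_pc, load_pc)
        predicted = self.predicts_forward(store_pc, load_pc)
        matched = store_addr == load_addr  # known only after address calc
        # Train on a real forward, forget the pair on a mismatch.
        self.counts[key] = self.counts.get(key, 0) + 1 if matched else 0
        if predicted:
            return "psf-hit" if matched else "psf-mispredict"
        return "stlf" if matched else "no-forward"


user = Context(cpl=3, asid=0, pcid=1, cr3=0x1000, smm=False)
kernel = Context(cpl=0, asid=0, pcid=1, cr3=0x1000, smm=False)

psf = ToyPSF()
psf.switch_context(user)
for _ in range(3):
    print(psf.observe(0x40, 0x1000, 0x48, 0x1000))
print(psf.observe(0x40, 0x1000, 0x48, 0x2000))  # same pair, new load address
```

In this model the first two iterations fall back to ordinary STLF while the pair is being trained, the third is forwarded speculatively, and the final call is the misprediction case: the pair was predicted to forward but the addresses turned out to differ. Calling `psf.switch_context(kernel)` afterwards would wipe whatever training remains, mirroring the flush on a CPL change that the quoted text describes.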

Benchmarking AMD Zen 3 With Predictive Store Forwarding Disabled