**newwen** · 06 January 2018, 11:01 AM

Originally posted by gens

The biggest performance gain is from fetching the code into L1 (and/or whatever the internal instruction buffer is called), otherwise the stall would be much much bigger.
The instructions can still be decoded, that is the second part of the performance gain.
Third part is actually executing them.

Modern CPUs go: fetch -> decode -> execute -> maybe write back results.
The "fetch" part is potentially the heaviest.

But I assume that the cache is usually filled in blocks. Consecutive instructions are probably already in the L1 cache, and the fetch step only moves them into the instruction queue. Unless the branch jumps relatively far, the target instructions are usually already in L1. Otherwise you're right that fetching instructions from higher-level caches or, even worse, from DRAM would be the biggest performance hit.

**gens** · 06 January 2018, 11:55 AM

Originally posted by newwen

But I assume that the cache is usually filled in blocks. Consecutive instructions are probably already in the L1 cache, and the fetch step only moves them into the instruction queue. Unless the branch jumps relatively far, the target instructions are usually already in L1. Otherwise you're right that fetching instructions from higher-level caches or, even worse, from DRAM would be the biggest performance hit.

Yes, in blocks of 64 bytes (usually, on amd64) called "cache lines".
https://stackoverflow.com/questions/...che-lines-work
The best research would be reading the Intel and AMD manuals, and scouring the internet critically.
Computer Architecture: A Quantitative Approach is a great book.
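
To make the cache line point concrete, here is a minimal sketch of how addresses map onto 64-byte lines. The start address, the fixed 4-byte instruction size, and the helper names are all hypothetical (real amd64 instructions are variable-length), and the model ignores eviction and prefetching entirely.

```python
CACHE_LINE_BYTES = 64  # usual line size on amd64, as stated above


def cache_line_index(address: int) -> int:
    """Return the index of the cache line that contains this address."""
    return address // CACHE_LINE_BYTES


def same_line(a: int, b: int) -> bool:
    """True if both addresses fall into the same cache line."""
    return cache_line_index(a) == cache_line_index(b)


def line_fills(addresses) -> int:
    """Count the distinct cache lines touched by a sequence of fetch addresses.

    Assumes an idealised cache that never evicts anything.
    """
    return len({cache_line_index(a) for a in addresses})


# Hypothetical straight-line code: 16 instructions of 4 bytes each,
# starting at a line-aligned address.
straight = [0x1000 + 4 * i for i in range(16)]

if __name__ == "__main__":
    print("lines touched by 16 sequential instructions:", line_fills(straight))
    print("short jump stays in the line:", same_line(0x1008, 0x1030))
    print("jump across the line boundary:", same_line(0x1008, 0x1040))
```

Under these assumptions, one line fill covers all 16 sequential instructions, and a short jump only needs a new fill when it crosses a 64-byte boundary.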

PS: Mind you that 90% of software is shit. And, thanks to Intel's lawyers, amd64, SSE and similar instructions are looooooong (even 8 bytes apiece).

**dwagner** · 06 January 2018, 12:21 PM

Is it too much to ask _what_ the microcode intends to do, then?

It is a shame how processor vendors (and motherboard vendors alike) keep their customers uninformed about the changes they publish. This really has to end.

**Kraut** · 06 January 2018, 12:39 PM

Originally posted by gens

Branch prediction per se is not the problem, from what I understand. Speculative execution is the problem. ...

Originally posted by gens

Just to be clear, "branch prediction" is the mechanism that predicts whether a "branch" in code goes one way or the other.
Only the mechanism, nothing else.

Branch prediction, speculative execution and prefetching in general are all one big system. And it all works together to fight the limitation of memory latency!

The CPU creates a tree of possible paths. And to go more than one level of prediction deep, it has to do prefetches and calculations in most of these branches to get new prefetch targets!

**gens** · 06 January 2018, 01:22 PM

Originally posted by Kraut

Branch prediction, speculative execution and prefetching in general are all one big system. And it all works together to fight the limitation of memory latency!

The CPU creates a tree of possible paths. And to go more than one level of prediction deep, it has to do prefetches and calculations in most of these branches to get new prefetch targets!

It could be so interwoven with everything else that there is no difference, just as it could be completely isolated from everything else. Only people who signed fat NDAs know.

Whatever the case may be, branch prediction is not that system, and the problem is in the "speculative execution" part of the "system" (technically not even there).

**oibaf** · 06 January 2018, 02:20 PM

The need for AMD microcode update is clarified here: https://access.redhat.com/articles/3311301

AMD Defaults:
Due to the differences in underlying hardware implementation, AMD X86 systems are not vulnerable to variant #3. The correct default values will be set on AMD hardware based on dynamic checks during the boot sequence.

pti 0 ibrs 0 ibpb 2 -> fix variant #1 #2 if the microcode update is applied
pti 0 ibrs 2 ibpb 1 -> fix variant #1 #2 on older processors that can disable indirect branch prediction without microcode updates
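
For anyone wanting to check which of these combinations a machine is running, here is a rough sketch. It assumes the three tunables are exposed as the debugfs files `pti_enabled`, `ibrs_enabled` and `ibpb_enabled` under `/sys/kernel/debug/x86`, which is how I understand the Red Hat kernels to do it; other kernels may not have these files at all. The function name `read_tunables` is made up for illustration, and reading debugfs normally needs root.

```python
from pathlib import Path

# Assumed file names, based on the Red Hat article linked above.
TUNABLES = ("pti_enabled", "ibrs_enabled", "ibpb_enabled")


def read_tunables(base="/sys/kernel/debug/x86"):
    """Return {tunable name: integer value, or None if it cannot be read}."""
    values = {}
    for name in TUNABLES:
        path = Path(base) / name
        try:
            values[name] = int(path.read_text().strip())
        except (OSError, ValueError):
            # File missing, not readable without root, or not a number.
            values[name] = None
    return values


if __name__ == "__main__":
    for name, value in read_tunables().items():
        print(name, "unknown" if value is None else value)
```

A result of `pti_enabled 0`, `ibrs_enabled 0`, `ibpb_enabled 2` would correspond to the first line above.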

**L_A_G** · 06 January 2018, 02:32 PM

Originally posted by gens

Branch prediction per se is not the problem, from what I understand. Speculative execution is the problem. As in the CPU executes code of one branch, and if it turns out it's the wrong branch the CPU backtracks.

Branch prediction used to be just for loading code into the L1 code cache.

Saying that branch prediction is not the issue is like saying that turbos are not the issue when a flaw is found in their compressors (a turbo is essentially an exhaust-driven compressor). Sure, filling the instruction part of the L1 cache helps, but like all caches it is filled in blocks, and with the pipeline lengths we've been seeing since the early 2000s (the Prescott core, introduced by Intel back in 2004, had a 31-stage pipeline, for example), speculative execution has been the main driver of per-thread speedup in non-math tasks.

**audir8** · 06 January 2018, 03:10 PM

Speculative execution is a feature of out-of-order CPUs by definition. An in-order CPU, like the 1st/2nd-gen Atom, will still try to predict what is coming after a jmp instruction. In the simplest case, it will predict that there will be no jump and just fetch the next instruction. If it's wrong, it will have to fetch what's really next, in the worst case from DRAM. The next step in better branch prediction is adding a simple counter: https://en.wikipedia.org/wiki/Branch...nch_prediction

But, yes, it is all tied together in a modern CPU.
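
The "simple counter" step can be sketched as a 2-bit saturating counter, which is the textbook version of the idea. This is only an illustration: the class and function names are invented, the starting state is an arbitrary choice, and real predictors keep many such counters indexed by branch address and history.

```python
class TwoBitPredictor:
    """One 2-bit saturating counter: states 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self, state: int = 1):
        self.state = state  # start at "weakly not taken" (arbitrary choice)

    def predict(self) -> bool:
        return self.state >= 2

    def update(self, taken: bool) -> None:
        # Move one step towards the observed outcome, saturating at 0 and 3.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def count_mispredictions(outcomes, predictor=None) -> int:
    """Run the predictor over a list of branch outcomes and count wrong guesses."""
    if predictor is None:
        predictor = TwoBitPredictor()
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1
        predictor.update(taken)
    return misses


# Hypothetical loop branch: taken 9 times, then falls through once.
loop_branch = [True] * 9 + [False]

if __name__ == "__main__":
    print("mispredictions, one pass:", count_mispredictions(loop_branch))
    print("mispredictions, two passes:", count_mispredictions(loop_branch * 2))
```

The always-predict-no-jump scheme from the simplest case would be wrong on all nine taken iterations of this loop. The counter is wrong on the first iteration while it warms up and on the loop exit, and a single exit does not flip its prediction for the next pass.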

**wizard69** · 06 January 2018, 04:28 PM

Originally posted by dwagner

Is it too much to ask _what_ the microcode intends to do, then?

It is a shame how processor vendors (and motherboard vendors alike) keep their customers uninformed about the changes they publish. This really has to end.

You seem to be making an assumption here that isn't justified. AMD could very well have explained the update to SUSE, and SUSE got the communication wrong. I'm betting on SUSE screwing up here, because the text they posted should have set off alarms in the head of anybody familiar with modern CPUs.

To put it another way, AMD would not be inclined to release a microcode update that destroys performance the way this one would if it actually did what SUSE says.

**wizard69** · 06 January 2018, 04:40 PM

Originally posted by oibaf

The need for AMD microcode update is clarified here: https://access.redhat.com/articles/3311301

Sounds like a very specific fix to a problem with indirect branch prediction. The question then becomes whether it is completely disabled or fixed in some way. Also, if it is disabled, how much of a hit does this specific fix cause? It sounds like a very limited adjustment to what can be predicted, so maybe not a big hit to performance.

AMD Did NOT Disable Branch Prediction With A Zen Microcode Update
