**Weasel** · 17 December 2022, 03:55 PM

Originally posted by xfcemint

Weasel, your entire post is just a big red herring (Wikipedia), and nothing more.
Why don't CPU vendors let users benchmark SE vs non-SE? Then we would have exact numbers. Why all this muddling with dubious arguments?

Let us test it. Let us make our own conclusions. Give us more choice, give us more freedom.

No, choice and freedom come at a cost in hardware. When you have to optimize a design literally down to the transistor level, adding options would just hurt performance or waste precious die area. We're talking about something essential to the underlying work of any thread on the CPU, not something fringe.

**intelfx** · 17 December 2022, 04:10 PM

Originally posted by Vorpal

mm == memory management. It is a subsystem in the Linux kernel.

Don't reply if you don't have a clue.

In this context, "mm" refers to struct mm_struct, a fundamental object in the kernel that describes, roughly, a process address space. It is pretty clear from the context that "mm" is not used here in the sense of the respective subsystem.

**chithanh** · 17 December 2022, 07:26 PM

Originally posted by xfcemint

In short, I'm complaining that CPUs with OoO should feature optional SE. OTOH, you are comparing OoO CPUs with non-OoO CPUs, which is completely unrelated and irrelevant.

While technically you are right that OoO and speculative execution are orthogonal concepts, in practice they are closely intertwined. With Meltdown, OoO plays a major role. The barriers (fences etc.) to mitigate such vulnerabilities need to limit both speculation and instruction reordering.

**PerformanceExpert** · 17 December 2022, 07:40 PM

Originally posted by xfcemint

No, what you are saying is not true in this case.

The trick is that non-SE mode is so much simpler to implement than SE mode, so adding a non-SE mode doesn't really have any measurable impact on performance or complexity of CPUs.

Also, I wouldn't be surprised if all the modern CPUs are already capable of non-SE mode of operation, because that mode is very useful in debugging and testing the CPU during development. So, in modern CPUs, disabling SE probably boils down to just some changes in CPU microcode. I would bet that many modern CPUs can be put into a non-SE mode just by using appropriate microcode.

Basically OoOE makes speculation easy and lots of speculation is what makes OoO fast. You cannot have one without the other.

You can turn off branch prediction (one kind of speculation) in many CPUs, which results in such a slowdown that the CPU becomes unusable. Nobody sane would want to do that...

**AmericanLocomotive** · 17 December 2022, 08:41 PM

Processors have had speculative execution for a long time. Even the AMD K5 and Intel P6 (Pentium Pro) CPU cores back in the mid-1990s had functioning speculative execution. CPU manufacturers are very concerned about silicon budgets. If SE did not provide useful performance benefits, it would not be used.

ARM, IBM, Intel, AMD, even MIPS all utilize speculative execution on their high-end cores. If it wasn't worthwhile (or had huge security problems for a minor increase in performance), it wouldn't be used.

**PerformanceExpert** · 17 December 2022, 08:56 PM

Originally posted by xfcemint

So say you. Why can't we test that theory of yours by a benchmark?

It's a fact you can test by switching off branch prediction.

Branch prediction is a kind of speculation, but it is not a kind of speculative execution. There is a significant difference. Speculative execution involves the execution stage of an instruction pipeline.

You can speculate as much as you want before an instruction enters the execution stage. That won't cause SPECTRE. SPECTRE can be caused only in the execution stage of the instruction pipeline.

So, please, don't confuse those terms again.

An OoO core executes branches speculatively - for example it may execute 10 loop iterations before finally confirming that the branch of the first iteration was correctly predicted.
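To make the loop example concrete, here is a minimal Python sketch of a classic 2-bit saturating-counter branch predictor applied to a loop back-edge. It is a toy model, not a description of any particular core; the function name `simulate_two_bit` and the starting counter state are assumptions made purely for illustration.

```python
def simulate_two_bit(outcomes, state=2):
    """Count mispredictions of a 2-bit saturating counter.

    outcomes: iterable of booleans, True = branch taken.
    state: counter in 0..3; 0-1 predict not taken, 2-3 predict taken.
    Starting at 2 ("weakly taken") is an arbitrary assumption.
    """
    mispredicts = 0
    for taken in outcomes:
        predicted_taken = state >= 2
        if predicted_taken != taken:
            mispredicts += 1
        # Move the counter toward the actual outcome, saturating at 0 and 3.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return mispredicts


# A loop back-edge: taken 9 times, then falls through on the 10th check.
loop_branch = [True] * 9 + [False]
print(simulate_two_bit(loop_branch))
```

In this model the only misprediction is the loop exit, which is why a core can keep fetching and executing later iterations while the earlier branches are still unresolved.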

Without speculation you cannot start executing instructions after a branch until it has been fully resolved. Even in-order cores speculate branches since not speculating would be too slow.

Hence the idea of removing all speculation is totally stupid.

**PerformanceExpert** · 17 December 2022, 09:42 PM

Originally posted by xfcemint

Yep, the great corporations always know best

How are quotes from CEOs even remotely relevant when the subject is competitive CPU designs? A CPU is made by thousands of smart engineers, not by the CEO.

**PerformanceExpert** · 17 December 2022, 10:00 PM

Originally posted by xfcemint

You mean, like Itanium?

You mean, like Pentium 4?

LOL - it's funny that you mention two past designs that had serious issues with speculation: Itanium, being an in-order VLIW design, speculated far too little, and Pentium 4 had a terrible replay mechanism that made recovery from incorrect speculation slow. Both were great CPUs for tasks that didn't benefit from speculation: P4 was amazing on non-branchy code and Itanium won most FP benchmarks. Both ran like a dog on complex branchy code.

**PerformanceExpert** · 17 December 2022, 10:27 PM

Originally posted by xfcemint

I think that you are intentionally confusing the terms, since I have already told you:
branch prediction is not speculative execution.

Of course it is. You not only speculate the direction of each branch but also speculatively execute instructions after them. Even in-order cores do this.

I think it is stupid to claim that without any benchmarks. Theorize blah blah blah....
I want a benchmark: speculative OoO vs. non-speculative OoO.

It's a fact that turning off speculation will result in worse performance than an in-order core. However, if you don't want to accept knowledge from others, then go and benchmark it yourself!

You can compare in-order cores (limited speculation) with OoO cores (a huge amount of speculation). You can compare with and without speculation past branches by turning off branch prediction. You can even insert fences after every instruction you want to stop speculating on (e.g. memory accesses). You will find that OoO without speculation is no longer OoO...
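For a rough feel of what the "turn off branch prediction" comparison implies, here is a back-of-the-envelope Python sketch. It is a toy estimate, not a benchmark: the helper `estimated_cycles` is hypothetical, every number is an assumption, and it ignores superscalar issue, caches and everything else a real core does. It also assumes that a branch which is not speculated costs the same stall as a mispredicted one.

```python
def estimated_cycles(instructions, branches, stall_rate, penalty):
    """Toy estimate: one cycle per instruction, plus `penalty` stall
    cycles for each branch that is not correctly speculated."""
    stalled_branches = branches * stall_rate
    return instructions + stalled_branches * penalty


# All numbers below are illustrative assumptions, not measurements.
INSTRUCTIONS = 1_000_000
BRANCHES = 200_000   # assume roughly one branch per five instructions
PENALTY = 15         # assumed cycles to resolve a branch and refill the pipeline

# With prediction: assume 5% of branches are mispredicted.
with_prediction = estimated_cycles(INSTRUCTIONS, BRANCHES, 0.05, PENALTY)
# No speculation past branches: every branch waits until it is resolved.
without_prediction = estimated_cycles(INSTRUCTIONS, BRANCHES, 1.0, PENALTY)

print(f"slowdown: {without_prediction / with_prediction:.2f}x")
```

Under these made-up numbers the model gives a slowdown of roughly 3.5x; the point is only that the stall term dominates as soon as every branch has to wait, not that any real CPU would show this exact figure.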

**PerformanceExpert** · 17 December 2022, 10:59 PM

Originally posted by xfcemint

This is a red herring (Wikipedia).
My point is that those two designs were quite bad, as opposed to your argument of "flawless corporations".

Why exactly they failed - it is a complex issue, and not so easily related only to speculative execution (i.e. you are making a fallacy of incomplete comparison).

Double strawman fallacy detected! I didn't claim that corporations are totally flawless nor that speculative execution was the only issue.

Neither design was bad; they were successful and made money. And to give the engineers their due credit, they were technologically advanced. The P4's double-pumped 8 GHz ALU was amazing and has not been matched by any design since. Similarly, Itanium implementations were bigger and wider than anything else on the market; the later-generation 12-wide design (with OoO execution) is still wider than the widest OoO CPUs today (8-wide).

Linus Torvalds Bashes Intel's LAM - Rejected For Linux 6.2