PortSmash: A New Side-Channel Vulnerability Affecting SMT/HT Processors (CVE-2018-5407)


  • #31
    Originally posted by TemplarGR View Post
    SMT is vastly overrated. It adds only a tiny percentage of extra performance in certain workloads IF the code is multithreaded. At best you are going to get a 20% extra average multithreaded performance boost. Most of the time it hardly makes a difference, especially for modern quad+ desktop systems. In some instances it even LOWERS performance compared to having it off...
    Compiling stuff with SMT on a Ryzen gives me a speedup factor of 1.33; you should check the environment where you got your numbers from.

    Comment


    • #32
      There is probably more potential to gain from SMT in Zen than in Intel architectures, since its wider cores can execute more instructions in parallel. At least that's what Agner seems to think in his optimization guide (see section 20.21). There could be other reasons why Intel's implementation is better, though (they've been developing it for much longer).
      Last edited by Tomin; 03 November 2018, 05:53 AM.

      Comment


      • #33
        Originally posted by TemplarGR View Post

        Yes, indeed, I am an actual developer. Your post is incompetent, for the very simple reason that you people just choose to insult others instead of providing solid arguments.

        The only "argument" someone posted in favour of SMT in this thread is "you are losing HALF YOUR PERFORMANCE with it off". Which is a stupid argument backed up with no facts. I have witnessed countless benchmarks of SMT and to my eyes is of no tangible benefit for desktop users. If you are a server user or have special needs then the situation changes but it seems most casuals incorrectly think SMT is a magic button that provides "moar perfromance" out of thin air.

        It would be nice if Michael did a thorough SMT benchmark to demonstrate this issue. I suspect that in certain benchmarks it will look like a huge benefit and in others it will be of little use... And guess what benchmarks will affect desktop users more...
        A quick 7-Zip 18.05 x64 run on Windows with a 32 MB dictionary size. The CPU is a heavily overclocked Haswell, so not even the best Intel HT around.

        Threads: 8 (4 of which are HT)
        Total MIPS: 31437
        Total Rating: 730%

        Threads: 4
        Total MIPS: 21188
        Total Rating: 381%

        As others have stated before, HT effectiveness depends heavily on what is actually executing. A localized tight-loop integer benchmark like 7-zip can utilize it almost fully. In fact the decompression part was very close to 800% rating.

        Another example is the benchmark in CPU-Z, which I suspect is float-heavy:

        8 threads give a ratio of 5.14 while 4 threads give 3.73. Interestingly, even when I pinned it to 4 physical cores the result didn't change, meaning Windows automatically prioritises real cores.
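
        For reference, a tiny Python sketch of the uplift those numbers imply (just plugging in the figures posted above):

        # HT uplift implied by the 7-Zip and CPU-Z numbers posted above
        zip_8t, zip_4t = 31437, 21188    # 7-Zip total MIPS, 8 vs 4 threads
        cpuz_8t, cpuz_4t = 5.14, 3.73    # CPU-Z multi-thread ratio, 8 vs 4 threads

        print(f"7-Zip HT uplift: {zip_8t / zip_4t - 1:.0%}")    # ~48%
        print(f"CPU-Z HT uplift: {cpuz_8t / cpuz_4t - 1:.0%}")  # ~38%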

        Comment


        • #34
          Originally posted by deant View Post
          50 years later and we still can't make a CPU w/o bugs...
          With performance levels from 50 years back, yes we can. Comparing CPUs 50 years apart is a worse match than apples to oranges.

          Comment


          • #35
            Originally posted by TemplarGR View Post
            SMT uses idle parts of a core's pipeline in order to brute-force the execution of more threads, but this costs energy and brings clocks down. By removing SMT, the idling of the pipeline helps increase clocks a little bit more, especially on modern CPUs that use turbo/boost features...
            Doubling the performance using double the frequency requires 4 times the power.

            Doubling the performance with SMT by utilizing unused execution ports requires roughly 2 times the power (assuming there are enough idle ports; if there aren't, it won't waste much power either, so it still scales almost linearly, with some slight overhead for management).

            You pick the one which is more efficient.
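
            A back-of-the-envelope Python sketch of that comparison, taking the post's rule-of-thumb numbers at face value (2x frequency -> 4x power, ideal SMT -> ~2x power for 2x throughput; these are not measurements):

            # Toy perf-per-watt comparison using the rule-of-thumb figures above:
            # which way of doubling throughput is more efficient?
            freq_perf, freq_power = 2.0, 4.0   # double the frequency: 2x perf, 4x power
            smt_perf, smt_power = 2.0, 2.0     # ideal SMT: 2x perf, ~2x power

            print("perf/W, frequency doubling:", freq_perf / freq_power)  # 0.5
            print("perf/W, ideal SMT:", smt_perf / smt_power)             # 1.0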

            Comment


            • #36
              Originally posted by TemplarGR View Post
              As I said, the SMT benefit depends on the workload, but even the absolute best-of-the-best scenario won't give you more than 50-60% extra performance. And these scenarios are mostly achieved in synthetic benchmarks.
              No, you are wrong, both theoretically and practically.

              Theoretically the performance can literally double, as long as there are enough unused execution ports. That's the "best case", obviously, which doesn't happen that much in practice.

              Practically, any latency-bound workload that ends up not saturating the execution ports makes SMT/HT very appealing, if you can run multiple instances of the same job in parallel on different data, for instance. It can easily increase your performance by 80% or more, since it's like having twice the number of cores.

              Sure, you can make the pipeline smaller and have twice the cores and not need HT. But then your single-threaded performance will suffer as a consequence. Remember that CPUs are general purpose. Sometimes you need single-threaded performance, sometimes you are latency bound but can do stuff in parallel (unrelated to each other), sometimes you are massively parallel, etc.

              You don't want to switch the CPU every time you change your workload.
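
              As a minimal toy model of that argument (my own simplification, assuming execution-port occupancy is the only shared bottleneck and ignoring cache and frontend contention): if one thread keeps the ports busy a fraction u of the time, a second SMT thread can at best fill the remaining slack.

              def smt_speedup(port_utilization: float) -> float:
                  """Idealized SMT speedup when execution ports are the only shared resource.

                  port_utilization: fraction of issue slots one thread keeps busy (0..1].
                  Two threads can at best saturate the ports, so the speedup is capped
                  at min(2, 1 / port_utilization).
                  """
                  return min(2.0, 1.0 / port_utilization)

              print(smt_speedup(0.55))  # latency-bound thread ~55% busy -> ~1.8x (the "80% or more" case)
              print(smt_speedup(1.0))   # ports already saturated -> 1.0x, no gain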

              Comment


              • #37
                Maybe it is time for new benchmarks... HT enabled vs. HT disabled, starting with the Athlon 200GE, and how it scales up in percentage terms.

                Comment


                • #38
                  Originally posted by chithanh View Post
                  The researchers seem to think that SMT itself is the problem here, and that the only way to make it secure is to get rid of it.
                  1. Security
                  2. Performance
                  Choose one.

                  Someone over on HN commented that SMT will probably have to become opt-in per-process.
                  Well, now that is interesting, and it could present some performance optimization potential. I actually think opt-in SMT is probably a really good idea.

                  EDIT: I guess the way I'm interpreting this is that SMT can give better inter-process latency, because the two threads share the core's caches, while processes spread across separate cores (SMP) can get better aggregate bandwidth.
                  Last edited by duby229; 03 November 2018, 09:34 AM.
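
                  Until something like that exists, the closest approximation is probably per-process CPU affinity: a sensitive process can restrict itself to one hardware thread per physical core so it never shares a core with an SMT sibling. A rough Linux-only Python sketch (the sysfs topology files are real, but treat the snippet as illustrative):

                  import glob
                  import os

                  def sibling_group(cpu: int) -> list:
                      """Parse this CPU's SMT sibling set from sysfs (e.g. '0,4' or '0-1')."""
                      path = f"/sys/devices/system/cpu/cpu{cpu}/topology/thread_siblings_list"
                      cpus = []
                      for part in open(path).read().strip().split(","):
                          if "-" in part:
                              lo, hi = map(int, part.split("-"))
                              cpus.extend(range(lo, hi + 1))
                          else:
                              cpus.append(int(part))
                      return cpus

                  # Keep one representative hardware thread per physical core
                  all_cpus = [int(p.rsplit("cpu", 1)[1])
                              for p in glob.glob("/sys/devices/system/cpu/cpu[0-9]*")]
                  primaries = {min(sibling_group(c)) for c in all_cpus}

                  # Pin the current process (pid 0 = self) to those CPUs only
                  os.sched_setaffinity(0, primaries)
                  print("Running on one thread per core:", sorted(primaries))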

                  Comment


                  • #39
                    I have always been under the impression that software development (e.g. a source code -> compile -> package -> install -> test cycle) is one of the cases where HT is a huge boon, provided the build process can be structured as parallel jobs and the compression software can make use of the extra threads?

                    Since I tend to re-purpose older hardware as Linux systems, I generally buy SMT CPUs from the outset. I'm not a developer per se, but I do dabble in packaging on Solus/funtoo/Exherbo. So the up-front extra cost for SMT is amortized over the lifetime of the CPU in my case.
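
                    One way to check that on a given box is to time the same parallel build with job counts equal to the physical and the logical CPU count. A rough Python sketch (assumes 2-way SMT; the make invocation is just a placeholder for whatever your build runs):

                    import os
                    import subprocess
                    import time

                    logical = os.cpu_count()        # hardware threads, including SMT siblings
                    physical = logical // 2         # assumption: 2-way SMT on every core

                    for jobs in (physical, logical):
                        subprocess.run(["make", "clean"], check=True)
                        start = time.perf_counter()
                        subprocess.run(["make", f"-j{jobs}"], check=True)
                        print(f"-j{jobs}: {time.perf_counter() - start:.1f} s")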

                    Comment


                    • #40
                      In the old days, we were more concerned about the cost of virtual addressing (DAT, dynamic address translation). We had the option to run V=R, or to run in real mode without address translation.

                      Hopefully, RAM will drop in price and increase in package density to where 32 GB to 64 GB home desktops are more common.
                      With 32 GB, there is no software that I use on Linux that needs to run with virtual translation. I would much prefer to run Virtual=Real.

                      What is the overhead cost of maintaining V=R as the standard interface? In Linux, is an application able to page-fix its memory so as to avoid dynamic address translation?
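
                      Not exactly: Linux has no true V=R mode for user processes, so the MMU still translates every access, but an application can pin ("page-fix") its memory with mlockall() so it is never paged out, and huge pages (hugetlbfs or transparent huge pages) cut the translation overhead by reducing TLB misses. A minimal Python sketch of the pinning part via ctypes (MCL_* are the standard Linux constants; large locked sizes need a raised RLIMIT_MEMLOCK or CAP_IPC_LOCK):

                      import ctypes
                      import ctypes.util
                      import os

                      # Lock all current and future mappings into RAM so they can't be paged out.
                      # Note: the MMU still translates every access; only huge pages reduce TLB cost.
                      MCL_CURRENT, MCL_FUTURE = 1, 2

                      libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
                      if libc.mlockall(MCL_CURRENT | MCL_FUTURE) != 0:
                          err = ctypes.get_errno()
                          raise OSError(err, os.strerror(err))
                      print("All memory locked; no page can be swapped out.")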

                      Comment
