**rene** · 02 February 2019, 07:14 PM

Originally posted by HadrienG

IIRC, SSE2 support is mandatory for AMD64 compliance. So any 64-bit system is guaranteed to have SSE support.

One small nitpick about this bit from the article:

It is actually surprisingly hard to get better performance with AVX than with SSE if your code is not a textbook use case for wide vectorization, due to the fact that on current Intel CPUs, AVX mode takes a long time to turn on and to turn back off, and causes some significant downclocking which will cause an effective slowdown in surrounding scalar code and vector code that does not leverage the full vector width.

Plus, many CPUs don't turbo boost as high (they clock lower) with AVX in use, …

**rene** · 02 February 2019, 07:15 PM

Originally posted by atomsymbol

In my opinion, it is improbable for MMX or x87 FPU to be retired from CPUs.

Especially as it certainly consumes very few transistors compared to all the SSE, AVX, caches, virtualisation, etc. Maybe it's time to start with something thought through from scratch. We could call it something like, hmm, RISC, ... -V? ;-)

**edwaleni** · 02 February 2019, 08:18 PM

Which raises another question: do we still need real mode x86 support now that modern CPUs have had VT-x since Westmere, as well as x64 long mode? With 32-bit OSes being retired, I would think virtual 8086 mode would be obsolete. x64 long mode supports the needed extended page tables.

**enihcam** · 02 February 2019, 08:34 PM

X86-16 needs to be retired completely.

**ssokolow** · 02 February 2019, 10:50 PM

Originally posted by enihcam

X86-16 needs to be retired completely.

That depends on how you define "X86-16" and "retired completely".

Obviously, it's not practical to run a 16-bit OS on a machine where Intel has retired support for legacy BIOS APIs, but I still consider it a competitive advantage that you can run classic 16-bit Windows games like Lode Runner: The Legend Returns on 32-bit Wine on 64-bit Linux.

(i.e., it's like retiring the ability to install a 32-bit kernel but not 32-bit multilib.)

**hotaru** · 03 February 2019, 12:01 AM

Originally posted by hreindl

the question is how large is that codebase at all

64-bit code that uses MMX instead of SSE? probably extremely small, since SSE2 is mandatory for x86-64.

**ldesnogu** · 03 February 2019, 04:53 AM

Originally posted by HadrienG

IIRC, SSE2 support is mandatory for AMD64 compliance. So any 64-bit system is guaranteed to have SSE support.

One small nitpick about this bit from the article:

It is actually surprisingly hard to get better performance with AVX than with SSE if your code is not a textbook use case for wide vectorization, due to the fact that on current Intel CPUs, AVX mode takes a long time to turn on and to turn back off, and causes some significant downclocking which will cause an effective slowdown in surrounding scalar code and vector code that does not leverage the full vector width.

And to make things worse, AVX is not available on a non-negligible number of Intel CPUs: Atom, Pentium and Celeron (yeah, even those based on Core and released in 2019). Even though I can understand it for Atom (they wanted to keep things "small"), for Core-based Pentium and Celeron it is just silly market segmentation.
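Since AVX cannot be assumed on every x86 CPU, code that wants it usually has to check at run time and fall back to SSE2. A minimal sketch of such a check, assuming GCC or Clang on an x86 target (`__builtin_cpu_supports` is their builtin; `cpu_has_avx` and `pick_kernel` are names made up for this illustration):

```c
#include <stdbool.h>

/* Returns true when the running CPU reports AVX support.
 * Assumption: GCC or Clang on x86, where __builtin_cpu_supports exists.
 * On any other compiler or architecture this sketch just reports false. */
bool cpu_has_avx(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();   /* harmless if the CPU data is already set up */
    return __builtin_cpu_supports("avx") != 0;
#else
    return false;
#endif
}

/* Hypothetical dispatcher: names the code path a program would select.
 * SSE2 is the fallback because it is the x86-64 baseline. */
const char *pick_kernel(bool has_avx)
{
    return has_avx ? "avx" : "sse2";
}
```

Calling `pick_kernel(cpu_has_avx())` once at startup and caching the result keeps the check out of hot loops.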

**coder** · 03 February 2019, 05:08 AM

Originally posted by hotaru

64-bit code that uses MMX instead of SSE? probably extremely small, since SSE2 is mandatory for x86-64.

There's a lot of legacy code that was originally written for 32-bit, but then recompiled for 64-bit. Back in 2007, I wrote a bunch of MMX code that's still in use today. The reason I didn't use SSE2 (and believe me - I wanted to) is that AMD CPUs didn't yet support SSE2. My employer wouldn't spend the ~$2.5k needed to upgrade the CPUs in other developers' desktops that were using them.
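For illustration only (this is not the code mentioned above), here is a rough sketch of how a simple MMX-style operation maps onto SSE2. MMX code would use `_mm_add_pi16` on a 64-bit `__m64` and later call `_mm_empty()`; the SSE2 version does the same add in the low half of an XMM register and needs no state cleanup. The function name `add4_i16` is made up, and the scalar branch is there only so the sketch builds on non-SSE2 targets:

```c
#include <assert.h>
#include <stdint.h>
#ifdef __SSE2__
#include <emmintrin.h>   /* SSE2 intrinsics */
#endif

/* Add four 16-bit lanes with wraparound, i.e. what MMX's PADDW does,
 * using the low 64 bits of an SSE2 register instead of an MMX register. */
void add4_i16(const int16_t a[4], const int16_t b[4], int16_t out[4])
{
    assert(a && b && out);
#ifdef __SSE2__
    /* Load 64 bits into the low half; the upper half is zeroed. */
    __m128i va = _mm_loadl_epi64((const __m128i *)a);
    __m128i vb = _mm_loadl_epi64((const __m128i *)b);
    __m128i vr = _mm_add_epi16(va, vb);       /* lane-wise wrapping add */
    _mm_storel_epi64((__m128i *)out, vr);     /* store the low 64 bits */
#else
    /* Portable fallback with the same wraparound behaviour. */
    for (int i = 0; i < 4; i++)
        out[i] = (int16_t)(uint16_t)((uint16_t)a[i] + (uint16_t)b[i]);
#endif
}
```

On x86-64, GCC and Clang define `__SSE2__` by default, so the intrinsic path is the one that gets compiled there.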

**coder** · 03 February 2019, 05:14 AM

Originally posted by HadrienG

IIRC, SSE2 support is mandatory for AMD64 compliance. So any 64-bit system is guaranteed to have SSE support.

I'm pretty sure Windows' 64-bit ABI doesn't support x87, BTW.

Originally posted by HadrienG

AVX mode takes a long time to turn on and to turn back off, and causes some significant downclocking which will cause an effective slowdown in surrounding scalar code and vector code that does not leverage the full vector width.

I hate Intel for re-introducing the concept of modes, in AVX. That's one thing about MMX that truly sucked, and its absence made SSE so much nicer.

**coder** · 03 February 2019, 05:31 AM

Originally posted by atomsymbol

They are relatively efficient in my opinion. x87 FPU did not receive 128/256-bit vector registers which would make it able to compete with SSE/AVX in terms of performance. It is true that the x87 stack registers are less efficient compared to SSE/AVX non-stack registers and it is more complicated to extract instruction-level parallelism (ILP) from it compared to SSE/AVX.

The x87 registers and pipeline are natively 80-bit (with native support for denormals - not emulated, like SSE!). Windows ran it in 64-bit mode, but I read that Direct3D 9 will set it to 32-bit mode and leave it there.

This makes for some interesting heisenbugs, when you're trying to debug some floating-point C code, and you insert a printf(). That might force the compiler to flush some fp stack contents that it was caching from < 80-bit fp variables in your code, thus changing the result. It's also a reason for seeing different results based on optimization flags, though there are plenty of other reasons that can happen.

My team disabled x87 instruction generation, just for the benefit of improved stability in our test results.
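A small sketch of how one might check for this, assuming a C99 compiler: `FLT_EVAL_METHOD` from `<float.h>` reports whether intermediates are kept in a wider format, and a store through a `volatile double` forces a value to be rounded to 64 bits, which is roughly what an inserted printf() can do by accident. The function names are made up for the example:

```c
#include <float.h>

/* Describe how the compiler evaluates floating-point expressions,
 * based on the standard C99 macro FLT_EVAL_METHOD. */
const char *fp_eval_description(void)
{
#ifdef FLT_EVAL_METHOD
    switch (FLT_EVAL_METHOD) {
    case 0:  return "each operation is evaluated in its own type";
    case 1:  return "float and double are evaluated as double";
    case 2:  return "everything is evaluated as long double (typical for x87)";
    default: return "indeterminate or implementation-defined";
    }
#else
    return "unknown (FLT_EVAL_METHOD is not defined)";
#endif
}

/* Force a value out of any extended-precision register by storing it
 * to memory as a 64-bit double and reading it back. */
double round_to_double(double x)
{
    volatile double v = x;
    return v;
}
```

With GCC on 32-bit x86, `-mfpmath=sse -msse2` is one way to keep double math out of the x87 unit; whether that matches what was done here is an assumption.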

Originally posted by atomsymbol

It was a mistake to alias the newer MMX 64-bit registers (mm0 to mm7) to the already existing x87 floating-point stack registers.

It sure was annoying, and it's also what made MMX stateful. However, you're looking at the decision from a much different era. Let's remember that the Pentium was just a 2-way superscalar, in-order core.

I am so glad that Intel reached much further, with SSE - and so closely on the heels of MMX, as well.

GCC To Begin Implementing MMX Intrinsics With SSE Instructions
