GCC To Begin Implementing MMX Intrinsics With SSE Instructions
Originally posted by atomsymbol
In my opinion, it is improbable for MMX or x87 FPU to be retired from CPUs.
Comment
-
Which raises another question: do we still need real-mode x86 support now that modern CPUs have VT-x since Westmere and x64 long mode? With 32-bit OSes being retired, I would think virtual 8086 mode would be obsolete. x64 long mode supports the needed extended page tables.
- Likes 3
Comment
-
Originally posted by enihcam
X86-16 needs to be retired completely.
Obviously, it's not practical to run a 16-bit OS on a machine where Intel has retired support for legacy BIOS APIs, but I still consider it a competitive advantage that you can run classic 16-bit Windows games like Lode Runner: The Legend Returns on 32-bit Wine on 64-bit Linux.
(i.e., it's like retiring the ability to install a 32-bit kernel but not 32-bit multilib.)
- Likes 1
Comment
-
Originally posted by HadrienG
IIRC, SSE2 support is mandatory for AMD64 compliance. So any 64-bit system is guaranteed to have SSE support.
One small nitpick about this bit from the article:
It is actually surprisingly hard to get better performance with AVX than with SSE if your code is not a textbook use case for wide vectorization. On current Intel CPUs, AVX mode takes a long time to turn on and to turn back off, and it causes significant downclocking, which effectively slows down surrounding scalar code and vector code that does not leverage the full vector width.
- Likes 6
Comment
-
Originally posted by hotaru
64-bit code that uses MMX instead of SSE? Probably extremely small, since SSE2 is mandatory for x86-64.
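To make the "SSE2 is always there on x86-64" point concrete, here is a minimal sketch of how a 64-bit MMX-style packed add can be done in the low half of an XMM register instead. The function name `add_4x16` is made up for illustration, and this is not GCC's actual implementation, just the general idea under the assumption that the compiler ships `<emmintrin.h>`:

```c
#include <stdint.h>
#if defined(__x86_64__) || defined(_M_X64)
#include <emmintrin.h>  /* SSE2 intrinsics, always available on x86-64 */
#endif

/* Hypothetical helper: add four 16-bit lanes packed into a 64-bit value,
   with per-lane wraparound, the way MMX PADDW would. */
uint64_t add_4x16(uint64_t a, uint64_t b)
{
#if defined(__x86_64__) || defined(_M_X64)
    /* Put each 64-bit operand in the low half of an XMM register. */
    __m128i va = _mm_cvtsi64_si128((long long)a);
    __m128i vb = _mm_cvtsi64_si128((long long)b);
    /* SSE2 PADDW on 8 lanes; only the low 4 lanes carry our data. */
    __m128i sum = _mm_add_epi16(va, vb);
    return (uint64_t)_mm_cvtsi128_si64(sum);
#else
    /* Portable fallback for non-x86-64 targets. */
    uint64_t r = 0;
    for (int i = 0; i < 4; i++) {
        uint16_t x = (uint16_t)(a >> (16 * i));
        uint16_t y = (uint16_t)(b >> (16 * i));
        r |= (uint64_t)(uint16_t)(x + y) << (16 * i);
    }
    return r;
#endif
}
```

Since no MMX register is touched on the SSE2 path, nothing here interferes with the x87 state.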
- Likes 3
Comment
-
Originally posted by HadrienG
IIRC, SSE2 support is mandatory for AMD64 compliance. So any 64-bit system is guaranteed to have SSE support.
Originally posted by HadrienG
AVX mode takes a long time to turn on and to turn back off, and causes some significant downclocking which will cause an effective slowdown in surrounding scalar code and vector code that does not leverage the full vector width.
- Likes 1
Comment
-
Originally posted by atomsymbol
They are relatively efficient in my opinion. x87 FPU did not receive 128/256-bit vector registers which would make it able to compete with SSE/AVX in terms of performance. It is true that the x87 stack registers are less efficient compared to SSE/AVX non-stack registers and it is more complicated to extract instruction-level parallelism (ILP) from it compared to SSE/AVX.
This makes for some interesting heisenbugs, when you're trying to debug some floating-point C code, and you insert a printf(). That might force the compiler to flush some fp stack contents that it was caching from < 80-bit fp variables in your code, thus changing the result. It's also a reason for seeing different results based on optimization flags, though there are plenty of other reasons that could happen.
My team disabled x87 instruction generation, just for the benefit of improved stability in our test results.
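A rough sketch of the effect described above, for anyone who hasn't run into it. It does not reproduce the printf() heisenbug itself. It only imitates the two situations: an intermediate that gets rounded to a 64-bit double (here forced with `volatile`, which is my assumption for how to model a spill to memory) versus one that stays in a wider `long double`, as it would in an x87 register. The function names are made up for illustration:

```c
#include <float.h>

/* Returns nonzero if long double carries more mantissa bits than double
   (true for the 80-bit x87 format, false where long double == double). */
int long_double_is_wider(void)
{
    return LDBL_MANT_DIG > DBL_MANT_DIG;
}

/* Intermediate is stored to a double, so it is rounded to 53 mantissa bits,
   like a value flushed from the x87 stack to memory. */
double sum_rounded_each_step(double a, double b, double c)
{
    volatile double t = a + b;  /* volatile forces the store and the rounding */
    return t + c;
}

/* Intermediate stays in long double, like a value cached in an x87 register. */
double sum_extended_intermediate(double a, double b, double c)
{
    long double t = (long double)a + (long double)b;
    return (double)(t + (long double)c);
}
```

Calling both with `(1e16, 1.0, -1e16)` is one way to see the two paths disagree on a platform where `long_double_is_wider()` is true: the `+ 1.0` is lost when the intermediate is rounded to double but survives in the wider format. The post doesn't say how x87 code generation was disabled. With GCC, something like `-mfpmath=sse` is the usual candidate.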
Originally posted by atomsymbol
It was a mistake to alias the newer MMX 64-bit registers (mm0 to mm7) to the already existing x87 floating-point stack registers.
I am so glad that Intel reached much further, with SSE - and so closely on the heels of MMX, as well.
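For context on why the aliasing is painful in practice: code that uses MMX has to execute EMMS (`_mm_empty()`) before any x87 floating-point work, otherwise the x87 register stack still looks full. Below is a minimal sketch, assuming an x86-64 compiler that provides `<mmintrin.h>`. The function name and the `HAVE_MMX_SKETCH` macro are made up for illustration, and there is a plain C fallback for other targets:

```c
#include <stdint.h>
#if defined(__x86_64__) && defined(__MMX__)
#include <mmintrin.h>  /* __m64, _mm_add_pi16, _mm_empty */
#define HAVE_MMX_SKETCH 1
#else
#define HAVE_MMX_SKETCH 0
#endif

/* Hypothetical helper: packed 16-bit add followed by long double math.
   On x86 targets long double arithmetic typically runs on the x87 unit. */
long double packed_add_then_scale(uint64_t a, uint64_t b,
                                  long double scale, uint64_t *packed_sum)
{
#if HAVE_MMX_SKETCH
    __m64 sum = _mm_add_pi16(_mm_cvtsi64_m64((long long)a),
                             _mm_cvtsi64_m64((long long)b));
    *packed_sum = (uint64_t)_mm_cvtm64_si64(sum);
    /* EMMS: mark the aliased x87 registers as empty again before FP code. */
    _mm_empty();
#else
    /* Portable fallback: same per-lane wraparound add without MMX. */
    uint64_t r = 0;
    for (int i = 0; i < 4; i++) {
        uint16_t x = (uint16_t)(a >> (16 * i));
        uint16_t y = (uint16_t)(b >> (16 * i));
        r |= (uint64_t)(uint16_t)(x + y) << (16 * i);
    }
    *packed_sum = r;
#endif
    /* Scale the lowest lane using long double arithmetic. */
    return scale * (long double)(*packed_sum & 0xFFFF);
}
```

With SSE the `_mm_empty()` call has no counterpart, because the XMM registers are separate from the x87 stack.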
- Likes 1
Comment