
**caligula** · 06 June 2018, 01:31 PM

Originally posted by Weasel

Horizontal ops are lame because they are micro-coded and don't offer any performance benefit, just smaller code (compared to doing it "manually"). I think he's referring to *integer* SIMD operations, especially the zero and sign-extension stuff is super useful in many cases (e.g. pmovzx). SSE is not just about floating point.

Agreed, but are you saying that SSE2 doesn't include integer operations or that the operations are pretty useless due to carry/sign issues?

**Weasel** · 06 June 2018, 05:54 PM

Originally posted by caligula

Agreed, but are you saying that SSE2 doesn't include integer operations or that the operations are pretty useless due to carry/sign issues?

SSE4 is not a replacement, it's an extension. You get new useful instructions that you can use, and it's not even about integer operations directly, but shuffling stuff into places (which is needed for the "normal" integer operations after, not SSE4).

Of course, with all this said, I don't think compilers can make particularly good use of this, seeing as they tend to suck at automatic SIMD and vectorization. And since the "default" flag is -O2, you get even less: vectorization tends to increase code size a lot, so GCC only enables a little of it at -O2; you'd need -O3, or to enable the flags manually.
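
For reference, a loop of the kind being discussed might look like the sketch below. The function name `scale_add` and the file name `loop.c` are made up for illustration. The GCC options in the comment (`-O2`, `-O3`, `-ftree-vectorize`, `-fopt-info-vec`) do exist, but whether this particular loop gets vectorized depends on the compiler version and target, so treat it as something to try rather than a guaranteed result.

```c
#include <stddef.h>

/* Hypothetical example loop: out[i] = a[i] * k + b[i].
 * `restrict` promises the compiler that the arrays do not overlap,
 * which removes one common obstacle to auto-vectorization.
 *
 * To compare what GCC reports at different settings
 * (the file name loop.c is an assumption):
 *   gcc -O2 -fopt-info-vec -c loop.c
 *   gcc -O2 -ftree-vectorize -fopt-info-vec -c loop.c
 *   gcc -O3 -fopt-info-vec -c loop.c
 */
void scale_add(float *restrict out, const float *restrict a,
               const float *restrict b, float k, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] * k + b[i];
}
```

Comparing the `-fopt-info-vec` output across the three command lines is one way to see how much the optimization level matters for a given GCC version.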

**caligula** · 06 June 2018, 05:59 PM

Originally posted by Weasel

SSE4 is not a replacement, it's an extension. You get new useful instructions that you can use, and it's not even about integer operations directly, but shuffling stuff into places (which is needed for the "normal" integer operations after, not SSE4).

I was just wondering about this claim: "Actually they are way more significant than SSE2."

SSE2 (especially the AMD flavor) already provides lots of instructions, lots of new register space. It's not obvious to me that the update to, say, SSE3/SSSE3 is a bigger improvement in any sense. It's definitely an improvement, but does any data prove that it's actually a way more significant update?

**Weasel** · 06 June 2018, 06:12 PM

I've no idea what you mean by "new register space" when they all use the exact same registers (xmm0-7 for 32-bit, xmm0-15 for x64). Most people's reason for obsessing over SSE2 compared to SSE1 is not even about SIMD. It's that they now have "double scalar fp math" with SSE instead of having to use the x87 FPU, because it's fashionable to hate on the x87 FPU, which to me is nonsense, but OK.

Obviously you can still use 32-bit floating point math (single-precision) with SSE1 instructions; the double-precision ones are just extra, available if you need them.
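
One portable way to check how a given build evaluates scalar floating point is the standard C99 macro `FLT_EVAL_METHOD` from `<float.h>`. The sketch below is only illustrative, and the helper name `fp_eval_description` is hypothetical. The usual expectation is that GCC reports 2 when scalar math goes through the x87 FPU and 0 when it goes through SSE (for example with `-msse2 -mfpmath=sse`, or on x86-64), but that is an expectation to verify on your own toolchain, not something the snippet guarantees.

```c
#include <float.h>

/* Hypothetical helper: describes how this build evaluates
 * floating-point expressions, based on the C99 macro FLT_EVAL_METHOD. */
const char *fp_eval_description(void)
{
    switch (FLT_EVAL_METHOD) {
    case 0:
        return "each operation is evaluated in its own type";
    case 1:
        return "float and double are evaluated as double";
    case 2:
        return "everything is evaluated as long double";
    default:
        return "indeterminate or implementation-defined";
    }
}
```

Printing the returned string from a 32-bit build with and without the SSE options is a quick way to see which path the compiler picked.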

**carewolf** · 07 June 2018, 06:40 PM

Originally posted by Weasel

Horizontal ops are lame because they are micro-coded and don't offer any performance benefit, just smaller code (compared to doing it "manually"). I think he's referring to *integer* SIMD operations, especially the zero and sign-extension stuff is super useful in many cases (e.g. pmovzx). SSE is not just about floating point.

I was referring to pshufb (byte shuffling), which is an essential and extremely versatile operation for integer vector code; pmulld (32-bit integer multiplication), again a central operation missing from earlier SSE versions but essential to autovectorizing any C code that multiplies arbitrary ints; and the sign-extend conversions, which are also very important for auto-vectorizing C code with mixed signed integer types.
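
To make the three operations concrete, here is a plain-C sketch of what each one computes: the byte shuffle (`pshufb`, SSSE3), the packed 32-bit low multiply (`pmulld`, SSE4.1) and the sign-extending conversions (for example `pmovsxwd`, SSE4.1). The function names are hypothetical, and these are scalar reference models and candidate loops, not intrinsics code; whether a compiler actually emits those instructions for the two loops depends on the compiler, flags and target.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar reference model of a 16-byte pshufb: each output byte is
 * picked from src using the low 4 bits of the matching index byte,
 * or zeroed when that index byte has its top bit set. */
void pshufb_model(uint8_t out[16], const uint8_t src[16],
                  const uint8_t idx[16])
{
    uint8_t tmp[16]; /* temporary so that out may alias src */
    for (int i = 0; i < 16; i++)
        tmp[i] = (idx[i] & 0x80) ? 0 : src[idx[i] & 0x0F];
    for (int i = 0; i < 16; i++)
        out[i] = tmp[i];
}

/* Element-wise 32-bit multiply keeping the low 32 bits of each
 * product, which is what pmulld computes per lane. Unsigned types
 * are used here so that wrap-around is well defined in C. */
void mul_u32(uint32_t *restrict out, const uint32_t *restrict a,
             const uint32_t *restrict b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] * b[i];
}

/* Mixed-width signed arithmetic: each int16_t is sign-extended to
 * 32 bits before the add, the step a pmovsx-style conversion does
 * for several elements at once. */
void add_widen(int32_t *restrict out, const int16_t *restrict a,
               const int32_t *restrict b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = (int32_t)a[i] + b[i];
}
```

A reversed index table (15, 14, ..., 0) passed to `pshufb_model` byte-swaps the whole 16-byte block, which hints at why a single shuffle instruction covers so many data-rearrangement cases.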


Fedora 29 Proposal "i686 Is For x86-64" Would Allow More Optimizations, Require SSE2
