Fedora 29 Proposal "i686 Is For x86-64" Would Allow More Optimizations, Require SSE2

  • #21
    Originally posted by Weasel
    Horizontal ops are lame because they are micro-coded and don't offer any performance benefit, just smaller code (compared to doing it "manually"). I think he's referring to *integer* SIMD operations, especially the zero and sign-extension stuff is super useful in many cases (e.g. pmovzx). SSE is not just about floating point.
    Agreed, but are you saying that SSE2 doesn't include integer operations or that the operations are pretty useless due to carry/sign issues?


    • #22
      Originally posted by caligula
      Agreed, but are you saying that SSE2 doesn't include integer operations or that the operations are pretty useless due to carry/sign issues?
      SSE4 is not a replacement, it's an extension. You get new useful instructions that you can use, and it's not even about integer operations directly, but shuffling stuff into places (which is needed for the "normal" integer operations after, not SSE4).

      Of course, with all this said, I don't think compilers can make particularly good use of this, seeing as they tend to suck at automatic SIMD and vectorization... and since the "default" flag is -O2, you get even less (vectorization tends to increase code size a lot, so GCC only enables a little of it at -O2; you'd need -O3 or have to enable the flags manually).
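As a rough illustration of the point about optimization levels, here is a minimal sketch of a loop that GCC's auto-vectorizer can typically handle. The function name `add_u32` is made up for this example. Assuming a GCC from around the time of this thread, building with -O2 alone would normally leave the loop scalar, while -O3 or -O2 -ftree-vectorize should vectorize it; adding -fopt-info-vec asks GCC to report which loops it vectorized.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: element-wise add of two arrays.
 * `restrict` promises the compiler that the arrays do not overlap,
 * which removes one common obstacle to vectorization.
 * Unsigned arithmetic is used so wraparound is well defined. */
void add_u32(uint32_t *restrict dst, const uint32_t *restrict a,
             const uint32_t *restrict b, size_t n)
{
    assert(n == 0 || (dst && a && b));
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];   /* maps naturally onto a packed 32-bit add */
}
```

Comparing the generated assembly at the two optimization levels is the easiest way to see the difference in code size and instruction selection.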
      Last edited by Weasel; 06-06-2018, 05:56 PM.


      • #23
        Originally posted by Weasel
        SSE4 is not a replacement, it's an extension. You get new useful instructions that you can use, and it's not even about integer operations directly, but shuffling stuff into places (which is needed for the "normal" integer operations after, not SSE4).
        I was just wondering about this claim: "Actually they are way more significant than SSE2."

        SSE2 (especially the AMD flavor) already provides lots of instructions, lots of new register space. It's not obvious to me that the update to, say, SSE3/SSSE3 is a bigger improvement in any sense. It's definitely an improvement, but does any data prove that they're actually a way more significant update?


        • #24
          I've no idea what you mean by "new register space" when they all use the exact same registers (xmm0-7 for 32-bit, xmm0-15 for x64). Most people's obsession with SSE2 compared to SSE1 is not even about SIMD. It's that they now have scalar double-precision FP math with SSE instead of having to use the x87 FPU, because it's fashionable to hate on the x87 FPU, which to me is nonsense, but OK.

          Obviously you can still use 32-bit floating point math (single-precision) with SSE1 instructions; the double-precision ones are just extra, available if you need them.
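To make the x87 versus SSE2 scalar point concrete, here is a minimal sketch. The helper names `sum_doubles` and `fp_eval_method` are made up for this example. Assuming GCC targeting 32-bit x86, building with -mfpmath=387 would typically keep the accumulator in x87 registers at extended precision, while -msse2 -mfpmath=sse would typically use scalar SSE2 instructions at plain double precision. The standard `FLT_EVAL_METHOD` macro from `<float.h>` reports which evaluation mode the compiler selected (commonly 2 for x87, 0 for SSE2).

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

/* Hypothetical helper: plain scalar double accumulation.
 * Whether this uses the x87 stack or SSE2 scalar instructions
 * depends on the target and compiler flags, not on the source. */
double sum_doubles(const double *v, size_t n)
{
    assert(n == 0 || v != NULL);
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += v[i];
    return acc;
}

/* Returns the compiler's floating-point evaluation method:
 * 0 = evaluate in the operand's own type,
 * 2 = evaluate everything in long double,
 * other values are described in the C standard. */
int fp_eval_method(void)
{
    return (int)FLT_EVAL_METHOD;
}
```

The source is identical in both builds; only the generated instructions and the intermediate precision change.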


          • #25
            Originally posted by Weasel
            Horizontal ops are lame because they are micro-coded and don't offer any performance benefit, just smaller code (compared to doing it "manually"). I think he's referring to *integer* SIMD operations, especially the zero and sign-extension stuff is super useful in many cases (e.g. pmovzx). SSE is not just about floating point.
            I was referring to pshufb, byte shuffling, which is an essential and extremely versatile operation for integer vector operations; pmulld, 32-bit integer multiplication, again a central operation missing from earlier SSE versions but essential to auto-vectorizing any C code with arbitrary ints being multiplied; and the sign-extend conversions, which are also very important for auto-vectorizing C code with mixed signed integer types.
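For readers unfamiliar with these instructions, here is a minimal sketch in portable C. The function names `pshufb_model` and `scale_i16` are made up for this example. The first function is a scalar reference model of what the 128-bit byte shuffle (pshufb) computes, not a fast implementation. The second is the kind of mixed-width loop where a vectorizer targeting SSE4.1 could plausibly use a sign-extending load (pmovsxwd) followed by a packed 32-bit multiply (pmulld); whether a given compiler actually does so is an assumption worth checking in the assembly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of the 128-bit byte shuffle: each output byte is picked
 * from src using the low 4 bits of the control byte, or set to zero
 * when the control byte's top bit is set. */
void pshufb_model(uint8_t dst[16], const uint8_t src[16],
                  const uint8_t ctl[16])
{
    uint8_t tmp[16];                      /* lets dst alias src safely */
    for (int i = 0; i < 16; i++)
        tmp[i] = (ctl[i] & 0x80) ? 0 : src[ctl[i] & 0x0F];
    for (int i = 0; i < 16; i++)
        dst[i] = tmp[i];
}

/* Hypothetical loop: sign-extend 16-bit inputs to 32 bits, then multiply
 * by a 32-bit factor. The caller must ensure each product fits in
 * int32_t, since signed overflow is undefined in C. */
void scale_i16(int32_t *restrict out, const int16_t *restrict in,
               int32_t k, size_t n)
{
    assert(n == 0 || (out && in));
    for (size_t i = 0; i < n; i++)
        out[i] = (int32_t)in[i] * k;      /* widen, then 32-bit multiply */
}
```

A control vector of 15, 14, ..., 0 passed to `pshufb_model` reverses the 16 bytes, which is one small example of how versatile the shuffle is.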
