Fedora 29 Proposal "i686 Is For x86-64" Would Allow More Optimizations, Require SSE2


  • #11
    Originally posted by carewolf View Post
    Btw, why not set the new limit to SSSE3/SSE4 though? That would still cover all CPUs made the last 10 years
    Not quite. It's true that since the second-generation Core 2 Duo (45nm Wolfdale/Penryn) all of Intel's "big" CPUs can do SSE4.1 at the hardware level, but Pentiums and Celerons had it disabled up to and including Clarkdale, which is a 2010 CPU. (Sandy Bridge introduced AVX, so from that point on Celerons/Pentiums have SSE4.1 enabled and AVX disabled.) Also, the small Atom cores have only supported it since Silvermont (late 2013), and while Bonnell/Saltwell definitely aren't stellar CPUs, there's no way you can drop support for them already. At least SSSE3 is indeed supported by all Intel CPUs since 2006 or so (first-gen Core 2, all Atoms).
    But it's worse for AMD. Later K8 models can do SSE3, but not SSSE3 (and of course not SSE4.1 either). K10 (which includes Phenom II) can't do SSSE3 either. You need Bulldozer (or Bobcat) for it. So that would be not even 7 years of coverage (not to mention that most people preferred K8/K10-based chips over Bulldozer...).
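To see where a given machine falls among the levels discussed above, a minimal sketch follows. It assumes Linux, where `/proc/cpuinfo` has a `flags` line and SSE3 appears under its old code name `pni`. The `BASELINES` table and the sample text are made up for illustration and are not any distro's actual policy.

```python
# Flag names as they appear on the "flags" line of /proc/cpuinfo on Linux/x86.
# Note that SSE3 is reported under its old code name "pni".
BASELINES = {
    "SSE2": {"sse2"},
    "SSSE3": {"sse2", "pni", "ssse3"},
    "SSE4.1": {"sse2", "pni", "ssse3", "sse4_1"},
}


def cpu_flags(cpuinfo_text):
    """Return the set of feature flags from the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def supported_baselines(cpuinfo_text):
    """List the hypothetical baselines whose required flags are all present."""
    flags = cpu_flags(cpuinfo_text)
    return [name for name, needed in BASELINES.items() if needed <= flags]


# Made-up excerpt resembling a K10-class CPU: SSE3 ("pni") but no SSSE3.
sample = "model name : example\nflags : fpu mmx sse sse2 pni sse4a lm\n"
print(supported_baselines(sample))
```

On a real Linux box you would pass in `open("/proc/cpuinfo").read()` instead of the sample string.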

    Comment


    • #12
      Originally posted by caligula View Post
      The improvements in >SSE2 aren't as significant in general.
      Actually they are way more significant than SSE2. SSSE3 has the all-important shuffle instruction and SSE4.1 has integer multiply; together they more than double the places a compiler can autovectorize in most codebases I have worked on.

      Edit: Though that might be because I do more integer math than FP, it is not nearly as important for FP performance.
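For readers who haven't met the SSSE3 byte shuffle (PSHUFB), here is a small reference model of what the 128-bit form computes. It is a plain-Python sketch of the instruction's semantics for illustration only, not how anyone would use it in practice; the function name simply mirrors the mnemonic.

```python
def pshufb(src, mask):
    """Reference model of the 128-bit SSSE3 PSHUFB byte shuffle."""
    if len(src) != 16 or len(mask) != 16:
        raise ValueError("both operands must be 16 bytes")
    out = bytearray(16)
    for i, m in enumerate(mask):
        # High bit set in the control byte zeroes the lane; otherwise the
        # low 4 bits select which source byte lands in lane i.
        out[i] = 0 if m & 0x80 else src[m & 0x0F]
    return bytes(out)


src = bytes(range(16))
reverse = bytes(range(15, -1, -1))  # control mask that reverses byte order
print(pshufb(src, reverse).hex())
```

Because the control mask is itself a register, one instruction covers byte reversal, broadcasts, table lookups and arbitrary permutations, which is why it opens up so many integer loops to vectorization.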
      Last edited by carewolf; 06-05-2018, 05:29 AM.

      Comment


      • #13
        Originally posted by carewolf View Post

        Actually they are way more significant than SSE2. SSSE3 has the all-important shuffle instruction and SSE4.1 has integer multiply; together they more than double the places a compiler can autovectorize in most codebases I have worked on.

        Edit: Though that might be because I do more integer math than FP, it is not nearly as important for FP performance.
        You do that in 32-bit mode on a 64-bit processor? And you care about performance? Odd.

        It would be interesting if x32-abi were better supported. This is x86-64 with 32-bit pointers. Should be quite useful on small systems (with perhaps 2G of RAM or less -- like the netbook on which I am typing).
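A rough sketch of where the memory saving from 32-bit pointers comes from. It computes the size of a hypothetical pointer-heavy struct under 8-byte and 4-byte pointers, assuming every field is aligned to its own size (which holds for the field types used here on x86-64 and x32). The node layout is invented for the example.

```python
def struct_size(field_sizes):
    """Size of a C struct whose fields are naturally aligned scalars."""
    offset = 0
    max_align = 1
    for size in field_sizes:
        align = size  # assumption: alignment equals size for these types
        offset = (offset + align - 1) // align * align  # pad up to alignment
        offset += size
        max_align = max(max_align, align)
    # Tail padding so that arrays of the struct stay aligned.
    return (offset + max_align - 1) // max_align * max_align


# Hypothetical list node: struct node { node *next; node *prev; int32_t key; }
lp64_node = struct_size([8, 8, 4])  # x86-64: 8-byte pointers
x32_node = struct_size([4, 4, 4])   # x32: 4-byte pointers
print(lp64_node, x32_node)
```

For this layout the node shrinks from 24 to 12 bytes. Real programs see less than a 2x saving, since not all data is pointers.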

        Comment


        • #14
          Originally posted by Hugh View Post

          You do that in 32-bit mode on a 64-bit processor? And you care about performance? Odd.
          What are you blathering about?

          I specifically wondered why not set the new limit to SSE4 and also raise it for x64.

          Comment


          • #15
            Originally posted by Hugh View Post

            You do that in 32-bit mode on a 64-bit processor? And you care about performance? Odd.

            It would be interesting if x32-abi were better supported. This is x86-64 with 32-bit pointers. Should be quite useful on small systems (with perhaps 2G of RAM or less -- like the netbook on which I am typing).
            x32 has to be a pure x32 system, so no proprietary software like Skype or drivers, no flash player (ok this is not important anymore but used to be even 5+ years ago), no Wine to play some silly Warcraft III. Biggest benefit should be lower CPU load from SSL/TLS.
            And so, the niche for x32 was to be server VMs. But as we talk of dropping even i686, this should explain the lack of interest.

            In other news, Mint is fairly conservative about not dropping stuff: Mint 19 MATE and Mint 19 Xfce are out ("BETA", but more a kind of single RC version) with 32-bit i686 builds.

            Comment


            • #16
              Originally posted by grok View Post

              x32 has to be a pure x32 system, so no proprietary software like Skype or drivers, no flash player (ok this is not important anymore but used to be even 5+ years ago), no Wine to play some silly Warcraft III. Biggest benefit should be lower CPU load from SSL/TLS.
              And so, the niche for x32 was to be server VMs. But as we talk of dropping even i686, this should explain the lack of interest.
              Apparently x32 is purely a userland thing and an x86-64 kernel supports it (if the feature is enabled). But you'd need versions of all the libraries too (kind of like you need x86-32 libraries on your x86-64 machine if you wish to run x86-32 userland binaries). So you could have some programs compiled for x32 and some for x86-64 and some for x86-32.
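With three kinds of userland binaries possibly coexisting, one way to tell them apart is the ELF header: x32 binaries use the 32-bit ELF class together with the x86-64 machine type. The sketch below reads only the first 20 bytes of a file. `classify_elf` and `fake_header` are names made up for this example, and the synthetic headers exist only to demonstrate the check.

```python
import struct

EM_386 = 3       # e_machine value for 32-bit x86
EM_X86_64 = 62   # e_machine value for x86-64 (used by both x86-64 and x32)


def classify_elf(header):
    """Classify an ELF file as 'x86-32', 'x32', 'x86-64' or 'other'."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        return "other"
    elf_class = header[4]  # EI_CLASS: 1 = 32-bit objects, 2 = 64-bit objects
    if header[5] != 1:     # EI_DATA: 1 = little-endian, as on all x86
        return "other"
    (machine,) = struct.unpack_from("<H", header, 18)  # e_machine field
    if machine == EM_386 and elf_class == 1:
        return "x86-32"
    if machine == EM_X86_64:
        return {1: "x32", 2: "x86-64"}.get(elf_class, "other")
    return "other"


def fake_header(elf_class, machine):
    """Build a minimal synthetic header, only for demonstrating the check."""
    ident = b"\x7fELF" + bytes([elf_class, 1, 1]) + bytes(9)
    return ident + struct.pack("<HH", 2, machine)


for cls, mach in [(1, EM_386), (1, EM_X86_64), (2, EM_X86_64)]:
    print(classify_elf(fake_header(cls, mach)))
```

On a real file you would call it as `classify_elf(open(path, "rb").read(20))`, where `path` is whatever binary or shared library you want to inspect.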

              I don't know of any distro that provides x32 libraries.

              Most UNIX utilities used to fit in 64k of program space and 64k of data space (the PDP-11's limitations, the almost-original host of UNIX). In the i386 days, I considered trying to compile some of those utilities in i286 mode. But I never got around to it.

              It is interesting to me that almost all SPARC and Power userland code is 32-bit on 64-bit hardware. It seems to be something to do with bad code density costs on those RISC architectures.
              Last edited by Hugh; 06-05-2018, 08:42 PM.

              Comment


              • #17
                I forgot to be more clear in saying that you need a pure x32 userland (at least) to realize the RAM savings, although I didn't realize x32 was done with an x86-64 kernel. (or I forgot, having never run anything x32 anyway)

                If you can run a desktop with an x86-64 kernel and an i686 userland (or a server, if only that is available), I think it's still good for some old computers that have a 64-bit CPU but only 1 GB of RAM. I've seen it: two slots of DDR1 (DIMM or SO-DIMM). Such a machine can still be fast enough on a single core to be good for something; in fact the last one I used like that (some Athlon 64 at 2 GHz with some HDD and a low-profile graphics card) was embarrassingly fast at booting and running a MATE desktop. Single-thread performance and HDD speed weren't even really out of place almost 15 years later.

                Now, there are a few modern computers where this would still be useful: a mini laptop with a quad-core 14nm CPU, a fixed 2 GB of RAM and 32 GB of eMMC. The 64-bit ISO is easier to install since it supports UEFI. (I don't know whether any 32-bit Linux ISO supports UEFI; the distro I used says its 32-bit version doesn't.)
                It works fine with a 64-bit OS, but if it were mine I'd like to run a 64-bit kernel and an i686 userland.

                Comment


                • #18
                  Originally posted by carewolf View Post

                  Actually they are way more significant than SSE2. SSSE3 has the all-important shuffle instruction and SSE4.1 has integer multiply; together they more than double the places a compiler can autovectorize in most codebases I have worked on.

                  Edit: Though that might be because I do more integer math than FP, it is not nearly as important for FP performance.
                  Ok, I'm not qualified enough to argue about this. AFAIK SSE2 has both integer/float operations with 8-64 bit numbers and can use 8 x 128 bits of storage for packed stuff. The SSE2 set also contains instructions for bypassing cache which is sometimes useful. If you compare this to MMX systems, SSE2 can provide 4-16x performance increase in integer math in a tight loop and maybe more due to other savings.

                  In SSE3-SSE4 the main additions are horizontal ops and some special ops like dot-product acceleration. These are useful in some domains, but not in general. I assume some programs like lame/flac/ffmpeg can use them, but many can't. The problem with SSE4 is that SSE4.2 is only available in the latest 2-3 AMD generations. The full power of SSE2 is also only available in 64-bit mode on AMD and Intel CPUs (twice the register count). The AVX instructions also show mixed results in tests, apparently due to some throttling and lots of heat generation.

                  Comment


                  • #19
                    Originally posted by caligula View Post
                    Ok, I'm not qualified enough to argue about this. AFAIK SSE2 has both integer/float operations with 8-64 bit numbers and can use 8 x 128 bits of storage for packed stuff. The SSE2 set also contains instructions for bypassing cache which is sometimes useful. If you compare this to MMX systems, SSE2 can provide 4-16x performance increase in integer math in a tight loop and maybe more due to other savings.

                    In SSE3-SSE4 the main additions are horizontal ops and some special ops like dot-product acceleration. These are useful in some domains, but not in general. I assume some programs like lame/flac/ffmpeg can use them, but many can't. The problem with SSE4 is that SSE4.2 is only available in the latest 2-3 AMD generations. The full power of SSE2 is also only available in 64-bit mode on AMD and Intel CPUs (twice the register count). The AVX instructions also show mixed results in tests, apparently due to some throttling and lots of heat generation.
                    Horizontal ops are lame because they are micro-coded and don't offer any performance benefit, just smaller code (compared to doing it "manually"). I think he's referring to *integer* SIMD operations; the zero- and sign-extension stuff in particular (e.g. pmovzx) is super useful in many cases. SSE is not just about floating point.
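To make the extension instructions concrete, here is a plain-Python model of the byte-to-word variants (PMOVZXBW and PMOVSXBW from SSE4.1). It is a sketch of the semantics only; the function names just mirror the mnemonics, and the sample bytes are arbitrary.

```python
def pmovzxbw(src):
    """Model of PMOVZXBW: zero-extend the low 8 bytes into 16-bit lanes."""
    return [b for b in src[:8]]


def pmovsxbw(src):
    """Model of PMOVSXBW: sign-extend the low 8 bytes into 16-bit lanes."""
    return [b - 256 if b >= 128 else b for b in src[:8]]


# 16 input bytes; only the low 8 take part in the widening.
src = bytes([0, 1, 127, 128, 200, 255, 16, 32]) + bytes(8)
print(pmovzxbw(src))
print(pmovsxbw(src))

# Typical use: widen 8-bit pixels first so that sums cannot wrap around.
row_a = pmovzxbw(bytes([250] * 16))
row_b = pmovzxbw(bytes([10] * 16))
print([a + b for a, b in zip(row_a, row_b)])
```

Without these instructions the same widening takes an unpack against a zero register (or a compare to build the sign mask), which is the kind of extra step that can make a loop not worth vectorizing.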

                    Comment


                    • #20
                      Originally posted by Hugh View Post
                      I don't know of any distro that provides x32 libraries.
                      Gentoo has an x32 port. Also, with Gentoo you have control over what your libraries are compiled for (x86, x32, amd64). So you can run a pure x32 system if you want, and need only the amd64 toolchain for building your kernel.

                      Of course, the usual problems apply. Software that assumes __x86_64__ == __LP64__ or depends on hand-written x86/amd64 assembly will fail, and in general you won't be able to run binaries downloaded from somewhere.
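The same class of mistake shows up outside C: a script that sees `x86_64` as the machine name and concludes it is running with 64-bit pointers. A hedged sketch of the safer check follows. It assumes that the machine name comes from the kernel (so a 32-bit or x32 build of the interpreter on an x86-64 kernel would still report `x86_64`), while `struct.calcsize("P")` reflects the interpreter's own ABI. `describe_userland` is a made-up helper.

```python
import platform
import struct


def pointer_bits():
    """Width of a native pointer in this interpreter's ABI, in bits."""
    return struct.calcsize("P") * 8


def describe_userland(machine, bits):
    """Combine the kernel-reported machine name with the real pointer width."""
    if machine in ("x86_64", "AMD64"):
        if bits == 64:
            return "64-bit userland"
        return "32-bit pointers on an x86-64 kernel"
    return f"{machine}, {bits}-bit pointers"


print(describe_userland(platform.machine(), pointer_bits()))
```

Note that the second case covers both x32 and plain i686 userlands; telling those two apart needs something like the ELF class and machine type of the binary itself.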

                      Comment
