Squeezing More Juice Out Of Gentoo With Graphite, LTO Optimizations

  • #21
    I plan to upgrade to Ryzen Threadripper sooner or later, it will be fun to test this repo then.

    Comment


    • #22
      Originally posted by Zan Lynx View Post

      That is my experience too. It seems to me that if your CPU has plenty of cache, the O3 optimizations are always better.
      I just don't see how. It's not so much that -O3 causes build failures; it's more that it causes undefined behavior. On Gentoo, most ebuilds that filter out -O3 do so because the maintainer knows it causes runtime bugs. Sometimes it is filtered because it causes build failures, but that's much rarer.

      Comment


      • #23
        Originally posted by duby229 View Post

        I just don't see how. It's not so much that -O3 causes build failures; it's more that it causes undefined behavior. On Gentoo, most ebuilds that filter out -O3 do so because the maintainer knows it causes runtime bugs. Sometimes it is filtered because it causes build failures, but that's much rarer.
        It shouldn't. If it does, it's either a compiler or an application bug. More often than not it's the application's fault.
        Last edited by GrayShade; 11 September 2017, 02:32 PM.

        Comment


        • #24
          Originally posted by GrayShade View Post

          It shouldn't. If it does, it's either a compiler or an application bug. More often than not it's the application's fault.
          "the application is at fault" doesn't help when we want a computer that works and compiling @world with -O3 produces issues.

          But hey, I think all Gentoo users have gone through their -O3 phase.

          Comment


          • #25
            Originally posted by Holograph View Post

            "the application is at fault" doesn't help when we want a computer that works and compiling @world with -O3 produces issues.

            But hey, I think all Gentoo users have gone through their -O3 phase.
            If the application is buggy, it may just as well break under -O2 one day. I've seen it happen myself with strict aliasing violations. So it's better to file a bug and hope for the best.
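
As a minimal sketch of the kind of strict aliasing violation mentioned above (a made-up example, not code from any package discussed in this thread): GCC enables `-fstrict-aliasing` at `-O2` and above, so such a bug is not specific to `-O3`. The function names `store_both` and `float_bits` are hypothetical.

```c
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(uint32_t) == sizeof(float), "sketch assumes a 32-bit float");

/* Under strict aliasing the compiler may assume that an int* and a float*
   never point at the same object, so it is free to return the constant 1
   without reloading *i. The function itself is fine; the undefined behaviour
   is in a caller that does something like store_both(&x, (float *)&x). */
int store_both(int *i, float *f)
{
    *i = 1;
    *f = 2.0f;
    return *i;
}

/* Well-defined way to look at the bits of a float: copy the object
   representation with memcpy instead of casting the pointer. */
uint32_t float_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

Code that calls `store_both` with two pointers to the same object may appear to work at one optimization level and break at another, which is why filing a bug against the application is the more durable fix than lowering the optimization level.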

            Comment


            • #26
              Originally posted by sdack View Post
              No, it hasn't. Just off the top of my head, ffmpeg fails to build with an ICE.
              Come on. Before that bugfix, friggin tar would fail to compile.
              ffmpeg is full of assembly and other dark magic and if graphite fails on one or two such packages, who cares.

              Before that fix that came with 7.2.0, there were quite a few such failures...

              Comment


              • #27
                Originally posted by GrayShade View Post

                It shouldn't. If it does, it's either a compiler or an application bug. More often than not it's the application's fault.
                Oh, we do file bug reports. Gentoo devs and users file upstream bug reports all the time. No software is bug free, especially not software written in C or C++. Good bug reporting is really important for Gentoo.

                Comment


                • #28
                  Originally posted by Brane215 View Post
                  Come on. Before that bugfix, friggin tar would fail to compile.
                  ffmpeg is full of assembly and other dark magic and if graphite fails on one or two such packages, who cares.

                  Before that fix that came with 7.2.0, there were quite a few such failures...
                  No. Do check the link to the code I've posted above. Compile it with -O3 and -fgraphite-identity or -floop-nest-optimize or any other Graphite optimization. It's pretty simple code (it just won't let me post it here), and it ICEs every time and has done so since GCC 5. Nobody is fixing it and the code is rotting. You can no longer trust these optimizations to function properly.

                  Did you then test the code it produces? I ran some tests, and where it actually produced different assembly code, the difference was only marginal. It was literally just a change in the indexing order with no measurable performance gain. I wish it weren't so and that I could tell you something else, but this is how it presents itself at the moment. If you have any better suggestions than just "Come on", then I'm more than happy to hear them.
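
For readers unfamiliar with what "a change in the indexing order" can look like, here is a hypothetical sketch of a loop interchange, one kind of loop-nest transformation. It is not the code linked earlier in the thread (which was never posted here), the function names are made up, and it makes no claim about what Graphite actually does with any particular loop.

```c
#include <stddef.h>

/* Inner loop walks down a column of a row-major matrix: consecutive
   accesses are `cols` elements apart in memory. */
unsigned long sum_column_first(size_t rows, size_t cols, const unsigned *m)
{
    unsigned long total = 0;
    for (size_t j = 0; j < cols; ++j)
        for (size_t i = 0; i < rows; ++i)
            total += m[i * cols + j];
    return total;
}

/* Same sum with the two loops interchanged: the inner loop now walks
   contiguous memory. Unsigned addition wraps and is associative, so the
   reordering cannot change the result. */
unsigned long sum_row_first(size_t rows, size_t cols, const unsigned *m)
{
    unsigned long total = 0;
    for (size_t i = 0; i < rows; ++i)
        for (size_t j = 0; j < cols; ++j)
            total += m[i * cols + j];
    return total;
}
```

Both functions compute the same value; only the traversal order differs. Whether that difference is measurable depends on the matrix size and the cache, which fits the observation that a changed indexing order does not automatically mean a performance gain.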

                  Comment


                  • #29
                    Originally posted by sdack View Post
                    No. Do check the link to the code I've posted above. Compile it with -O3 and -fgraphite-identity or -floop-nest-optimize or any other Graphite optimization. It's pretty simple code (it just won't let me post it here), and it ICEs every time and has done so since GCC 5. Nobody is fixing it and the code is rotting. You can no longer trust these optimizations to function properly.

                    Did you then test the code it produces? I ran some tests, and where it actually produced different assembly code, the difference was only marginal. It was literally just a change in the indexing order with no measurable performance gain. I wish it weren't so and that I could tell you something else, but this is how it presents itself at the moment. If you have any better suggestions than just "Come on", then I'm more than happy to hear them.
                    True, I can't really say I've measured any real-world gains. I still use Graphite sometimes just to see if I can find an obvious improvement, but the few things I've tested haven't really shown anything.

                    I do see a few Graphite-related ICEs, but most packages compile. In my experience with 7.2.0, there are still more LTO-related failures than Graphite-related ones.

                    I've been doing Graphite / LTO on Gentoo for years now, but I do all my filtering in the portage bashrc rather than patching builds, so it's pretty easy. The failures are always similar. I could easily automate the blacklisting if it were bothersome enough.
                    Last edited by ormaaj; 11 September 2017, 04:57 PM.

                    Comment


                    • #30
                      Hi everyone,

                      I'm Shane, the creator of this repository. I was super surprised to find my work here today! I'd like to clear up a few misconceptions in this thread about what the goals of this project actually are, and what O3 optimizations actually mean.

                      My goals for this project are as follows:

                      * Identify packages which do not play nice with LTO and fix them if possible
                      * LTO, O3 and Graphite optimizations will help reveal the use of Undefined Behaviour in C and C++ programs (see https://en.wikipedia.org/wiki/Undefined_behavior )
                      * Perhaps, in the best case, even get a performance improvement on the whole system. Benchmarks are sorely needed for this.
                      * Identify cases where -O3 and Graphite optimizations actually have a performance regression versus O2. These would make for good GCC bug reports. Especially with Graphite!
                      * In all cases, let the compiler decide which transformations to apply. Never override the compiler's cost function for applying a certain transformation (no -funroll-all-loops -- ever).
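
To make the Undefined Behaviour goal in the list above concrete, here is a minimal sketch of code that can behave differently once the optimizer gets more aggressive. It is a made-up example, not taken from the repository or from any package, and the function names are hypothetical.

```c
#include <limits.h>
#include <stdbool.h>

/* Relies on UB: if a + b overflows, the behaviour is undefined, so an
   optimizer is allowed to assume it never happens and may drop the
   "sum < a" test entirely. */
bool add_overflows_ub(int a, int b)
{
    int sum = a + b;
    return b > 0 && sum < a;
}

/* Well-defined: compare against the limits before doing any arithmetic. */
bool add_overflows(int a, int b)
{
    if (b > 0)
        return a > INT_MAX - b;
    if (b < 0)
        return a < INT_MIN - b;
    return false;
}
```

The first version may appear to work at one optimization level and stop detecting overflow at another; the second gives the same answer at every level because it never performs the overflowing addition.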

                      You can see that performance is actually not a direct goal of this, although long term it absolutely will be. The short-term goal is to find breakage in packages. Briefly, no code should *ever* fail to compile under O3 optimization. When it does, that's usually not a bug in the compiler; it is almost always a bug in the code being compiled. There's a long list of Undefined Behaviour in C and C++ that can trigger these artifacts. Tossing Graphite and LTO in there only exacerbates the effects. The fact that I've managed to get everything built with so few exceptions is actually a good sign! Anyway, the point is, if the code doesn't work with O3, that means there is a problem somewhere and the problem is not with O3!

                      Now, about performance. First, if O3 generates worse code than O2, **that is a bug in the compiler**. O3 should *never* produce worse results than O2, and if you do find cases where it does, please try to isolate the code fragment in question and send a bug report to the GCC devs. The optimizing cost function of GCC is probably in need of adjustment. As for why people use O2? Simple: there's a lot of shoddy code out there, and O2 lets it somewhat work. O2 plays nicer with Undefined Behaviour because it performs fewer transformations on the code in question. This does not mean we should be encouraging people to develop with O2 in mind! O3 should be at least as good as O2 in all cases.

                      About Graphite: yes--we *want* to be triggering those ICEs and making bug reports out of these. These are helpful in improving GCC. No, I have not had a case where the graphite optimizer produced bad code for me yet. I have had a few cases where it does trigger an ICE however.

                      About LTO: Whether you use O2 or O3, or Graphite or not, LTO should be a straight up benefit compared to non LTOed code. MSVC has defaulted to an LTCG (LTO) build for quite some time now, and GCC + binutils do support this. We need to get people using it. The benefits of LTO are twofold:

                      * Smaller binary sizes: good for embedded devices (I even run LTO builds on my router using LEDE!)
                      * Better optimization potential: inlining can occur across far more boundaries, including linking against static libraries. More information is available to the optimizer as well.

                      The central philosophy behind this repo is to not override the compiler's judgement in *any* capacity. The compiler should be given complete freedom in how it wants to perform code generation. You won't ever find any -funroll-all-loops here! In the ideal world, everything works correctly and there are performance gains to be had. In the real world, we can use this to find breakage, because I guarantee you there's a lot of UB lurking about that has gone unnoticed because of O2.

                      Compiling with high O levels isn't the only way to find UB. Fuzzing is also a great way to find problems in code. The more eyes looking at the problem, the better. In the end, I hope to get some good bug reports out of this project and yes, improve performance in the long run. I hope this clears things up! And contributions are highly welcomed. I have detailed some tasks at the end of the README in my GitHub.
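
As a rough sketch of the fuzzing approach mentioned above, the following is a tiny harness in the style of LLVM's libFuzzer. This swaps in Clang tooling rather than GCC, and it assumes a Clang build with libFuzzer available; the function under test, `sum_payload`, is hypothetical and stands in for whatever parser or decoder you actually want to exercise.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical function under test: the first byte is a length prefix,
   followed by that many payload bytes, which are summed into *out.
   Returns 0 on success and -1 on malformed input. */
static int sum_payload(const uint8_t *buf, size_t size, uint32_t *out)
{
    if (size < 1)
        return -1;
    size_t len = buf[0];
    if (len > size - 1)
        return -1; /* length prefix claims more bytes than we were given */
    uint32_t sum = 0;
    for (size_t i = 0; i < len; ++i)
        sum += buf[1 + i];
    *out = sum;
    return 0;
}

/* libFuzzer entry point: called repeatedly with generated inputs. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
{
    uint32_t sum = 0;
    if (sum_payload(data, size, &sum) == 0) {
        /* Invariant: at most 255 payload bytes, each at most 255. */
        if (sum > 255u * 255u)
            abort();
    }
    return 0;
}
```

Built with something like `clang -g -O1 -fsanitize=fuzzer,address,undefined harness.c`, the sanitizers report out-of-bounds reads and several kinds of undefined behaviour at the moment they happen, instead of leaving them to surface later as a miscompile at a higher optimization level.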

                      Comment
