Link-Time Optimizations With GCC 4.8

  • Link-Time Optimizations With GCC 4.8

    Phoronix: Link-Time Optimizations With GCC 4.8

    GCC 4.8 will feature a few improvements when it comes to LTO, a.k.a. Link-Time Optimization, but will this reflect in any greater performance for the resulting binaries?..

  • #2
    I saw all the linked results as well.
    Basically, a few percent improvement. About 4-5% over stock.

    Free performance is always good, but maybe not at the cost of 3x the build time and 2.5x the RAM use.


    • #3
      Originally posted by mayankleoboy1 View Post
      I saw all the linked results as well.
      Basically, a few percent improvement. About 4-5% over stock.

      Free performance is always good, but maybe not at the cost of 3x the build time and 2.5x the RAM use.
      When it becomes a more consistent win, it will make sense to use it in release builds of binaries that are redistributed. I think it's definitely worth having packaging take 3x longer if it makes the resulting binary 5% faster and strips out lots of dead code too.


      • #4
        I seriously doubt Michael is using LTO correctly.

        When you are using just a single command to compile, like gcc -march=native -O3 -flto -fwhole-program ... it works fine, but when you use a makefile with separate C(XX)FLAGS and LDFLAGS you need to pass the C(XX)FLAGS along to the LDFLAGS, else the optimization will suffer greatly. So you should do something like this:

        CXXFLAGS = -O3 -march=native -flto -fwhole-program
        LDFLAGS = $(CXXFLAGS) -Wall

        I've done many LTO comparisons, and there isn't always a gain (a lot of the benefits of LTO can be had by just defining functions as static when appropriate), but I've never come across regressions like the ones shown in Michael's tests. Hence I'm thinking he is not passing the C(XX)FLAGS along to the linker through the LDFLAGS in the tests that use a makefile with separate C(XX)FLAGS/LDFLAGS, which in turn means the C(XX)FLAGS optimizations aren't being used when generating the final binary.
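
A minimal shell sketch of the approach described above, for builds driven by a script rather than a makefile: the same optimization flags go to every compile step and again to the final link. The helper name `lto_build` and the `CC`/`OPTFLAGS` variables are illustrative assumptions, not part of the article's test setup. File names are assumed to contain no spaces, and -fwhole-program is left out to keep the sketch small.

```shell
#!/bin/sh
# Hypothetical helper: compile each .c file with the LTO flags, then
# repeat the same flags on the final link so the link-time optimizer
# sees the optimization level and target options.
CC=${CC:-gcc}
OPTFLAGS=${OPTFLAGS:--O3 -march=native -flto}

lto_build() {
    out=$1; shift
    objs=""
    for src in "$@"; do
        obj="${src%.c}.o"
        # Compile step: the object file carries the LTO intermediate code.
        $CC $OPTFLAGS -c "$src" -o "$obj" || return 1
        objs="$objs $obj"
    done
    # Link step: same flags again, not just -flto.
    $CC $OPTFLAGS $objs -o "$out"
}
```

It could be called as `lto_build myprog foo.c bar.c`; setting `OPTFLAGS` beforehand swaps in a different flag set for both steps at once.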


        • #5
          Originally posted by XorEaxEax View Post
          it works fine, but when you use a makefile with separate C(XX)FLAGS and LDFLAGS you need to pass the C(XX)FLAGS along to the LDFLAGS, else the optimization will suffer greatly. So you should do something like this:

          CXXFLAGS = -O3 -march=native -flto -fwhole-program
          LDFLAGS = $(CXXFLAGS) -Wall
          Is this enough:
          CXXFLAGS = -O3 -march=native -flto -fwhole-program
          LDFLAGS = -flto -Wall


          • #6
            Originally posted by LightBit View Post
            Is this enough:
            CXXFLAGS = -O3 -march=native -flto -fwhole-program
            LDFLAGS = -flto -Wall

            AFAIK you need to pass the optimization flags as well; at least I recall having to do so the last time I benchmarked LTO (which was on 4.7, not 4.8), so:

            CXXFLAGS = -O3 -march=native -flto -fwhole-program
            LDFLAGS = -O3 -march=native -flto -fwhole-program -Wall (... and whatever other linker options you have)

            or just reference the CXXFLAGS variable as I did above:
            LDFLAGS = $(CXXFLAGS) -Wall

            I believe this is necessary due to the ability of using LTO on object files written in different languages, but I may be wrong. I haven't really dived into LTO as I haven't gotten any major gains from it for my own code, particularly when compared to PGO, which pretty much always yields gains, often significant.


            • #7
              Originally posted by XorEaxEax View Post
              I believe this is necessary due to the ability of using LTO on object files written in different languages, but I may be wrong. I haven't really dived into LTO as I haven't gotten any major gains from it for my own code, particularly when compared to PGO, which pretty much always yields gains, often significant.
              I'd never heard of PGO until now, but would love to see some recent benchmarks. Most of the articles I saw were reporting up to ~10% gains.

              Also, from man gcc:
              Code:
              To use the link-time optimizer, -flto needs to be specified at compile time and during the final link.


              • #8
                Originally posted by LightBit View Post
                Is this enough:
                CXXFLAGS = -O3 -march=native -flto -fwhole-program
                LDFLAGS = -flto -Wall

                No, then you get -O0. LTO means link-time optimization: the optimizations are done at link time, so the link step needs the optimization flags, and the compile step does not.

                So
                CXXFLAGS = -flto
                LDFLAGS = -O3 -march=native -flto -fwhole-program

                Would work, but your example would not.

                Note you can also speed up the compilation even more by disabling fat object files. By default GCC produces object files that contain both the code for LTO linking and traditional object code; the latter is not needed if you are going to use LTO on the final link anyway. Edit: Use -fno-fat-lto-objects as a compile-time flag.
                Last edited by carewolf; 10 February 2013, 03:21 PM.
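
A rough shell sketch of the slim-object variant described in this post, assuming a toolchain whose linker setup can consume LTO-only objects. The helper names `compile_slim` and `link_lto` and the `CC` variable are hypothetical; the flags are the ones named in the post.

```shell
#!/bin/sh
CC=${CC:-gcc}

# Compile one source file to a "slim" object: LTO intermediate code only,
# without the traditional object code a fat object would also carry.
compile_slim() {
    $CC -flto -fno-fat-lto-objects -c "$1" -o "${1%.c}.o"
}

# Link the slim objects; the optimization level is given here, at link time.
link_lto() {
    out=$1; shift
    $CC -O3 -flto "$@" -o "$out"
}
```

For example: `compile_slim foo.c`, `compile_slim bar.c`, then `link_lto myprog foo.o bar.o`. Because the objects hold no regular machine code, linking them without -flto is not expected to work.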


                • #9
                  Originally posted by carewolf View Post
                  Note you can also speed up the compilation even more by disabling fat object files. By default GCC produces object files that contain both the code for LTO linking and traditional object code; the latter is not needed if you are going to use LTO on the final link anyway. Edit: Use -fno-fat-lto-objects as a compile-time flag.
                  That depends on your toolchain - IIRC non-fat lto requires gold instead of the usual GNU ld.


                  • #10
                    Additionally, the optimization flags used to compile individual files are not necessarily related to those used at link time. For instance,

                    gcc -c -O0 -flto foo.c
                    gcc -c -O0 -flto bar.c
                    gcc -o myprog -flto -O3 foo.o bar.o


                    This produces individual object files with unoptimized assembler code, but the resulting binary myprog is optimized at -O3. If, instead, the final binary is generated without -flto, then myprog is not optimized.
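
One way to see why the final link can still optimize is to check that the object files carry the LTO intermediate code next to their unoptimized machine code. The sketch below assumes an ELF target and GNU binutils `readelf`, and relies on GCC storing its LTO data in sections whose names start with `.gnu.lto_`. The helper name `has_lto_ir` and the `READELF` variable are hypothetical.

```shell
#!/bin/sh
READELF=${READELF:-readelf}

# Succeeds (exit status 0) if the given object file has at least one
# section whose name starts with .gnu.lto_, i.e. it was built with -flto.
has_lto_ir() {
    $READELF -S -W "$1" | grep -q '\.gnu\.lto_'
}
```

After the first two commands above, `has_lto_ir foo.o && echo "foo.o carries LTO data"` would be expected to print its message; for an object compiled without -flto it should stay silent.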
