Link-Time Optimizations With GCC 4.8


  • #11
    Originally posted by curaga View Post
    That depends on your toolchain - IIRC non-fat lto requires gold instead of the usual GNU ld.
    But you need to link using GCC or gold anyway to get LTO? The fat object files just make it possible to link without using LTO. So non-fat objects also help to check that you really are getting LTO and not just the old code through the fallback.

    Comment


    • #12
      Originally posted by carewolf View Post
      No, then you get -O0 optimizations. LTO means link-time optimizations, which means the linker does the optimizations, which again means the linker needs the optimization flags, but the compiler does not.

      So
      CXXFLAGS = -flto
      LDFLAGS = -O3 -march=native -flto -fwhole-program
      Ah, yes, that makes much better sense. I just noticed that unless I passed the optimization options to the linker flags I got poor optimization (likely -O0).
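
      To make the flag split concrete, here is a rough sketch. The file name lto_demo.c, the hypothetical main.o and the function names are made up for illustration. The commands in the comment just follow the flags quoted above, and whether the link step really needs its own -O level may depend on the GCC version.

```c
/*
 * lto_demo.c (hypothetical file name)
 *
 * Build sketch, following the flags quoted above:
 *   gcc -flto -c lto_demo.c -o lto_demo.o                       (compile: emit LTO bytecode)
 *   gcc -O3 -march=native -flto lto_demo.o main.o -o lto_demo   (link: optimize here)
 *
 * main.o is assumed to come from a separate file that calls sum_scaled().
 */

/* Small helper; a candidate for cross-file inlining at link time. */
int scale(int x)
{
    return x * 3;
}

/* Sums scale(i) for i in [0, n). */
int sum_scaled(int n)
{
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += scale(i);
    return sum;
}
```

      Comparing the disassembly (objdump -d) of the binary linked with and without the -O3 in the link step should show whether the link-time flags made a difference on a given toolchain.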

      Anyway, as I said earlier, I think this is the problem behind the regressions in Michael's tests. I doubt he passed the optimization options in LDFLAGS in the tests where these regressions occur. LTO often doesn't yield any 'worthwhile' gains in my benchmarks, but it also hasn't caused any worse performance for me. The overall benefit I've noticed is that the binaries pretty much always end up quite a bit smaller (likely due to dead/duplicate code removal, more efficient code reordering etc).

      Yes that pretty much sums it up, good pointer.

      Comment


      • #13
        Originally posted by carewolf View Post
        But you need to link using GCC or gold anyway to get LTO? The fat object files just make it possible to link without using LTO. So non-fat objects also help to check that you really are getting LTO and not just the old code through the fallback.
        I hit that with my toolchain - I couldn't use non-fat LTO, but I could use fat LTO. I definitely got the benefits (10% smaller binaries).

        Quote from the gcc manual:
        -ffat-lto-objects
        Fat LTO objects are object files that contain both the intermediate language and the object code. This makes them usable for both LTO linking and normal linking. This option is effective only when compiling with -flto and is ignored at link time.

        -fno-fat-lto-objects improves compilation time over plain LTO, but requires the complete toolchain to be aware of LTO. It requires a linker with linker plugin support for basic functionality. Additionally, nm, ar and ranlib need to support linker plugins to allow a full-featured build environment (capable of building static libraries etc).
        (emphasis mine)

        Comment


        • #14
          Right, and since this speeds up the compilation, the "time to compile" benchmarks are completely messed up.

          Comment


          • #15
            The dhrystone benchmark is crap.

            That benchmark is a derivative of the original 1988 dry.c, which was composed of two separate .c files.
            Those two files were kept separate explicitly to prevent the compiler from inlining functions.

            For example, suppose we write a tool to benchmark integer math and split it into mul.c and div.c with these functions:

            Code:
            int mul(int a, int b)
            {
                return a * b;
            }
            Code:
            int div(int a, int b)
            {
                return a / b;
            }
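            and from our main we call:

            Code:
            int test(int x)
            {
                int sum = 0;
                for(int i = 1; i < x; i++)
                    sum += div(mul(i, 100), 25);
                return sum;
            }
            Inlining those two functions will generate something like:

            Code:
            int test(int x)
            {
                int sum = 0;
                for(int i = 1; i < x; i++)
                    sum += (i * 100) / 25;
                return sum;
            }
            which the compiler can then simplify (signed overflow is undefined, so it may assume i * 100 does not overflow) to

            Code:
            int test(int x)
            {
                int sum = 0;
                for(int i = 1; i < x; i++)
                    sum += i * 4;
                return sum;
            }
            With a huge performance gain.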
            While LTO is good in real use, it can distort many benchmarks, so I'll use it only in real-world scenarios.
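
            If you want a benchmark like this to keep measuring the calls even with LTO on, one possible approach is to mark the helpers as non-inlinable. This is only a sketch: it assumes GCC or Clang (the noinline attribute is a compiler extension), and the _noinline and test_guarded names are made up here. Note that noinline only blocks inlining. Other interprocedural optimizations may still touch the calls.

```c
/* Assumption: GCC or Clang; __attribute__((noinline)) is a compiler extension. */

/* Same arithmetic as mul()/div() above, but never considered for inlining. */
__attribute__((noinline)) int mul_noinline(int a, int b)
{
    return a * b;
}

__attribute__((noinline)) int div_noinline(int a, int b)
{
    return a / b;
}

/* Accumulates the results so the loop body has an observable effect. */
int test_guarded(int x)
{
    int sum = 0;
    for (int i = 1; i < x; i++)
        sum += div_noinline(mul_noinline(i, 100), 25);
    return sum;
}
```

            With the result accumulated and returned, the compiler also cannot simply discard the loop as dead code, which it could do if the value of each call were thrown away.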

            Comment
