GCC 10 Link-Time Optimization Benchmarks On AMD Threadripper

  • #11
    Originally posted by Michael

    Ubuntu Mainline PPA most often aside from when I am bisecting or needing to patch my kernels.
    Thanks! Any noticeable improvements? Is it a must-have perf upgrade for the 3960X, or am I fine with 5.3?


    • #12
      So basically, it can sway significantly one way or another, but most of the time the difference is insignificant. And how many lines of code were added to gcc for this "feat"?


      • #13
        Mesa seems to benefit quite a bit from LTO.

        Add this to the meson command:

        Code:
        -Db_lto=true


        • #14
          Originally posted by archsway
          Mesa seems to benefit quite a bit from LTO.

          Add this to the meson command:

          Code:
          -Db_lto=true
          Yes, but mostly with reduced size. No (dramatic) speed increase.
          `-fwhole-program` isn't worth it, it seems.


          • #15
            From the gcc docs:

            -fwhole-program
            Assume that the current compilation unit represents the whole program being compiled. All public functions and variables with the exception of main and those merged by attribute externally_visible become static functions and in effect are optimized more aggressively by interprocedural optimizers.

            This option should not be used in combination with -flto. Instead relying on a linker plugin should provide safer and more precise information.
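
To make the quoted documentation a bit more concrete, here is a minimal sketch, assuming gcc and nm (from binutils) are on the PATH. The file name whole.c and the functions helper and exported are made up for illustration. It builds a single-file program with and without -fwhole-program and lists which of the two functions are still global text symbols in each binary.

```shell
# Sketch only: whole.c, helper() and exported() are hypothetical names.
workdir=$(mktemp -d)
cat > "$workdir/whole.c" <<'EOF'
#include <stdio.h>

/* Public by default; -fwhole-program lets GCC treat it as static. */
int helper(int x) { return x + 1; }

/* Stays externally visible even under -fwhole-program. */
__attribute__((externally_visible)) int exported(int x) { return x * 2; }

int main(void) { printf("%d\n", helper(1) + exported(2)); return 0; }
EOF

if command -v gcc >/dev/null 2>&1 && command -v nm >/dev/null 2>&1; then
    gcc -O2 "$workdir/whole.c" -o "$workdir/plain"
    gcc -O2 -fwhole-program "$workdir/whole.c" -o "$workdir/whole"
    # Show which of the two functions are still global ("T") symbols.
    echo "without -fwhole-program:"
    nm "$workdir/plain" | grep -E ' T (helper|exported)$'
    echo "with -fwhole-program:"
    nm "$workdir/whole" | grep -E ' T (helper|exported)$'
else
    echo "gcc or nm not found; skipping the build"
fi
```

Comparing the two listings is a quick way to see what "become static functions" means in practice for a given GCC version.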


            • #16
              As far as I know, -fwhole-program works best when the entire program fits into a single source file/compilation unit. In theory, a benchmark like Himeno should benefit from it, but the results obtained here show otherwise. Odd.

              Instead of using -flto in conjunction with -fwhole-program, I would suggest replacing -fwhole-program with -flto-partition=none (possible values are none/one/1to1/balanced/max), which would disable WHOPR/partitioned LTO and switch to full LTO.

              I suppose one could also accelerate the benchmarks measuring compilation time by using -flto=n, which would parallelize the linking process when using WHOPR.
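
For reference, the two variants could be spelled roughly like this. It is only a sketch: demo.c is a throwaway stand-in for a real benchmark source, -O2 is my own choice of optimization level, and with a file this small the two builds will not differ in any measurable way. The point is just the flag spelling.

```shell
# Sketch only: demo.c is a hypothetical stand-in for a real benchmark.
workdir=$(mktemp -d)
cat > "$workdir/demo.c" <<'EOF'
#include <stdio.h>
int square(int x) { return x * x; }
int main(void) { printf("%d\n", square(7)); return 0; }
EOF

# Number of parallel link-time jobs; fall back to 1 if nproc is missing.
jobs=$(nproc 2>/dev/null || echo 1)

# Full LTO: no partitioning, the whole program is optimized as one unit.
full_lto="-O2 -flto -flto-partition=none"
# Partitioned (WHOPR) LTO with $jobs parallel link-time jobs.
whopr_lto="-O2 -flto=$jobs"

if command -v gcc >/dev/null 2>&1; then
    # The flag variables are left unquoted on purpose so they split into words.
    gcc $full_lto  "$workdir/demo.c" -o "$workdir/demo-full"
    gcc $whopr_lto "$workdir/demo.c" -o "$workdir/demo-whopr"
else
    echo "gcc not found; skipping the build"
fi
```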


              • #17
                Since mostly what I care about for absolute roaring speed is FFTW on 512-2048 FFT sizes... this doesn't look like it's going to help me much. It might at 2048 I guess... I need to run some benchmarks of my own.

                Thanks for the tests.


                • #18
                  For compile time you need -flto=$CPUCOUNT (or -flto=jobserver and CC=-gcc)

                  Fat objects should also already be disabled. So the main difference is the compiling in parallel. It shouldn't be that much slower than normal building, it just uses a metric shit-ton more memory.


                  • #19
                    Originally posted by set135
                    This is what I have been using on Gentoo for many years, for all but a few packages:
                    CFLAGS="-march=native -O2 -pipe -fno-stack-protector -flto=4 -fuse-linker-plugin"
                    CXXFLAGS="$CFLAGS"
                    LDFLAGS="-Wl,-flto=4 $CFLAGS"

                    My goal was primarily to reduce executable size, and just to see how it works, so it is interesting to see some benchmarks.
                    Why pass -flto to the linker? Just link with gcc/g++ and let the driver deal with the command line. Also, -fuse-linker-plugin is redundant. But yes, that will reduce binary size greatly, even though you would need -O3 to get the performance benefits of -flto.
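
Put together, that advice might look like this in make.conf. This is only one reading of it: the flag values are copied from the quoted post, and reusing CFLAGS as LDFLAGS assumes the package links through the gcc/g++ driver rather than calling ld directly.

```shell
# -flto=4 goes to the compiler driver: no -Wl, prefix, no -fuse-linker-plugin.
CFLAGS="-march=native -O2 -pipe -fno-stack-protector -flto=4"
CXXFLAGS="${CFLAGS}"
# GCC's docs suggest passing -flto and the optimization options again at the
# final link, so the same set is reused here.
LDFLAGS="${CFLAGS}"
```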


                    • #20
                      Originally posted by archsway
                      Mesa seems to benefit quite a bit from LTO.

                      Add this to the meson command:

                      Code:
                      -Db_lto=true
                      Yes, that would be an interesting benchmark. There's also Mesa support for PGO (profile-guided optimization), which in my experience is typically a more impactful optimization. The option is -Db_pgo= and the values are off/generate/use. Perhaps something for Michael to try out when he does a new PGO benchmark.
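
A rough sketch of the usual two-pass workflow, for anyone who wants to try it. b_pgo is a Meson base option, so the same steps should apply to other Meson projects too. The function name pgo_build, the build directory and the workload command are all hypothetical, and -Dbuildtype=release is my own assumption.

```shell
# Hypothetical helper: two-pass PGO build of a Meson project.
# usage: pgo_build <builddir> <workload command...>
pgo_build() {
    builddir=$1
    shift
    # Pass 1: instrumented build that writes profile data when it runs.
    meson setup "$builddir" -Dbuildtype=release -Db_pgo=generate || return 1
    ninja -C "$builddir" || return 1
    # Run a representative workload against the instrumented build.
    "$@" || return 1
    # Pass 2: reconfigure the same build directory to use the profile.
    meson configure "$builddir" -Db_pgo=use || return 1
    ninja -C "$builddir"
}
```

It would be called as something like `pgo_build build-pgo ./run-benchmark.sh`, where the script exercises the freshly built, instrumented binaries. How representative that workload is largely decides how much PGO helps.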
