Ubuntu 15.10 + GCC 5.2: -O3, March=Native, FLTO Tests


  • Ubuntu 15.10 + GCC 5.2: -O3, March=Native, FLTO Tests

    Phoronix: Ubuntu 15.10 + GCC 5.2: -O3, March=Native, FLTO Tests

    A Phoronix Premium subscriber requested some fresh GCC compiler optimization tests, so here's some current results using GCC 5.2 on Ubuntu 15.10 64-bit...

    http://www.phoronix.com/scan.php?pag...15.10-GCC-Opts

  • #2
    A Phoronix Premium subscriber requested some fresh GCC compiler optimization tests, so here's some current results...
    Well, the results are certainly NOT HERE.



    • #3
      bug77, there is a link to the results in the article. Those results, coupled with the major performance regressions Michael found earlier, suggest that people who need performance should stick with Ubuntu 15.04 w/ gcc 4.9 rather than upgrading to Ubuntu 15.10 w/ gcc 5.2.



      • #4
        Some observations:
        1) -march=native often makes things ... worse. I can confirm the very same thing happens on AMD CPUs as well.
        2) It's kinda strange that LTO makes performance worse in some cases. Can someone explain how that could happen at all? If I remember correctly, it's only supposed to discard unused code. How could it make benchmark results 20% worse?
        Last edited by SystemCrasher; 10-20-2015, 12:04 PM.



        • #5
          Originally posted by SystemCrasher
          Some observations:
          1) -march=native often makes things ... worse. I can confirm the very same thing happens on AMD CPUs as well.
          2) It's kinda strange that LTO makes performance worse in some cases. Can someone explain how that could happen at all? If I remember correctly, it's only supposed to discard unused code. How could it make benchmark results 20% worse?


          Um did you look at this? http://openbenchmarking.org/result/1...HA-GCCCOMPIL63

          To me the optimizations improve the results.



          • #6
            Originally posted by caligula
            Um did you look at this? http://openbenchmarking.org/result/1...HA-GCCCOMPIL63
            To me the optimizations improve the results.
            I did. And if you failed to notice, -march=native made 8 tests run worse than plain -O3. And the LTO version ... is missing half of the data without any good explanation. What happened? And among the few results it does have, 2 tests were the absolute worst of all. This puzzles me, since I do not really understand how just adding LTO can make results worse.

            As an obvious example, I benchmarked the LZ4 compression algorithm on an AMD FX CPU. The fastest build used the stock -O3 option from the lib's author, who knows how to do it right. Attempts to use -march=native made things noticeably worse, so it's not a unique feature of this test set. Before claiming there was an improvement, it is better to actually measure the workload you care about to see if that is really the case. As you can see, it can easily turn into a regression.
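As a rough illustration of "measure it yourself", here is a minimal Python sketch of a timing harness. It is not what was used for the LZ4 test; the helper names `time_command` and `summarize` are made up for this example, and the command it times is only a placeholder that you would swap for your own binary built with each set of flags.

```python
import statistics
import subprocess
import sys
import time


def time_command(cmd, runs=5):
    """Run cmd `runs` times and return the wall-clock durations in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True raises if the workload exits with an error,
        # so a crashing build is not mistaken for a fast one.
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Report min, median and max so one noisy run does not decide the result."""
    return {
        "min": min(durations),
        "median": statistics.median(durations),
        "max": max(durations),
    }


if __name__ == "__main__":
    # Placeholder workload (an assumption): replace with the real benchmark
    # command, once per build you want to compare.
    placeholder_cmd = [sys.executable, "-c", "pass"]
    print(summarize(time_command(placeholder_cmd, runs=3)))
```

Running the same harness against each build (plain -O3, -O3 with -march=native, and so on) and comparing the medians gives a like-for-like number for the workload in question.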



            • #7
              Originally posted by SystemCrasher
              I did. And if you failed to notice, -march=native made 8 tests run worse than plain -O3. And the LTO version ... is missing half of the data without any good explanation. What happened? And among the few results it does have, 2 tests were the absolute worst of all. This puzzles me, since I do not really understand how just adding LTO can make results worse.

              As an obvious example, I benchmarked the LZ4 compression algorithm on an AMD FX CPU. The fastest build used the stock -O3 option from the lib's author, who knows how to do it right. Attempts to use -march=native made things noticeably worse, so it's not a unique feature of this test set. Before claiming there was an improvement, it is better to actually measure the workload you care about to see if that is really the case. As you can see, it can easily turn into a regression.
              I had to try them out, and I seem to get the same results with GCC 5.2. For example, the C-Ray test compiles a 500 LOC file. I preprocessed this into a single 2200-line C file that doesn't #include anything; it compiles on its own and links with the math lib and pthreads. Really odd..
              Code:
              # gcc test.c -o test -O3 -march=native -lm -lpthread -flto
              # ./test -t 8 -s 800x600 -r 8 -i sphfract -o output.ppm
              
              Three runs:
              Rendering took: 8 seconds (8877 milliseconds)
              Rendering took: 8 seconds (8936 milliseconds)
              Rendering took: 8 seconds (8587 milliseconds)
              
              # gcc test.c -o test -O3 -march=native -lm -lpthread
              # ./test -t 8 -s 800x600 -r 8 -i sphfract -o output.ppm
              
              Three runs:
              Rendering took: 7 seconds (7589 milliseconds)
              Rendering took: 7 seconds (7802 milliseconds)
              Rendering took: 7 seconds (7799 milliseconds)
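To put a number on the gap, the two sets of timings above can be averaged. This is only a quick sketch: the variable names are invented for the example, the millisecond values are copied from the six runs above, and three runs per build is a small sample.

```python
# Millisecond timings copied from the runs above.
lto_ms = [8877, 8936, 8587]      # -O3 -march=native -flto
no_lto_ms = [7589, 7802, 7799]   # -O3 -march=native

lto_mean = sum(lto_ms) / len(lto_ms)
no_lto_mean = sum(no_lto_ms) / len(no_lto_ms)

# Relative slowdown of the LTO build compared to the non-LTO build.
slowdown_pct = (lto_mean / no_lto_mean - 1) * 100

print(f"LTO mean:    {lto_mean:.0f} ms")      # 8800 ms
print(f"no-LTO mean: {no_lto_mean:.0f} ms")   # 7730 ms
print(f"LTO build is {slowdown_pct:.1f}% slower")  # 13.8% slower
```

So on these runs the -flto build of C-Ray is roughly 14% slower than the same build without it.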



              • #8
                what kind of idiot uses -O3? -march=native should be safe (and provide a sizeable performance increase) along with -O2. don't unroll loops, don't omit frame pointers. that leads to unstable systems.



                • #9
                  Originally posted by jason.oliveira
                  what kind of idiot uses -O3? -march=native should be safe (and provide a sizeable performance increase) along with -O2. don't unroll loops, don't omit frame pointers. that leads to unstable systems.
                  Just out of curiosity, have you even tested your claims? For example C-Ray performs 34% faster with -O3 vs -O2 on my system. -march=native doesn't seem to have any effect on C-Ray performance in my tests. Not sure which target GCC 5.2 is using by default on Core i7 Ivy Bridge. I could also test with Haswell and Skylake.



                  • #10
                    Originally posted by caligula

                    Just out of curiosity, have you even tested your claims? For example C-Ray performs 34% faster with -O3 vs -O2 on my system. -march=native doesn't seem to have any effect on C-Ray performance in my tests. Not sure which target GCC 5.2 is using by default on Core i7 Ivy Bridge. I could also test with Haswell and Skylake.
                    C-Ray will perform faster, but it won't be nearly as accurate. back in 2004, I was running some pretty stupid insane CFLAGS. things were definitely faster, but at the cost of any semblance of system stability. I eventually disabled -O3 in favor of -O2 or -Os. you should look at the options that -O3 enables, and ask yourself whether it's worth the costs. if you build with -O3 and start seeing weird glitches, make another build with -O2.

