GCC 9 Compiler Tuning Benchmarks At Various Optimization Levels, Vectorize Options


  • #21
    Originally posted by F.Ultra View Post

    If that happens (and of course it will happen in a few cases) then it does not mean that "the code was bloated"; all it means is that there is a bug in the GCC optimizer for that particular code, because the resulting code was _less_ optimized in that case.
    This thread is giving me so many facepalms I'm just gonna quote Gentoo's wiki (from the people who actually care about overall system optimization):
    • -O3: the highest level of optimization possible. It enables optimizations that are expensive in terms of compile time and memory usage. Compiling with -O3 is not a guaranteed way to improve performance, and in fact, in many cases, can slow down a system due to larger binaries and increased memory usage. -O3 is also known to break several packages. Using -O3 is not recommended. However, it also enables -ftree-vectorize so that loops in the code get vectorized and will use AVX YMM registers.
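
For reference, the kind of loop that -ftree-vectorize targets looks roughly like the sketch below. It is not taken from the wiki or from this thread, and `saxpy` is just a name picked for illustration. Whether the generated code actually uses YMM registers depends on the target flags (for example -march or -mavx), not on -O3 alone.

```c
#include <stddef.h>

/* y[i] += a * x[i] for i in [0, n).
 * There is no dependency between iterations, and `restrict` promises
 * that x and y do not overlap, so the compiler is free to process
 * several elements per iteration with SIMD instructions. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    for (size_t i = 0; i < n; ++i)
        y[i] += a * x[i];
}
```

Building this with GCC and adding -fopt-info-vec should make the compiler report which loops it vectorized, which is an easy way to compare -O2 and -O3 on your own code.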

    Comment


    • #22
      Originally posted by birdie View Post

      This thread is giving me so many facepalms I'm just gonna quote Gentoo's wiki (from the people who actually care about overall system optimization):
      • -O3: the highest level of optimization possible. It enables optimizations that are expensive in terms of compile time and memory usage. Compiling with -O3 is not a guaranteed way to improve performance, and in fact, in many cases, can slow down a system due to larger binaries and increased memory usage. -O3 is also known to break several packages. Using -O3 is not recommended. However, it also enables -ftree-vectorize so that loops in the code get vectorized and will use AVX YMM registers.
      If -O3 breaks something, it's a bug in the program or (unlikely) in GCC. If a package breaks because of -O3, it's not ready for production.

      This quote might have been right 10 years ago, or for embedded systems with limited cache... On a modern machine, even a low-end one, I've yet to see any benchmark proving it remains valid. All the evidence I've seen (including these Phoronix benchmarks) shows the same or better performance with -O3.
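
A typical way the program, rather than GCC, ends up being the culprit is undefined behaviour that only shows once the optimizer gets more aggressive. A minimal sketch, not taken from any package mentioned in this thread; both function names are made up for illustration:

```c
#include <limits.h>

/* Broken: signed integer overflow is undefined behaviour in C, so the
 * compiler is allowed to assume `a + 1 < a` can never be true and
 * replace the whole check with 0. Which optimization level triggers
 * that on a given GCC version is not something this sketch claims. */
int next_would_overflow_ub(int a)
{
    return a + 1 < a;
}

/* Well-defined: compare against INT_MAX before doing any arithmetic,
 * so the result is the same at every optimization level. */
int next_would_overflow_ok(int a)
{
    return a == INT_MAX;
}
```

Calling `next_would_overflow_ok(INT_MAX)` returns 1 regardless of flags. Calling the `_ub` variant with `INT_MAX` has no defined result at all, which is exactly the kind of code that "works" at -O0 and "breaks" later.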
      Last edited by wagaf; 12 January 2019, 03:35 PM.

      Comment


      • #23
        Originally posted by wagaf View Post

        If -O3 breaks something, it's a bug in the program or (unlikely) in GCC. If a package breaks because of -O3, it's not ready for production.

        This quote might have been right 10 years ago, or for embedded systems with limited cache... On a modern machine, even a low-end one, I've yet to see any benchmark proving it remains valid. All the evidence I've seen (including these Phoronix benchmarks) shows the same or better performance with -O3.
        Who the hell are you? Have you ever visited GCC's bugzilla? There are literally a hundred bug reports open against GCC where it either compiles sub-optimal code at -O3 or crashes. Why do I even bother replying to your idiotic messages? How many bug reports have you filed against GCC? Have you ever compiled the kernel, the Qt library, or Firefox, or LibreOffice?

        I have filed a lot of bug reports against GCC where it either produces broken code, or ICEs (I'm pretty sure you don't even know what that means) or hangs. Why on Earth did you decide that GCC is some sort of ideal compiler (it's far from it) and -O3 is the best optimization level?

        Oh, God. In the past when people didn't know something they at least remained silent. Nowadays, you can opine pretty much about everything even if your opinion is based on nothing and it's factually incorrect. I'm done with this "discussion".

        Comment


        • #24
          Originally posted by birdie View Post
          The results show that you cannot blindly throw the same GCC compilation options at all your applications and you have to choose optimizations on a per-app basis which is still a huge nuisance for Gentoo lovers. And then we have -flto and PGO to rub salt into the wound.
          The results look like Gentoo users can just set -O3 -march=native. Gentoo users are the only ones who can blindly apply that, and some tasks get a factor of 2 faster, while none get slower.

          For those who haven’t been running Gentoo for the past 15 years: Packages that are not compatible with O3 typically filter out O3 from your CFLAGS.

          Though my typical setting was only -O2 -march=native. That simply worked.

          Besides: This is a really useful test. Thank you!

          Comment


          • #25
            Originally posted by ArneBab View Post
            Though my typical setting was only -O2 -march=native. That simply worked.
            Though the main reason was that I had read in the wiki that -O3 might cause breakage. This test shows that I might have been better off going for -O3 right away.

            Comment


            • #26
              Originally posted by birdie View Post

              Who the hell are you? Have you ever visited GCC's bugzilla? There are literally a hundred bug reports open against GCC where it either compiles sub-optimal code at -O3 or crashes. Why do I even bother replying to your idiotic messages? How many bug reports have you filed against GCC? Have you ever compiled the kernel, the Qt library, or Firefox, or LibreOffice?

              I have filed a lot of bug reports against GCC where it either produces broken code, or ICEs (I'm pretty sure you don't even know what that means) or hangs. Why on Earth did you decide that GCC is some sort of ideal compiler (it's far from it) and -O3 is the best optimization level?

              Oh, God. In the past when people didn't know something they at least remained silent. Nowadays, you can opine pretty much about everything even if your opinion is based on nothing and it's factually incorrect. I'm done with this "discussion".
              If you're gonna attack people personally and call their messages "idiotic" and "factually incorrect", at least provide factual evidence to support your claims.

              I never said GCC was perfect. I've encountered bugs in GCC and Clang myself (never filed bugs for them, though).
              However, the factual evidence I've seen (in programs I've worked on and in various benchmarks) shows -O3 IS the best optimization level.

              Edit: you mentioned the Linux kernel; it's true that, in my experience, -O3 is not appropriate for building the kernel.
              Last edited by wagaf; 12 January 2019, 04:14 PM.

              Comment


              • #27
                Originally posted by birdie View Post
                This thread is giving me so many facepalms I'm just gonna quote Gentoo's wiki (from the people who actually care about overall system optimization):
                • -O3: the highest level of optimization possible. It enables optimizations that are expensive in terms of compile time and memory usage. Compiling with -O3 is not a guaranteed way to improve performance, and in fact, in many cases, can slow down a system due to larger binaries and increased memory usage. -O3 is also known to break several packages. Using -O3 is not recommended. However, it also enables -ftree-vectorize so that loops in the code get vectorized and will use AVX YMM registers.
                That's been the standard line for decades.

                The reason for benchmarks like those in the article is to provide up-to-date information about the relative merits of the different options. I don't know about you, but I'd rather base my decisions on real-world data than anecdotal evidence and lore.

                Now, you've been asked for evidence to support your claims. Instead of insults and documentation of unknown age and relevance, let's have some current, real data to pose against what Michael has provided.

                I think it's time for you to put up or shut up.


                BTW, regarding the question of code bloat, it would be interesting to compare binary sizes, as well. That's a suggestion for Michael / OpenBenchmarking.

                Comment


                • #28
                  Originally posted by birdie View Post
                  Oh, God. In the past when people didn't know something they at least remained silent. Nowadays, you can opine pretty much about everything even if your opinion is based on nothing and it's factually incorrect.
                  Too bad you didn't consider that before you opened your own mouth.

                  Originally posted by birdie View Post
                  I'm done with this "discussion".
                  Again, too bad you didn't have that attitude when the thread started, and we could have all been spared your vitriol. You have anger issues. And you obviously hate Linux and open-source. Why don't you do yourself and everyone else a favour and go to a forum where you won't get so angry at every little thing. It will really help your blood pressure.

                  Comment


                  • #29
                    Birdie on -Os:

                    As for x86 desktop CPUs with a lot of cache the code generated with -Os is almost always significantly slower than the one compiled with -O2. Back in Pentium 3 days and earlier I used -Os when RAM was limited and expensive. Nowadays, this option makes sense only for embedded/memory constrained devices.
                    Birdie on -O3, quoting some decade-old wiki:

                    Compiling with -O3 is not a guaranteed way to improve performance, and in fact, in many cases, can slow down a system due to larger binaries and increased memory usage.
                    Well, then…

                    Now, before I continue, I would like to stress that I am qualified to join this discussion per Birdie's rules of entry, having submitted bugs to GCC and also compiled the kernel, the Qt library, and Firefox. Even Chromium. I feel so entitled now!

                    To share my experience: I have been developing scientific software for quite a long time, and for a living, and I build both my own software and anything from the AUR with -O3 and various -march flags (currently for Skylake, some older Xeon, and Epyc). That includes Qt, Firefox and even Chromium from the list above. I will continue to do so, and I wish everybody who hates progress in compiler optimizations a happy life with their slower binaries.

                    Comment


                    • #30
                      I'll see if I can dig up some code I wrote a while ago that shows -O3 breaking a simple CLI application, making it not accept user input.

                      If I can find it, I'll gladly file a bug report for them.

                      Comment
