LLVM Clang 3.8 Compiler Optimization Benchmarks With -Ofast

  • #1

    Phoronix: LLVM Clang 3.8 Compiler Optimization Benchmarks With -Ofast

    A few days ago I posted a number of LLVM Clang optimization level benchmarks using the latest code for the upcoming Clang 3.8 release. Those tests went from -O0 to -O3 -march=native, but many Phoronix readers wanted -Ofast so here are those results too...

    http://www.phoronix.com/scan.php?pag...lang-3.8-Ofast

  • #2
    I don't agree that -Ofast is a valid optimization target. It's more of a play/testing target. Never use it unless you're dead sure about your code.

    It's like saying "how fast can we make things if we break compliance", especially with floating point. Fast-math can introduce some serious accuracy issues and a whole bunch of other problems. I don't know about the other optimizations, but I assume some of them can be unsafe; otherwise they'd probably already be enabled by -O3.

    The same obviously goes for GCC.
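As a hedged illustration of those "other problems": current Clang documentation says -ffast-math lets the compiler assume operands are never NaN or Inf and treat +0 and -0 as interchangeable (GCC's -ffinite-math-only and -fno-signed-zeros cover the same ground). The Python sketch below is not compiled with fast-math. It only shows the kind of NaN/Inf filtering and signed-zero test that a C translation could lose under those assumptions. The helper names are hypothetical.

```python
import math


def clean_mean(samples):
    """Average of the finite samples, skipping NaN and +/-Inf."""
    # In C, the equivalent isnan()/isinf() tests are what finite-math-only
    # (enabled by -ffast-math, and therefore by -Ofast) lets the compiler
    # assume are always false, so this filtering may silently disappear.
    finite = [s for s in samples if not (math.isnan(s) or math.isinf(s))]
    return sum(finite) / len(finite) if finite else math.nan


def sign_of_zero(z):
    # -0.0 == 0.0 compares equal, so the sign is only observable through
    # something like copysign; no-signed-zeros (also part of -ffast-math)
    # lets the compiler treat the two zeros as interchangeable.
    return math.copysign(1.0, z)


data = [1.0, 2.0, float("nan"), 3.0, float("inf")]
print(clean_mean(data))    # 2.0
print(sign_of_zero(-0.0))  # -1.0
```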



    • #3
      I had to look up -Oz as it's the first time I've heard of it:
      -O3 : throw everything and hope it sticks
      -O2 : optimized build, but should not explode in code size nor consume all resources while compiling
      -O1 : optimized debug binaries, don't change the execution order but remove dead code and stuff
      -O0 : don't touch it
      -Os : optimize, but don't run passes that could blow up code. Try to be a bit more drastic when removing code. When in doubt, prefer small, not fast code.
      -Oz : only perform optimizations that reduce code size. Don't even try to run things that could potentially increase code size.

      From Renato Golin (Linaro)

      It would be great if you could run all the tests with -march=native next time.



      • #4
        Thank you, Michael!



        • #5
          Originally posted by milkylainen
          I don't agree that -Ofast is a valid optimization target. It's more of a play/testing target. Never use it unless you're dead sure about your code.

          It's like saying "how fast can we make things if we break compliance", especially with floating point. Fast-math can introduce some serious accuracy issues and a whole bunch of other problems. I don't know about the other optimizations, but I assume some of them can be unsafe; otherwise they'd probably already be enabled by -O3.

          The same obviously goes for GCC.
          It's great for games and demos (with decent QA), but not for scientific apps.



          • #6
            Originally posted by caligula

            It's great for games and demos (with decent QA), but not for scientific apps.
            Yeah. I guess I've been bitten by too many stupid compiler bugs/optimizations over the years, so I avoid everything that is not guaranteed to work, whatever that would mean.
            I absolutely hate finding that the tool that is supposed to do the work is broken. It usually costs way more hours than expected, because you always assume that the code in question was broken in the first place.



            • #7
              I wonder how much better the performance would be with profile-guided optimization, i.e. generating a profile and then using it?
              -fprofile-generate -fprofile-use http://clang.llvm.org/docs/UsersManu...d-optimization
              Last edited by unquaid; 08 February 2016, 05:10 PM.



              • #8
                Originally posted by caligula

                It's great for games and demos (with decent QA), but not for scientific apps.
                It is perfectly fine for most scientific apps too. Most of the fast-math optimizations don't reduce accuracy; they just violate the floating-point standard. The problem is that code that handles things like NaN, Inf, -0, etc. doesn't work when built with optimizations that assume those values don't exist. But those values aren't all that useful in most calculations. Similarly, computing 1/x once, storing it, and multiplying a list of other values by it doesn't meaningfully reduce accuracy, but it does give very slightly different results than a standard-compliant division of each number.
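To make the 1/x point concrete, here is a small sketch that performs the x/d to x * (1/d) rewrite by hand. Python itself never applies fast-math, so this only emulates the transformation that reciprocal math permits; the function names are illustrative. With d = 10 some results differ in the last bit, but the relative difference stays at the level of a few units in the last place.

```python
import math


def divide_each(values, d):
    # Standard-compliant: one correctly rounded division per element.
    return [x / d for x in values]


def multiply_by_reciprocal(values, d):
    # The x/d -> x * (1/d) rewrite that reciprocal math (part of -ffast-math)
    # permits: one division up front, then a multiply per element.
    # That is two roundings instead of one, so the last bit can differ.
    r = 1.0 / d
    return [x * r for x in values]


values = [float(i) for i in range(1, 1001)]
compliant = divide_each(values, 10.0)
fast = multiply_by_reciprocal(values, 10.0)

mismatches = sum(1 for a, b in zip(compliant, fast) if a != b)
worst = max(abs(a - b) / abs(a) for a, b in zip(compliant, fast))

print(f"{mismatches} of {len(values)} results differ")
print(f"largest relative difference: {worst:.1e}")
print(compliant[2], fast[2])  # 3/10 both ways: 0.3 0.30000000000000004
```

Each path rounds at most twice, so the gap between them is bounded by roughly three units of double-precision rounding error. That is the "very slightly different results" described above, not a loss of meaningful accuracy.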



                • #9
                  It's not hard to write new code, even new scientific code, that plays perfectly well with -Ofast/-ffast-math and benefits from it. You just have to avoid things like subtracting two large, nearly equal numbers and expecting their difference to retain a certain number of significant bits.
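A classic instance of that subtraction problem is 1 - cos(x) for tiny x. The sketch below (plain Python doubles, no fast-math involved; function names are hypothetical) compares the naive form with the algebraically equivalent 2*sin(x/2)**2, which avoids the cancellation. The reference value uses only the leading Taylor term x*x/2, which is an assumption that holds to far better than double precision at x = 1e-8.

```python
import math


def one_minus_cos_naive(x):
    # cos(x) is within about x*x/2 of 1.0, so for tiny x this subtracts two
    # nearly equal numbers and almost every significant bit cancels.
    return 1.0 - math.cos(x)


def one_minus_cos_rewritten(x):
    # Same quantity via the identity 1 - cos(x) = 2*sin(x/2)**2,
    # which never subtracts nearly equal values.
    s = math.sin(x / 2.0)
    return 2.0 * s * s


x = 1e-8
# Leading Taylor term of 1 - cos(x); the next term, x**4/24, is negligible here.
reference = x * x / 2.0

for name, f in (("naive", one_minus_cos_naive), ("rewritten", one_minus_cos_rewritten)):
    value = f(x)
    rel_err = abs(value - reference) / reference
    print(f"{name:>9}: {value:.3e}  relative error {rel_err:.1e}")
```

The naive form loses essentially all of its significant bits, so any change to how the intermediate cos(x) is computed can change its answer wildly. The rewritten form is insensitive to that kind of change.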

                  Yes, lots of legacy code might not play nicely with it. However, not playing nicely with -ffast-math is a symptom of a deeper problem. If code doesn't give the same answers to within machine noise when compiled with and without -ffast-math, then it shouldn't be expected to give the same results on, say, x86 and ARM, or on a CPU and a GPU. So the math in the code should probably be refactored anyway.
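One concrete way to check "the same answers to within machine noise" is to change the order of operations, since reassociation is one of the things -ffast-math allows (for example when vectorizing a reduction). This hedged Python sketch sums the same data left to right and pairwise. Both orderings are hand-written stand-ins for what a compiler might do. math.fsum serves as the exactly rounded reference. Well-conditioned data agrees to machine noise, while data that relies on large terms cancelling does not.

```python
import math
import random


def sum_in_order(xs):
    # Strict left-to-right accumulation, as written in the source.
    total = 0.0
    for v in xs:
        total += v
    return total


def sum_pairwise(xs):
    # Same additions, different grouping (a balanced tree), similar in spirit
    # to the reassociation -ffast-math allows, e.g. when vectorizing a loop.
    if not xs:
        return 0.0
    if len(xs) == 1:
        return xs[0]
    mid = len(xs) // 2
    return sum_pairwise(xs[:mid]) + sum_pairwise(xs[mid:])


def rel_diff(a, b):
    # Relative disagreement between two results (guarded against 0/0).
    return abs(a - b) / max(abs(a), abs(b), 1e-300)


# Well-conditioned: positive values, no cancellation.
rng = random.Random(0)
benign = [rng.random() for _ in range(10_000)]

# Ill-conditioned: large terms that cancel, leaving the small ones.
fragile = [1e16, 1.0, -1e16, 1.0]

print("benign :", rel_diff(sum_in_order(benign), sum_pairwise(benign)), "exact:", math.fsum(benign))
print("fragile:", sum_in_order(fragile), sum_pairwise(fragile), "exact:", math.fsum(fragile))
# fragile: 1.0 0.0 exact: 2.0
```

Neither ordering of the fragile data gives the exact answer, which matches the point above: code like this needs refactoring regardless of the compiler flags.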

                  In my experience, code that gives radically different answers when run with reduced precision in a particular intermediate value can also give a radically different answer when run with INCREASED precision. So the original "expected" answer, obtained by rigidly following the IEEE floating-point standard, had little to do with the mathematically correct answer in the first place.

                  Rather than just declaring that -Ofast is never appropriate for a given field of study, the question to answer is whether a particular codebase is too big and complicated to be made safe for these optimizations. The benefits are real, and the accumulated time, power consumption, etc. saved by using them may well offset the developer effort required.



                  • #10
                    Originally posted by unquaid
                    I wonder how much better the performance would be with profile-guided optimization, i.e. generating a profile and then using it?
                    -fprofile-generate -fprofile-use http://clang.llvm.org/docs/UsersManu...d-optimization
                    I think about 15% is the rule of thumb, but that obviously depends on the codebase. More importantly, it relies heavily on having good training runs to profile, which makes it a very tough option to test, and very time-intensive as well.

