GCC 6.1 Compiler Optimization Level Benchmarks: -O0 To -Ofast + FLTO


  • #11
    Originally posted by atomsymbol

    -Ofast is only a very small improvement compared to VDPAU.

    With R9 390 and Mesa, "mplayer -vo=vdpau -vc ffh264vdpau,ffmpeg12vdpau,ffwmv3vdpau,ffvc1vdpau," has CPU utilization about 5%, while "mplayer -vo=gl" has CPU utilization 25-35%. VDPAU is also more power efficient than CPU decoding.
    You can use vo=gl together with VDPAU as well: select VDPAU via hwdec instead of via vo.

    Comment


    • #12
      How come there's no LTO for GraphicsMagick?
      And ImageMagick's LTO looks like a bad regression...

      Comment


      • #13
        Originally posted by float View Post

        No reason not to, that is, for well made programs.
        Yes there is. Ofast enables illegal optimizations. You should not use it unless you are certain the program doesn't rely on correct FP handling. Enable it on a JavaScript engine, and it stops working.

        Comment


        • #14
          Originally posted by carewolf View Post
          Yes there is. Ofast enables illegal optimizations.
          Are some optimisations legal and others illegal now? This is the first time I have heard such a wild claim.

          Originally posted by carewolf View Post
          You should not use it unless you are certain the program doesn't rely on correct FP handling.
          Define correct FP handling. As far as I am concerned, -Ofast does not enable any FP optimisations that do not conform to the C standard.

          Originally posted by carewolf View Post
          Enable it on a JavaScript engine, and it stops working.
          It shouldn't unless it is not well-made.

          Comment


          • #15
            Originally posted by float View Post
            Are some optimisations legal and others illegal now? This is the first time I have heard such a wild claim.
            For example, if you dereference a pointer and then check whether it was null, it is legal to optimize away the check (because dereferencing null leads to undefined behavior).
            On the other hand, if you first check for non-null pointer and only then dereference it, a conforming compiler may not optimize away this check, unless it can establish some other way that the pointer is non-null.
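            A minimal C sketch of that difference (the function names are made up for illustration):

            ```c
            #include <stddef.h>

            /* Dereference-before-check: the dereference makes p != NULL an
             * assumption the compiler may rely on, so a conforming compiler
             * can delete the test entirely. */
            int read_broken(int *p) {
                int v = *p;        /* undefined behavior if p == NULL ... */
                if (p == NULL)     /* ... so this branch may be optimized away */
                    return -1;
                return v;
            }

            /* Check-before-dereference: here the compiler must keep the test
             * unless it can prove by other means that p is never null. */
            int read_safe(int *p) {
                if (p == NULL)
                    return -1;
                return *p;
            }
            ```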

            Originally posted by float View Post
            Define correct FP handling. As far as I am concerned, -Ofast does not enable any FP optimisations that do not conform to the C standard.
            -Ofast implies -ffast-math which can lead to unexpected loss in precision, rounding going wrong, etc.
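            A small sketch of the kind of precision loss reassociation can cause (hypothetical example; the exact outcome under -ffast-math depends on the compiler):

            ```c
            /* With strict IEEE 754 left-to-right evaluation, (big + small)
             * absorbs small entirely when big is large enough, so the result
             * is 0. -ffast-math permits reassociating this into
             * (big - big) + small, which yields small instead. Compile with
             * and without -ffast-math to compare. */
            double cancel(double big, double small) {
                return big + small - big;   /* evaluated as (big + small) - big */
            }
            ```

            With big = 1e16 and small = 1.0, strict evaluation gives 0.0, because 1.0 is below the spacing between adjacent doubles at magnitude 1e16.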

            Comment


            • #16
              Originally posted by atomsymbol
              If -fstrict-aliasing is enabled and the C compiler prints a warning message and the message is ignored by the user, then the generated code may be invalid.
              Omitting -fno-strict-aliasing results in miscompiled LLVM+Mesa when -flto is used.
              I am not aware of -fstrict-aliasing being against the requirements set by the C standard. This probably means that LLVM+Mesa invoke undefined behaviour, which should be considered a bug, or that GCC has a bug.
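              For reference, a tiny sketch of the kind of code the aliasing rule forbids (illustrative only; the actual LLVM+Mesa issue may be different):

              ```c
              #include <string.h>

              /* Undefined: accesses a float object through an incompatible
               * unsigned lvalue, which -fstrict-aliasing (on by default at
               * -O2) is allowed to miscompile. */
              unsigned bits_broken(float f) {
                  return *(unsigned *)&f;
              }

              /* Well-defined: memcpy reinterprets the bytes without violating
               * the effective-type rules, and compilers typically optimize it
               * down to a single move anyway. */
              unsigned bits_safe(float f) {
                  unsigned u;
                  memcpy(&u, &f, sizeof u);
                  return u;
              }
              ```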

              Originally posted by chithanh View Post
              For example, if you dereference a pointer and then check whether it was null, it is legal to optimize away the check (because dereferencing null leads to undefined behavior).
              On the other hand, if you first check for non-null pointer and only then dereference it, a conforming compiler may not optimize away this check, unless it can establish some other way that the pointer is non-null.


              -Ofast implies -ffast-math which can lead to unexpected loss in precision, rounding going wrong, etc.
              So, by saying legal, do you mean optimisations that are allowed by the standard? In that case, what makes you think that the optimisations enabled by -ffast-math are not allowed by the C standard?

              Originally posted by atomsymbol
              In my opinion, a conforming compiler can do any transformation that preserves the semantics of the original code, including in some cases applying the optimization you just mentioned.
              This is correct to my knowledge.

              Comment


              • #17
                Originally posted by float View Post
                So, by saying legal, do you mean optimisations that are allowed by the standard?
                Yes. In the example mentioned above, the compiler optimizes away a conditional branch because it has determined at compile time that in conforming code, the comparison is always false.
                Originally posted by float View Post
                In that case, what makes you think that the optimisations enabled by -ffast-math are not allowed by the C standard?
                The C standard references IEEE 754 for floating point math. -ffast-math changes behaviour in a way that is not allowed by IEEE 754.

                Comment


                • #18
                  Originally posted by chithanh View Post
                  Yes. In the example mentioned above, the compiler optimizes away a conditional branch because it has determined at compile time that in conforming code, the comparison is always false.
                  The C standard references IEEE 754 for floating point math. -ffast-math changes behaviour in a way that is not allowed by IEEE 754.
                  https://gcc.gnu.org/wiki/FloatingPointMath
                  IEEE 754 is not required by the C standard, and thus such optimisations are entirely legal. See Annex F of the C11 standard: "An implementation that defines __STDC_IEC_559__ shall conform to the specifications in this annex", with footnote 356 adding "Implementations that do not define __STDC_IEC_559__ are not required to conform to these specifications." To my knowledge GCC does not define that macro (also see https://gcc.gnu.org/c99status.html: "GCC does not define __STDC_IEC_559__ or implement the associated standard pragmas").
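                  This is easy to check directly (a minimal probe; note that on glibc systems the macro may be supplied by the library headers rather than by the compiler itself):

                  ```c
                  /* Returns 1 if the implementation claims Annex F (IEC 60559)
                   * conformance by defining __STDC_IEC_559__, 0 otherwise. */
                  int claims_annex_f(void) {
                  #ifdef __STDC_IEC_559__
                      return 1;
                  #else
                      return 0;
                  #endif
                  }
                  ```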

                  Comment


                  • #19
                    Looks like the short answer wasn't sufficient then.
                    Even if one assumes that "well made programs" don't rely on IEEE 754 behaviour in the absence of __STDC_IEC_559__, you can have it straight from the horse's mouth:
                    Originally posted by man gcc
                    -Ofast
                    Disregard strict standards compliance. -Ofast enables all -O3 optimizations. It also enables optimizations that are not valid for all standard-compliant programs. It turns on -ffast-math and the Fortran-specific -fno-protect-parens and -fstack-arrays.
                    One example which actually violates C99 is that -funsafe-math-optimizations (implied by -ffast-math) contracts expressions to FMA instructions in cases where this is not allowed by the standard.
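                    A small demonstration of why contraction is observable (a sketch; the volatile stores are there because whether `a * a` itself gets contracted depends on -ffp-contract):

                    ```c
                    #include <math.h>

                    /* fma(a, a, -p) rounds once: it subtracts the rounded
                     * product p from the *exact* product a*a, exposing the
                     * rounding error that "p - q" (two identically rounded
                     * products) can never see. The volatile stores keep the
                     * compiler from contracting the reference products itself. */
                    double fma_residue(double a) {
                        volatile double p = a * a;  /* product rounded to double */
                        volatile double q = a * a;  /* the same rounded product */
                        double two_roundings = p - q;          /* exactly 0 */
                        double one_rounding  = fma(a, a, -p);  /* exact minus rounded */
                        return one_rounding - two_roundings;
                    }
                    ```

                    For a = 1 + 2^-52 (one ulp above 1.0) the residue is nonzero (2^-104), so contracted and uncontracted code give visibly different answers.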

                    Comment


                    • #20
                      Originally posted by Fry-kun View Post
                      How come there's no LTO for GraphicsMagick?
                      And ImageMagick's LTO looks like a bad regression...
                      I think this is most likely a problem with a missing linker plugin on Michael's test systems. GraphicsMagick builds and works just fine with LTO for me. Without the linker plugin, LTO is crippled in several ways:

                      1) every object file contains both the LTO intermediate code and the final assembly (so that non-plugin-aware binutils can grok them). This doubles compile time, as binaries that will never be used are produced

                      2) when static libraries are used, the LTO intermediate code is silently ignored, nullifying any LTO benefits

                      3) resolution info is not available to the compiler. This forces the compiler to assume that every single symbol can be touched by the non-LTO world and must thus serve as an optimization boundary, which prevents a lot of useful code transformations.

                      Another possible explanation is that Michael uses a parallel build but the LTO link step runs serially. In that case, in addition to the -jN passed to make, you also want to use -flto=n. The ImageMagick test is a compile-time benchmark, and the regression probably comes from one of these causes. In general LTO builds are slower, but not by a big margin (the difference is similar to that between -O1 and -O2). You also need more memory and disk space.

                      Comment
