-O3 Compiler Optimization Level Still Deemed Too Unsafe For The Linux Kernel

  • #31
    Ok, I'm now getting attacked by both sides for being too defensive of Linus, and for not sucking up to him enough.

    There are way too many dumb people in this thread; I'm officially out. Have fun.

    Comment


    • #32
      Originally posted by phoronix View Post
      (GCC) possibly generating bad code with the "-O3" compiler optimization level
      In my experience as a developer, the most common problem with -O3 is not bad compilers, but bad code. It's not too hard to accidentally write C code that depends on undefined behavior. Such code can work as expected with -O2 but fail under -O3. It can even work fine for years with -O3, until some minor change to seemingly unrelated code tips the optimizer into behaving differently and it breaks. Or updates to the compiler can change optimization behavior and break programs which previously worked but are technically invalid.

      I've seen -O3 bugs with LLVM as well as with GCC. It's easy to say "well just don't write bad code then" but historically this strategy has not been shown to be effective with C programming.
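A minimal sketch of the kind of code described above, assuming a hypothetical overflow check (the function name is illustrative and not taken from any project in this thread). The commented-out pattern relies on signed overflow, which is undefined behavior in C, so an optimizer is allowed to fold the test away. The version below it checks the bounds before adding, so it is well defined at any optimization level.

```c
#include <limits.h>
#include <stdbool.h>

/*
 * Pattern that depends on undefined behavior (shown only as a comment):
 *
 *     if (b > 0 && a + b < a)   // signed overflow is UB
 *         handle_overflow();    // the compiler may drop this branch
 *
 * Hypothetical well-defined alternative: decide whether a + b would
 * overflow without ever computing a + b.
 */
bool add_would_overflow(int a, int b)
{
    if (b > 0)
        return a > INT_MAX - b;   /* INT_MAX - b cannot overflow here */
    if (b < 0)
        return a < INT_MIN - b;   /* INT_MIN - b cannot overflow here */
    return false;                 /* adding zero never overflows */
}
```

The design choice is simply to move the test before the arithmetic, so there is no undefined operation for the optimizer to reason about.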

      Comment


      • #33
        Originally posted by moilami View Post
        Who are you?
        I am your next idol and future obsession. I am the conscience you wish you had. *lol*
        Last edited by sdack; 06 June 2021, 08:59 PM.

        Comment


        • #34
          Originally posted by sdack View Post
          Forget -O3. Build the kernel with Clang and enable LTO. Simply pass LLVM=1 and LLVM_IAS=1 as arguments to make and you should be able to switch on LTO optimisation (works so far for x86 and 64-bit Arm, with a few exceptions, see arch/Kconfig for details).

          Code:
          $ make LLVM=1 LLVM_IAS=1 gconfig

          Compiling the kernel, normal or as a cross-compile, works in the same way:

          Code:
          make LLVM=1 LLVM_IAS=1 all
          or
          Code:
          make ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- LLVM=1 LLVM_IAS=1 all

          It is so simple, even Michael should be able to do it.
          Only downside is if you are an Nvidia driver user. The driver cannot be built with LTO, and as a result the kernel cannot be built with LTO.

          Comment


          • #35
            Originally posted by Naib View Post
            Only downside is if you are an Nvidia driver user. The driver cannot be built with LTO, and as a result the kernel cannot be built with LTO.
            How did you notice it?

            I am asking, because I use the proprietary driver (465.31) and it compiled as an out-of-tree module with LTO just fine. You should get to see the following at the end when building the kernel modules for Nvidia:
            Code:
            ...
            LTO [M] /usr/src/nvidia/nvidia-drm.lto.o
            LTO [M] /usr/src/nvidia/nvidia-ib-peermem-stub.lto.o
            LTO [M] /usr/src/nvidia/nvidia-modeset.lto.o
            LTO [M] /usr/src/nvidia/nvidia-peermem.lto.o
            LTO [M] /usr/src/nvidia/nvidia-uvm.lto.o
            LTO [M] /usr/src/nvidia/nvidia.lto.o
            ...
            As a test I have run vkcube and nvenc, and these seem to work just fine, as always.

            What can happen is that when you build the kernel modules for the Nvidia driver, the installer checks the compiler version against the one used for the running kernel and can bail out with a warning. You then need to set the environment variable with export IGNORE_CC_MISMATCH=1 for the installer to ignore the conflict, or build the modules after you have booted into the new kernel.

            But do let me know if I have missed something.
            Last edited by sdack; 07 June 2021, 09:51 AM.

            Comment


            • #36
              Originally posted by sdack View Post
              How did you notice it?

              I am asking, because I use the proprietary driver and it compiled as an out-of-tree module with LTO just fine. You should get to see the following at the end when building the kernel modules for Nvidia:
              Code:
              ...
              LTO [M] /usr/src/nvidia/nvidia-drm.lto.o
              LTO [M] /usr/src/nvidia/nvidia-ib-peermem-stub.lto.o
              LTO [M] /usr/src/nvidia/nvidia-modeset.lto.o
              LTO [M] /usr/src/nvidia/nvidia-peermem.lto.o
              LTO [M] /usr/src/nvidia/nvidia-uvm.lto.o
              LTO [M] /usr/src/nvidia/nvidia.lto.o
              ...
              As a test I have run vkcube and nvenc, and these seem to work just fine, as always.

              What can happen is that when you build the kernel modules for the Nvidia driver, the installer checks the compiler version against the one used for the running kernel and can bail out with a warning. You then need to set the environment variable with export IGNORE_CC_MISMATCH=1 for the installer to ignore the conflict, or build the modules after you have booted into the new kernel.

              But do let me know if I have missed something.
              Well, I started the process of going full Clang last night, and once I had the toolchain built with itself I moved on. The kernel built fine, but when attempting to build the Nvidia driver I got the telltale signs that it was going to be a package that does not support LTO. It wasn't a warning, it was outright errors.
              The fact that you are saying it can gives me hope...

              Thinking about this a bit more ... it might not be the actual kernel driver that has failed but rather the userland libraries, and thus I have the likes of Mesa and co that need to be rebuilt.

              Comment


              • #37
                I have just built XanMod 5.12.9 CacULE with Clang 12 under Pop!_OS 20.04
                with full LTO
                march=skylake
                O3 is enabled by default on XanMod kernels

                ...wow, that's blazing fast... the downside, as always with Nvidia, is Nvidia.

                Super responsive that system.

                Later I will try it on my gaming rig
                Last edited by CochainComplex; 08 June 2021, 09:35 AM.

                Comment


                • #38
                  Originally posted by sdack View Post
                  Where is this train of thought going? That it was not Trump either? That it was not <insert-your-idolised-leadership-here> but all the workers, voters or soldiers doing it? ... Torvalds has got the responsibility over the Linux project and he is the one bickering about a lack of care. The end. There is no need to derail this and to turn this into a brown-nosing competition for fear of your idols being called out for the dumb shit they talk. Everybody talks shit and Torvalds is no exception.
                  It's not Trump's fault if some rogue soldier down the line does something evil. Once it becomes known, however, it's his damn responsibility to make sure that the matter is investigated, that the person(s) responsible are duly punished, and that measures are taken (if possible) to make sure that it won't happen again.

                  Likewise, it's not the fault of either Torvalds or Greg, regardless of how little you like them, that some subsystem maintainers accepted a few less-than-stellar patches. Once Greg discovered that it had happened, though, he took immediate action, so this is a very stupid case to bring up if you want to throw shit at Linus or Greg.

                  Comment


                  • #39
                    Originally posted by F.Ultra View Post
                    It's not Trump's fault ...
                    This is not about a bug and nobody is talking about whose fault it was. Torvalds is not accusing a specific person, but he generalises, rants and talks shit about an entire project. A point was then made here on the forum that he should look at his own project before he talks shit about others, and it is a well-made point. Stop making excuses for Torvalds, stop saying it was not his fault, do not defend him. His behaviour is not acceptable. So do not enable it and do not support it.
                    Last edited by sdack; 08 June 2021, 03:50 PM.

                    Comment


                    • #40
                      So much heated whataboutism in this thread. Please stop.

                      Originally posted by foobaz View Post

                      In my experience as a developer, the most common problem with -O3 is not bad compilers, but bad code. It's not too hard to accidentally write C code that depends on undefined behavior. Such code can work as expected with -O2 but fail under -O3. It can even work fine for years with -O3, until some minor change to seemingly unrelated code tips the optimizer into behaving differently and it breaks. Or updates to the compiler can change optimization behavior and break programs which previously worked but are technically invalid.

                      I've seen -O3 bugs with LLVM as well as with GCC. It's easy to say "well just don't write bad code then" but historically this strategy has not been shown to be effective with C programming.
                      This is so true.

                      Programmers have a model of how C works. Generally, they are wrong. This is mostly harmless until an optimizer uses its discretion to change things using the as-if rule and exploiting Undefined Behaviour.

                      A classic example from buggy kernel code, from my dodgy memory:

                      Code:
                      junk = *ptr;        /* dereference first: UB if ptr is NULL */
                      if (ptr == NULL)    /* compiler may treat this as always false */
                        something();
                      After optimization, there is no call to something()!
                      Dereferencing a null pointer is UB, so once ptr has been dereferenced the compiler is allowed to infer that ptr is not null. So the if can be eliminated!

                      This would likely surprise any C coder who didn't already know it.

                      I think that GCC stopped exploiting this perfectly legal inference in kernel code because it surprised programmers too much.

                      Furthermore, in some kernel code, 0 is a valid address, so things get even weirder. The compiler doesn't know this. It is kind of paradoxical from the standpoint of the C language.
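A minimal sketch of the dereference-before-check problem above and its usual fix, assuming a hypothetical helper named read_or_default (not kernel code). The point is only the ordering: the null test comes before any dereference, so the compiler has no earlier dereference from which to infer that the pointer is non-null.

```c
#include <stddef.h>

/*
 * Problematic ordering (shown only as a comment):
 *
 *     junk = *ptr;          // UB if ptr is NULL
 *     if (ptr == NULL)      // may be optimized away
 *         something();
 *
 * Hypothetical fix: test first, dereference second.
 * Returns *ptr, or fallback when ptr is NULL.
 */
int read_or_default(const int *ptr, int fallback)
{
    if (ptr == NULL)
        return fallback;   /* the check is reached before any dereference */
    return *ptr;           /* safe: ptr is known to be non-null here */
}
```

As far as I know, the kernel also passes -fno-delete-null-pointer-checks to GCC, which tells the compiler not to assume that a dereferenced pointer is non-null; treat that as a belt-and-braces measure rather than a substitute for ordering the check correctly.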

                      Comment
