Linux 5.15's New "-Werror" Behavior Is Causing A Lot Of Pain


  • #21
    Maybe a question for Linus himself is which he prefers:
    1) code that gives no warnings but is totally broken with the hardware it is supposed to work with (there are already such things in the kernel, with no plans to remove them)
    2) code that gives warnings but works just fine with the hardware

    While Linus's idea is not bad from one point of view, there is a bigger problem. If a compiler change makes code that until now gave no warnings start to give warnings, how does Linus expect this to be handled? There are 2 options here:
    1) force the people working on that compiler to revert the change that is causing the warnings; after all, until now the code did not warn
    2) force people to fix code that was compiling fine so far, deal with people plainly refusing to do it, and end up in a situation where you either drop that code from the kernel or fix it yourself

    Coding something for the Linux kernel is like building on moving sand. A driver written for kernel 3.18, which runs happily on 3.18, will simply not work on 5.15 and will need a lot of patches to actually work properly.
    When a kernel change makes a driver stop working, I don't consider it the job of the driver's author to fix it. It's the job of those who submitted the kernel patches and those who accepted them upstream, because they are the ones actually breaking compatibility.
    I'm not talking here about code that doesn't compile. I'm talking about code that happily compiles but crashes or misbehaves because of kernel changes made by someone else.

    In the Windows case I can take a driver from 2012, install it, and it will just work; I can't really say the same about Linux. This is a way bigger issue than the fact that some code gives warnings while compiling...
    Maybe next time Linus will decide to remove everything that shows warnings in the logs even though the hardware is 100% healthy (he won't really do that, because you would have a hard time finding a PC that actually works...)
    Last edited by thedukesd; 08 September 2021, 02:57 AM.

    Comment


    • #22
      Originally posted by thedukesd View Post
      Maybe a question for Linus himself is which he prefers:
      1) code that gives no warnings but is totally broken with the hardware it is supposed to work with (there are already such things in the kernel, with no plans to remove them)
      Broken code is just broken code, like buggy. What does that have to do with compiler warnings?
      Either you fix the code to not be broken or you don't because not enough people care to put in the work to get the code working. It's not related to compiler warnings.
      2) code that gives warnings but works just fine with the hardware
      Either you deem the compiler warning to be valid or you don't. In the first case you fix the code to address the compiler warning. In the latter case you either suppress the warning for that particular case or you disable it globally for all code.

      Note that "code that works just fine" can be independent of the number of warnings given. You can have "code that works just fine" that has a ton of compiler warnings. That does not mean that it's OK. The warnings should either be addressed or explicitly suppressed.
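To make the "address it or explicitly suppress it" choice concrete, here is a minimal C sketch. It assumes GCC or Clang with -Wsign-compare enabled (it is part of -Wextra for C). Both function names are made up for illustration and are not kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: is idx a valid index into an array of len items?
 * Writing "idx < len" directly compares int with size_t and triggers
 * -Wsign-compare, because a negative idx converts to a huge unsigned value.
 * Addressed version: reject negatives first, then compare as unsigned. */
int index_in_range(int idx, size_t len)
{
    if (idx < 0)
        return 0;
    return (size_t)idx < len;
}

/* Suppressed version: callers promise idx >= 0, so the warning is judged
 * invalid here and silenced for this one function only. */
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wsign-compare"
int index_in_range_nonneg(int idx, size_t len)
{
    assert(idx >= 0); /* documents the assumption behind the suppression */
    return idx < len;
}
#pragma GCC diagnostic pop
```

The first variant changes the code so the warning no longer applies; the second keeps the code and records, next to the suppression, why the warning is considered invalid in that spot.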
      Last edited by tomas; 08 September 2021, 02:58 AM.

      Comment


      • #23
        Originally posted by thedukesd View Post
        2) code that gives warnings but works just fine with the hardware
        The horrible part here is that a lot of this turned out to be "works for me". There were a lot of cases of getting PCIe cards to work on 64-bit Arm systems instead of x86 where the compiler warnings that seemed fine to ignore on x86 turned out to be the reason those drivers did not work on the 64-bit Arm systems. This is not a new problem with Linux.

        It's very important to work out whether those warnings are truly invalid or not. Yes, if they are invalid, there are flags you can wrap around that section of code so that it no longer generates a warning/error.

        Originally posted by thedukesd View Post
        While Linus's idea is not bad from one point of view, there is a bigger problem. If a compiler change makes code that until now gave no warnings start to give warnings, how does Linus expect this to be handled? There are 2 options here:
        1) force the people working on that compiler to revert the change that is causing the warnings; after all, until now the code did not warn
        2) force people to fix code that was compiling fine so far, deal with people plainly refusing to do it, and end up in a situation where you either drop that code from the kernel or fix it yourself
        Yes, in case 1 the compiler is wrong, and that is why you would be patching the compiler.

        The second part is a lot more complex. You have a "works for me" problem. If the compiler is right and the code has an issue, it may not affect the platform you as a developer are building for. So some party on arm64, POWER, RISC-V ... is on the receiving end of your code fault. Should they be?

        In reality, every time a warning appears it needs to be checked whether it is a true code issue or a compiler error. Remember, Linus has been fixing a lot of this stuff up until now.



        Comment


        • #24
          Originally posted by tomas View Post

          Broken code is just broken code, like buggy. What does that have to do with compiler warnings?
          Either you fix the code to not be broken or you don't because not enough people care to put in the work to get the code working. It's not related to compiler warnings.
          It's about the wrong focus on the real problem. You look at compiler warnings and not at the fact that you have broken code that happens to compile fine. At this very moment all the wifi drivers that exist in the kernel are broken, every single one is broken, yet they happily compile and most don't give any warnings.
          If you are unable to maintain compatibility with old stuff, of course you need a sick number of people working to keep the code somehow working.

          Originally posted by tomas View Post
          Either you deem the compiler warning to be valid or you don't. In the first case you fix the code to address the compiler warning. In the latter case you either suppress the warning for that particular case or you disable it globally for all code.

          Note that "code that works just fine" can be independent of the number of warnings given. You can have "code that works just fine" that has a ton of compiler warnings. That does not mean that it's OK. The warnings should either be addressed or explicitly suppressed.
          You are assuming here that the compiler didn't have changes that are now causing warnings, the same warnings that were not seen 2 versions ago.
          You are also assuming that the compiler is right about the warnings...

          If the code is running fine but gives warnings while compiling, I would trust the code and not the compiler.
          If the code is not really running fine but gives no warnings while compiling, I wouldn't trust the code. In this last case, at the moment there are Intel WiFi cards that crash in a particular situation. This happens because there is no real check of whether 802.11w is actually supported, and there is no blacklist on the open-source side for devices that have incomplete/broken blobs or simply have no real support for 802.11w.
          (I know you will say blobs =/= open source, but even ath9k, which is supposed to be mature by now, has a broken ANI implementation, and even if you disable ANI things are still bad...)

          For me a bloody warning while compiling is nothing compared with completely broken functionality.

          Linus actually has a problem with the kernel type... A monolithic kernel will by design become so big that it is impossible to maintain if you want to support a lot of hardware. A monolithic kernel works just fine for a small amount of supported hardware.
          With a small amount of supported hardware you can test whether changes break something. With a lot of hardware you can't; you depend on other people, people who might not have the best intentions, people who might only partially test it, people who might only want to say in their resume that they have code upstream in the kernel, and so on...

          To make some hardware drivers actually compile after kernel changes, the maintainer decided to cut parts of the code. Yep, plainly cut parts of the code because that was assumed to be the offending code. The kernel changes had not made the removed code useless!!! The result was code that compiled fine and apparently worked fine, until particular conditions occurred that caused the hardware to stall, and you end up with no kernel messages about the problem but the hardware stuck.
          Last edited by thedukesd; 08 September 2021, 03:28 AM.

          Comment


          • #25
            Originally posted by thedukesd View Post
            For me a bloody warning while compiling is nothing compared with completely broken functionality.
            This is "works for me". A lot of the warnings gcc spits out turn out to be serious when you build for different architectures. So in a lot of cases a warning is completely broken functionality for someone else on some other arch.
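A hypothetical illustration of how a warning can be harmless on one machine and fatal on another (not taken from any real driver; the function names are invented): the implicit narrowing below is the kind of thing -Wconversion reports. It keeps working as long as every address happens to fit in 32 bits and silently corrupts the value once one does not, say on a board that maps device memory above 4 GiB.

```c
#include <assert.h>
#include <stdint.h>

/* Warned-about version: implicit 64-bit to 32-bit narrowing.
 * The upper 32 bits of bus_addr are dropped without any runtime sign. */
uint32_t narrow_silent(uint64_t bus_addr)
{
    uint32_t stored = bus_addr;
    return stored;
}

/* Addressed version: the assumption that made the narrowing "work" is
 * stated and checked, and the cast makes the narrowing explicit. */
uint32_t narrow_checked(uint64_t bus_addr)
{
    assert(bus_addr <= UINT32_MAX);
    return (uint32_t)bus_addr;
}
```

With an address such as 0xfe000000 both functions return the same value, which is why the code looks fine to the developer who only tests on one platform. With 0x100001000 the silent version returns 0x1000, while the checked version stops at the assertion instead of handing back a wrong address.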


            Originally posted by thedukesd View Post
            It's about the wrong focus on the real problem. You look at compiler warnings and not at the fact that you have broken code that happens to compile fine. At this very moment all the wifi drivers that exist in the kernel are broken, every single one is broken, yet they happily compile and most don't give any warnings.
            Fun point: there is a core part shared between all the wifi drivers that has been giving a warning. It would be great if someone went and fixed that. A lot of the wifi crashes trace straight to that warning.

            Warnings are bad. It's funny how one of the areas you pointed to is one of the areas where people really do need to fix up code that is throwing warnings.

            Comment


            • #26
              Originally posted by lyamc View Post
              This should be a net positive. A warning that is a nuisance either shouldn't be a warning (compiler should be fixed), or, it is a warning and the code should be fixed.
              Agreed. It definitely is a pain having a kernel build fail 'coz of a warning but as Linus said "time to clean up *YOUR* house too".

              Comment


              • #27
                Originally posted by thedukesd View Post

                It's about the wrong focus on the real problem. You look at compiler warnings and not at the fact that you have broken code that happens to compile fine. At this very moment all the wifi drivers that exist in the kernel are broken, every single one is broken, yet they happily compile and most don't give any warnings.
                If you are unable to maintain compatibility with old stuff, of course you need a sick number of people working to keep the code somehow working.



                You are assuming here that the compiler didn't have changes that are now causing warnings, the same warnings that were not seen 2 versions ago.
                You are also assuming that the compiler is right about the warnings...

                If the code is running fine but gives warnings while compiling, I would trust the code and not the compiler.
                If the code is not really running fine but gives no warnings while compiling, I wouldn't trust the code. In this last case, at the moment there are Intel WiFi cards that crash in a particular situation. This happens because there is no real check of whether 802.11w is actually supported, and there is no blacklist on the open-source side for devices that have incomplete/broken blobs or simply have no real support for 802.11w.
                (I know you will say blobs =/= open source, but even ath9k, which is supposed to be mature by now, has a broken ANI implementation, and even if you disable ANI things are still bad...)

                For me a bloody warning while compiling is nothing compared with completely broken functionality.

                Linus actually has a problem with the kernel type... A monolithic kernel will by design become so big that it is impossible to maintain if you want to support a lot of hardware. A monolithic kernel works just fine for a small amount of supported hardware.
                If your code is running fine but has warnings, it's likely your code is fragile, contains bad practice, and only happens to run by luck.

                The next time somebody else modifies your code, it's likely to break.

                Comment


                • #28
                  Originally posted by thedukesd View Post

                  It's about the wrong focus on the real problem. You look at compiler warnings and not at the fact that you have broken code that happens to compile fine. At this very moment all the wifi drivers that exist in the kernel are broken, every single one is broken, yet they happily compile and most don't give any warnings.
                  If you are unable to maintain compatibility with old stuff, of course you need a sick number of people working to keep the code somehow working.



                  You are assuming here that the compiler didn't have changes that are now causing warnings, the same warnings that were not seen 2 versions ago.
                  You are also assuming that the compiler is right about the warnings...

                  If the code is running fine but gives warnings while compiling, I would trust the code and not the compiler.
                  If the code is not really running fine but gives no warnings while compiling, I wouldn't trust the code. In this last case, at the moment there are Intel WiFi cards that crash in a particular situation. This happens because there is no real check of whether 802.11w is actually supported, and there is no blacklist on the open-source side for devices that have incomplete/broken blobs or simply have no real support for 802.11w.
                  (I know you will say blobs =/= open source, but even ath9k, which is supposed to be mature by now, has a broken ANI implementation, and even if you disable ANI things are still bad...)

                  For me a bloody warning while compiling is nothing compared with completely broken functionality.

                  Linus actually has a problem with the kernel type... A monolithic kernel will by design become so big that it is impossible to maintain if you want to support a lot of hardware. A monolithic kernel works just fine for a small amount of supported hardware.
                  With a small amount of supported hardware you can test whether changes break something. With a lot of hardware you can't; you depend on other people, people who might not have the best intentions, people who might only partially test it, people who might only want to say in their resume that they have code upstream in the kernel, and so on...
                  1.) Compilers don't care about functionality; there are other tools for that, way better than bending the compiler into it.

                  2.) Compiler warnings are usually serious. A special set is -Wconversion, whose warnings tend to be very dangerous, especially when you cannot validate types at compile time (hardware interaction is one big case, btw).

                  Something stupid, as just a tiny example:

                  unsigned long get_buffer(void)
                  {
                          /* unsigned long may not have the same size on every architecture and can trigger all kinds of weird behavior; sure, these days it is kind of standard, but Linux supports all kinds of odd hardware and architectures */
                          /* my_hardware_buffer() returns an int, and you cannot know at compile time whether the hardware will send negative values as well */
                          return my_hardware_buffer();
                  }

                  This can result in crashes at runtime, security issues, garbage data, kernel lockups, etc.

                  You should use stuff like this only when you are 100% sure it is impossible for the hardware to send the wrong type, and document that in the code, OR fix the freaking type, which is Linus's point here.
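A compilable take on the example above, as a rough sketch only. There is no real device here: my_hardware_buffer() is a stand-in that returns a test value, and fake_hw_value and get_buffer_checked() are names invented for this illustration. With GCC, -Wconversion (which enables -Wsign-conversion for C) flags the implicit int to unsigned long conversion in get_buffer().

```c
#include <assert.h>
#include <limits.h>

/* Stand-in for the hardware read; returns whatever the test stored here. */
int fake_hw_value;

static int my_hardware_buffer(void)
{
    return fake_hw_value;
}

/* As in the post: a negative int silently becomes a huge unsigned long. */
unsigned long get_buffer(void)
{
    return my_hardware_buffer();
}

/* One possible fix: keep the signed value, report negatives as an error,
 * and only then convert. Returns 0 on success, -1 on a negative reading. */
int get_buffer_checked(unsigned long *out)
{
    int v = my_hardware_buffer();

    assert(out);
    if (v < 0)
        return -1;
    *out = (unsigned long)v;
    return 0;
}
```

If the stand-in returns -1, get_buffer() yields ULONG_MAX, which downstream code would happily treat as a length or offset, while get_buffer_checked() reports the problem to the caller.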

                  Comment


                  • #29
                    Originally posted by lyamc View Post
                    This should be a net positive. A warning that is a nuisance either shouldn't be a warning (compiler should be fixed), or, it is a warning and the code should be fixed.
                    In an ideal world, yes, but in this one new Linux will be built with older compilers (with buggy warnings and false positives) and old Linux will be built with newer compilers (with new checks/bugs). These checks should probably be limited to a range of known, usable compiler versions, with an opt-in for the brave.

                    Comment


                    • #30
                      Originally posted by discordian View Post
                      In an ideal world, yes, but in this one new Linux will be built with older compilers (with buggy warnings and false positives) and old Linux will be built with newer compilers (with new checks/bugs). These checks should probably be limited to a range of known, usable compiler versions, with an opt-in for the brave.
                      Little point.
                      https://gcc.gnu.org/onlinedocs/gcc/D...c-Pragmas.html
                      These were introduced in the gcc 4.6 series (first released in March 2011; the final 4.6.4 release was on April 12, 2013) to fix a bug report I had opened. Yes, I was the one who proposed the push and pop part of that thing. And the warning, error and ignored options came out of the debate on those bugs.

                      A new Linux built with gcc 4.6 or newer should be possible to build without any warnings/false positives. Yes, those diagnostic pragmas can be applied selectively based on gcc version or LLVM version.
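A minimal sketch of applying the pragmas selectively by compiler version. It assumes GCC 4.6 or newer, or Clang (which accepts the same "GCC diagnostic" pragmas); on anything else the macros expand to nothing. The macro names and the on_event() callback are made up for this example and are not the kernel's own helpers.

```c
#include <assert.h>

/* Only emit push/ignored/pop on compilers known to support them. */
#if defined(__clang__) || (defined(__GNUC__) && \
    (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 6)))
#define DIAG_PUSH_IGNORE_UNUSED_PARAM \
    _Pragma("GCC diagnostic push") \
    _Pragma("GCC diagnostic ignored \"-Wunused-parameter\"")
#define DIAG_POP _Pragma("GCC diagnostic pop")
#else
#define DIAG_PUSH_IGNORE_UNUSED_PARAM
#define DIAG_POP
#endif

DIAG_PUSH_IGNORE_UNUSED_PARAM
/* Callback with a signature fixed by a hypothetical framework:
 * ctx is deliberately unused, so the warning is a false positive here. */
int on_event(int code, void *ctx)
{
    assert(code >= 0); /* hypothetical contract: codes are non-negative */
    return code + 1;
}
DIAG_POP
```

Because the suppression sits between push and pop, -Wunused-parameter stays active for the rest of the file, and the version check keeps older or unknown compilers from choking on pragmas they do not understand.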

                      https://www.kernel.org/doc/html/late...s/changes.html
                      Please note the minimum compilers to build the current Linux kernel are gcc 4.9 and Clang/LLVM 10.0.1.

                      That does raise the question of whether the Linux kernel should narrow its range of supported compilers.



                      Comment
