Fast Kernel Headers v2 Posted - Speeds Up Clang-Built Linux Kernel Build By ~88%


  • #31
    Originally posted by oleid View Post

    Why would it change runtime speed? This is about changing the include hierarchy. There is zero chance it will have a runtime impact.
    Because it also changed some functions so they're no longer suggested for inlining. Depending on whether the compiler decided to follow the hint, and whether that hint was clever or not, that could make runtime performance slightly better or slightly worse. I doubt the change would be huge, though.

    Comment


    • #32
      Originally posted by atomsymbol View Post
      Apart from the fact that modules are a C++ feature, in the first patch set Ingo Molnar reported:

      [...]
      As to other approaches attempted: I also tried enabling the kernel to use pre-compiled headers (with GCC), which didn't bring a performance improvement beyond a 1-2% for repeat builds. Plus during development most of the precompiled headers are cache-cold in any case and need to be generated by the compiler again and again.
      [...]


      https://lwn.net/ml/linux-kernel/[email protected]/

      So it seems that the speed increase is due mostly to the header reorganization.

      Comment


      • #33
        Originally posted by CochainComplex View Post

        *cough* ..as a German I'm obliged to point out that there are plenty of high-quality options available. Just to name the obvious ones: Porsche, Mercedes, BMW, VW ....and then there are the acquired brands running on German engines: Lamborghini, Bugatti... but with the latter ones you have issues with the shipment time as well.
        Ford "invented" the assembly line for high-speed, mass-produced vehicles and, because of that, was able to ship out vehicles 88% faster.

        Comment


        • #34
          Originally posted by sinepgib View Post

          Because it also changed some functions so they're no longer suggested for inlining. Depending on whether the compiler decided to follow the hint, and whether that hint was clever or not, that could make runtime performance slightly better or slightly worse. I doubt the change would be huge, though.
          Another thing that might change is auto-inlining. I don't know if or how often the Linux kernel uses static definitions in its headers, but if those definitions have been moved to .c files, they can now only be auto-inlined if LTO is enabled.
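
          As a sketch of the difference (a hypothetical helper, not taken from the actual patch set):

          Code:
          /* Before: defined in a header, so every including .c file sees
           * the body and the compiler can inline calls on its own. */
          static inline int clamp_to_byte(int v)
          {
                  return v < 0 ? 0 : (v > 255 ? 255 : v);
          }
          
          /* After: the header keeps only the declaration and the body moves
           * to a .c file. Other translation units now make a real call,
           * unless the build uses LTO (gcc/clang -flto), which lets the
           * linker see the body and inline it anyway. */
          int clamp_to_byte(int v);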

          Comment


          • #35
            Originally posted by oleid View Post

            Why would it change runtime speed? This is about changing the include hierarchy. There is zero chance it will have a runtime impact.
            He is also touching the inlining/uninlining of functions:


            [...]
            - Uninlining: there's a number of unnecessary inline functions that also couple otherwise unrelated headers to each other. The fast-headers tree contains over 100 uninlining commits.
            [...]

            https://lwn.net/ml/linux-kernel/[email protected]/


            It is hard to say whether, and by how much, it will impact performance, but theoretically it is possible to have a regression (or an improvement!).
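
            To make the uninlining point concrete, a minimal before/after sketch (invented names, not taken from the fast-headers tree):

            Code:
            /* Before (a.h): the one-line inline body calls b_count(), so a.h
             * must #include "b.h", and every user of a.h pays to parse b.h
             * too, even if it never touches this function. */
            #include "b.h"
            
            static inline int a_count(const struct a *x)  /* struct a defined earlier in a.h */
            {
                    return b_count(x->b);
            }
            
            /* After (a.h): uninlined. The body, and with it the #include
             * "b.h", move into a.c; the two headers are decoupled. */
            int a_count(const struct a *x);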

            Comment


            • #36
              Originally posted by kreijack View Post
              He is also touching the inlining/uninlining of functions:

              [...]
              - Uninlining: there's a number of unnecessary inline functions that also couple otherwise unrelated headers to each other. The fast-headers tree contains over 100 uninlining commits.
              [...]

              https://lwn.net/ml/linux-kernel/[email protected]/
              Yup, having too many include dependencies suggests insufficient use of abstractions. And a very basic abstraction is the function call.

              It's nice to have some general guidelines around function inlining. In userspace code, I typically start with a policy that functions calling 2 or more non-inlinable functions should not themselves be inline (nor, preferably, even be defined in a header-based class definition). And in C++, you have to remember that every expression which can allocate from the heap implies a non-inlinable function call.

              Other techniques to reduce header file dependencies tend to increase reliance on the heap, which potentially comes at some runtime cost. Heap-allocating an opaque type lets you hide the types used to implement a given programming interface, which means your public headers don't need to include the lower-level ones. But thanks to C99's support for variable-length arrays, you could instead have an opaque type that callers allocate on the stack:

              Code:
              /* Assumes obj_size() returns the size of the hidden struct Obj.
               * Caveat: a bare char array isn't guaranteed to be suitably
               * aligned for struct Obj; real code would want something like
               * _Alignas(max_align_t) on the buffer. */
              char storage[obj_size()];              /* C99 variable-length array */
              struct Obj *p_obj = obj_init(storage);
              
              /* use p_obj ... */
              
              obj_cleanup(p_obj);
              Obviously, if the definition of struct Obj is hidden, then you have 3 non-inline function calls, whereas making it public might've enabled having just 2 inline functions (obj_init() and obj_cleanup()). However, assuming it's a heavy-weight type, that's not much overhead. If it's actually a lighter-weight type, then calling those + obj_size() is still a heck of a lot cheaper than a pair of calls to malloc() + free()!

              Again, making the type opaque reduces header file dependencies, since the caller doesn't need to see the definitions of the types used inside struct Obj. It also preserves flexibility for the implementation, enabling it to add/remove/rearrange members in struct Obj without the caller having to recompile.
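
              For concreteness, a minimal sketch of that split, reusing the names from the snippet above (the members of struct Obj are invented purely for illustration):

              Code:
              /* obj.h: public interface. struct Obj stays opaque, so this
               * header needs none of the headers the implementation uses. */
              #include <stddef.h>
              
              struct Obj;                           /* incomplete type: members hidden */
              
              size_t obj_size(void);                /* storage the caller must supply */
              struct Obj *obj_init(void *storage);  /* construct in caller's buffer */
              void obj_cleanup(struct Obj *obj);
              
              /* obj.c: implementation. Free to add/remove/rearrange members
               * without forcing callers to recompile. */
              #include "obj.h"
              
              struct Obj {
                      int  refcount;                /* invented members */
                      char name[32];
              };
              
              size_t obj_size(void) { return sizeof(struct Obj); }
              
              struct Obj *obj_init(void *storage)
              {
                      struct Obj *o = storage;
                      o->refcount = 1;
                      o->name[0] = '\0';
                      return o;
              }
              
              void obj_cleanup(struct Obj *obj)
              {
                      (void)obj;                    /* nothing to release here */
              }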

              You could even hide the storage + init in a macro, but hiding too much in macros creates opportunities for unintentional misuse.
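
              For instance, a hypothetical convenience macro (note that it silently declares two names, and the VLA means it only works at block scope; exactly the kind of surprise meant above):

              Code:
              /* Hypothetical: declare stack storage and an initialized
               * pointer in one go. */
              #define OBJ_ON_STACK(name)                          \
                      char name##_storage[obj_size()];            \
                      struct Obj *name = obj_init(name##_storage)
              
              void example(void)
              {
                      OBJ_ON_STACK(obj);
                      /* use obj ... */
                      obj_cleanup(obj);
              }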
              Last edited by coder; 09 January 2022, 01:55 PM.

              Comment


              • #37
                Originally posted by kreijack View Post

                Apart from the fact that modules are a C++ feature, in the first patch set Ingo Molnar reported:

                [...]
                As to other approaches attempted: I also tried enabling the kernel to use pre-compiled headers (with GCC), which didn't bring a performance improvement beyond a 1-2% for repeat builds. Plus during development most of the precompiled headers are cache-cold in any case and need to be generated by the compiler again and again.
                [...]


                https://lwn.net/ml/linux-kernel/[email protected]/

                So it seems that the speed increase is due mostly to the header reorganization.
                I don't understand why you believe what you believe, because:

                - Modules/packages are a feature that can be added to most programming languages that do not already have modules/packages. It could be added to C as well.

                - Ingo's statement "Plus during development most of the precompiled headers are cache-cold in any case and need to be generated by the compiler again and again" is a testament to how inefficient/primitive the precompiled-header algorithms in the tested C/C++ compilers (gcc) are. A more appropriate name for gcc's "precompiled headers" would be "non-incremental preparsed headers". (Note: I developed an incremental compiler in the past, as an experiment only.)

                Comment


                • #38
                  Originally posted by atomsymbol View Post
                  - Modules/packages are a feature that can be added to most programming languages that do not already have modules/packages. It could be added to C as well.
                  C being such a low-level/zero-overhead language, I doubt it will get modules.

                  Depending on how these hypothetical C modules are implemented, they could just push most of the work from compilation to the linking phase.

                  Originally posted by atomsymbol View Post
                  - Ingo's statement "Plus during development most of the precompiled headers are cache-cold in any case and need to be generated by the compiler again and again" is a testament to how inefficient/primitive the precompiled-header algorithms in the tested C/C++ compilers (gcc) are. A more appropriate name for gcc's "precompiled headers" would be "non-incremental preparsed headers". (Note: I developed an incremental compiler in the past, as an experiment only.)
                  Carrying around more persistent state seems like it'd significantly increase the complexity of the compiler, as well as multiplying the opportunities for things to go wrong.

                  I'd gladly design my code around the limitations of a more primitive toolchain than try to debug a compiler that's collapsing under the weight of its own complexity.

                  Comment


                  • #39
                    Originally posted by coder View Post
                    That's a bit simplistic. I'm not sure there aren't a few things in C that make work for compilers harder than it needs to be, but I also wonder if the comment about Pascal isn't presuming the same degree of optimization.
                    Of course it's simplistic; it's a post in a forum.

                    Originally posted by coder View Post
                    It's also worth considering that how you use a language has a lot to do with compile times. I once had a template-heavy C++ file that took a couple minutes to compile and doing so used a couple GB of memory. Once I eliminated some unnecessary template parameter type deduction, it took only a few seconds to compile and I think memory usage dropped accordingly. Although this particularly bad example involves C++, I can imagine things one might do in C that also create needless burden, such as having lots of overlong and inline functions.

                    Part of the software engineering discipline is understanding how to use programming language features in a scalable and maintainable way.
                    That's what I was hinting at. If you need lightning-fast feedback, you use a scripting language. If you need features, you pick a language that offers them. Know your tool, bend it to your will. Instead, I keep hearing whining about "this is slow to compile" or "this language doesn't have that feature". Makes me sad.
                    Last edited by bug77; 09 January 2022, 06:57 PM.

                    Comment


                    • #40
                      Originally posted by atomsymbol View Post
                      I don't understand why you believe what you believe, because:
                      - Modules/packages are a feature that can be added to most programming languages that do not already have modules/packages. It could be added to C as well.
                      I believe what I believe because modules don't exist for C, and you confirmed that. I suppose that Ingo doesn't have any wish to develop a better C compiler; instead he is trying (with very good results) to rearrange the headers to get the speed improvement that he wants.

                      Originally posted by atomsymbol View Post
                      - Ingo's statement "Plus during development most of the precompiled headers are cache-cold in any case and need to be generated by the compiler again and again" is a testament to how inefficient/primitive the precompiled-header algorithms in the tested C/C++ compilers (gcc) are. A more appropriate name for gcc's "precompiled headers" would be "non-incremental preparsed headers". (Note: I developed an incremental compiler in the past, as an experiment only.)
                      Ingo showed that the current state of the art (precompiled headers) didn't give any advantage. If you know a compiler with better support for precompiled headers/modules, show the numbers and everybody (me first) will be happy.

                      E.g., if the fault is in gcc, LLVM should show better results (but I assume that Ingo already tested it).

                      Anyway, reading Ingo's statement a second time, I don't understand what he means. The Linux sources are about 8 GB (source + git + .o files), so even considering the memory usage of parallel compilers, a high-end machine (e.g. 64 GB of RAM, which is not an impossible target by today's standards) should be enough to avoid a "cold cache" problem.

                      BR
                      G.Baroncelli

                      Comment
