Google Posts Patches So The Linux Kernel Can Be LTO-Optimized By Clang


  • #21
    Originally posted by oleid View Post
    Execution speed usually doesn't really improve with LTO, but the size reduction is often measurable.
    That's horseshit. LTO can massively improve execution speed.

    • #22
      Originally posted by JustinTurdeau View Post

      That's horseshit. LTO can massively improve execution speed.
      Sure it CAN, but usually it doesn't. At least not on its own.

      Edit:

      What usually helps is profile-guided optimization, especially when combined with LTO.
      Last edited by oleid; 06-25-2020, 03:43 PM.

      • #23
        Originally posted by oleid View Post
        Execution speed usually doesn't really improve with LTO, but the size reduction is often measurable.
        Size reduction almost always leads to better execution speed, thanks to better utilization of cache and RAM and faster access speeds. It is not a huge gain, but in most cases it should be there, especially if your CPU is on the lower end and has less cache or slower RAM.

        • #24
          Originally posted by oleid View Post
          Sure it CAN, but usually it doesn't. At least not on its own.
          It often doesn't speed up well-written programs noticeably because competent developers can reap similar benefits via careful use of other techniques (e.g. static inline functions in headers). It can still have a noticeable impact when functions get inlined across translation units, where the compiler can then do extra -fbuiltin optimizations, constant folding, dataflow analysis, etc. that would otherwise have been impossible. For not-so-well-written codebases, it can sometimes improve things massively.

          Originally posted by oleid View Post
          What usually helps is profile-guided optimization, especially when combined with LTO.
          PGO is a toy for most applications. Getting high quality profile data is usually more effort than it's worth. LTO is a much easier value proposition for most developers.

          Manual uses of __builtin_expect() and __attribute__((cold)) can also give some of the same benefits without having the constant, uphill battle to get good profile coverage.
          Last edited by JustinTurdeau; 06-25-2020, 09:13 PM.

          • #25
            Originally posted by JustinTurdeau View Post
            PGO is a toy for most applications. Getting high quality profile data is usually more effort than it's worth. LTO is a much easier value proposition for most developers.
            I had Firefox in mind. That seems to greatly benefit from PGO.
            Also, 'data processing pipeline' kinds of programs, where getting profiles can be easily automated and all parts of the program can be touched without much effort.

            • #26
              Originally posted by oleid View Post
              ...all parts of the program can be touched without much effort.
              That doesn't mean anything. Merely "touching" a particular code path isn't the same thing as getting high quality, representative profile data.

              Becoming obsessed with just "touching every line" is a masturbatory pursuit. The same point applies to test coverage too. All other things being equal, having more coverage is better than less coverage, but it still doesn't make it complete or high quality and it still doesn't account for every possible state the program can be in before a given path is taken.

              • #27
                Originally posted by JustinTurdeau View Post
                PGO is a toy for most applications. Getting high quality profile data is usually more effort than it's worth.
                The performance benefits of PGO widely eclipse those of LTO in my experience. I agree that for most applications it doesn't matter, but the same is true for LTO. However, for CPU-intensive stuff, PGO gives me a ~5-20% performance increase. My use cases are mainly compression, encoding, rendering, emulation.

                As for getting 'high quality' profile data, just run the application according to your typical usage; it's not rocket science. Better yet, having applications automate this by incorporating PGO support is the ideal solution. Firefox and x264 do this.

                Originally posted by JustinTurdeau View Post
                LTO is a much easier value proposition for most developers.
                Certainly.

                • #28
                  Originally posted by JustinTurdeau View Post

                  That doesn't mean anything. Merely "touching" a particular code path isn't the same thing as getting high quality, representative profile data.
                  So how do you get that? In my experience, running the software is enough. It gets its input from the sensors, does whatever it does, and outputs results.

                  It would not work if some of the sensor input were unavailable. That is what I mean by touching all code paths.

                  • #29
                    Originally posted by JustinTurdeau View Post
                    That doesn't mean anything. Merely "touching" a particular code path isn't the same thing as getting high quality, representative profile data.
                    This statement makes no sense; executing a code path is how you get profile data. Representative profile data is what you get when running the application per your typical workload.

                    The alternative to this is guesswork from the compiler, which is what you get without PGO, unless you use a lot of compiler extensions which allow you to give 'hints' to said compiler. Linux does this to a large extent, but the vast majority of software, including very performance-critical software, does not.

                    A real-world example: for Blender CPU rendering I've gotten up to a 22% performance increase by recompiling with PGO. That is a massive performance boost.

                    • #30
                      Originally posted by oleid View Post
                      So how do you get that? In my experience, running the software is enough.
                      Profile data is only useful if the pattern of usage during collection bears some resemblance to how the program is used in production. Just "touching" lines arbitrarily can do the exact opposite of what you're trying to accomplish, if it's not representative. In general, the more complex/configurable/dynamic a program is, the more work it takes to get high quality data.

                      Firefox is an example of a program that would be pretty hard to generate a good profile for. Something like a parser library, on the other hand, would be an ideal use case for PGO because you can just feed it a big corpus of typical input data.

                      Originally posted by oleid View Post
                      It gets its input from the sensors
                      What sensors? GCC gets its profiling data from *.gcda files that you generate with e.g. gcc -fprofile-generate.

                      Originally posted by oleid View Post
                      It would not work if not all sensor input was available. That is what I mean with touching all code paths.
                      I have no idea what you're talking about.
                      Last edited by JustinTurdeau; 06-26-2020, 03:05 AM.
