Following LTO, Linux Kernel Patches Updated For PGO To Yield Faster Performance


  • #11
    Originally posted by kvuj View Post

    Could someone explain to me like I'm five?
    I know there have been a number of attempts to explain it, but I'll aim for a true "like I'm five" explanation.

    Let's say you run a grocery store and you want to start selling bananas.

    First, you get a bunch of bananas, then you put them in your store, somewhere in produce. After selling them for a while, you notice that shoppers are really going out of their way to buy bananas, so you move them so that they are at the front of produce.

    To relate this to code:
    First, you get a bunch of bananas (write code), then you put them in your store, somewhere in produce (the first compile). After selling them for a while, you notice that shoppers are really going out of their way to buy bananas (collecting profile information), so you move them to the front of produce (recompiling based on the gathered profile information).



    • #12
      The data collected for profile-guided optimisation is useful for more than PGO itself. Being able to generate a profile and see where the compiler's default heuristics choose the wrong path can sometimes lead to code changes, like putting __attribute__((cold)) in a few places to tell the compiler that a function is a cold path.

      Full-blown profile-guided optimisation has the issue of being tied exactly to one workload. Comparing the data collected for PGO against what the compiler guessed can at times show where generic speed gains can be found by adding some of these hints, so the compiler no longer guesses those areas wrong.



      • #13
        How the hell do you build profiles for the Linux kernel? What even qualifies as a profile when everyone's usage patterns are different?



        • #14
          Finally, an article on PGO...

          During the last year all I've heard about is LTO...



          • #15
            Originally posted by bug77 View Post
            How the hell do you build profiles for the Linux kernel? What even qualifies as a profile when everyone's usage patterns are different?
            From the kernel's perspective, I don't see the general usage patterns being very different unless you are doing very specific workloads, like running a server, massive number crunching, routing, etc.

            Also, I don't expect any dramatic performance improvements, given that the kernel code makes use of macros that effectively control the generated code, like __builtin_expect for branch prediction, etc.

            I recall reading a paper from Google (I think it was posted here) where they showed the results of a TCP stream transfer on a PGO-optimized Linux kernel, resulting in a ~3-5% improvement, which was actually more than I expected.

            I don't see any reason why automated PGO in the Linux kernel wouldn't result in decent overall performance improvements: just have the automation perform typical tasks in each subsystem during the profiling stage. If there are any outlier usage patterns that see performance harmed by this, you can simply avoid using PGO for those specific needs.

            To make it clear, even when you don't use PGO, certain usage patterns are bound to be penalized (as in, not placed in the hot code path, which is more likely to be cached). The difference is that with PGO, the penalized patterns are those not used, or seldom used, during the profiling stage, instead of, in the worst case without PGO, those which are used most often (because the compiler guessed wrong).



            • #16
              Originally posted by CommunityMember View Post
              As Google has been clang'ing their Android kernels for quite some time (for CFI)
              Google is also using clang-compiled kernels on their server fleets.

              Google used to be a quite active GCC contributor, but they have shifted their attention to the LLVM ecosystem. I'm not sure why, though I guess it's a combination of the license and of LLVM, by virtue of being newer, being more modular and cleaner.



              • #17
                Originally posted by jabl View Post
                I'm not sure why, though I guess it's a combination of the license and of LLVM, by virtue of being newer, being more modular and cleaner.
                Agreed, coupled with the majority of people coming out of CS these days being more interested in, or having worked with, LLVM, and then being hired at Google.



                • #18
                  Originally posted by bug77 View Post
                  How the hell do you build profiles for the Linux kernel? What does even qualify as a profile, when everyone's usage patterns are different?
                  You can get some interesting data from profiles, not just data specific to particular workloads. It can help identify places where you can provide better hints to the compiler within the source code itself, which would then be used during non-PGO builds too. I imagine effective profiling will result in these kinds of changes heading upstream, rather than relying solely on the information when doing PGO builds.

                  Mostly I would picture this as being useful for more dedicated devices whose workload is dominated by one particular task, moving away from generic kernels to ones compiled specifically for that purpose. I would be surprised if we see PGO'd kernels shipped by standard distributions. Maybe they'd add them as specific flavoured kernels you could install (e.g. Ubuntu has generic, lowlatency, etc. kernels; it could be just another set of kernels from there? Feels a bit like clutching at straws, though).



                  • #19
                    Originally posted by Garp View Post
                    You can get some interesting data from profiles, not just data specific to particular workloads. It can help identify places where you can provide better hints to the compiler within the source code itself, which would then be used during non-PGO builds too. I imagine effective profiling will result in these kinds of changes heading upstream, rather than relying solely on the information when doing PGO builds.
                    But you don't need PGO for this. You just need to profile and then add the code hints.

                    Originally posted by Garp View Post
                    Mostly I would picture this as being useful for more dedicated devices whose workload is dominated by one particular task, moving away from generic kernels to ones compiled specifically for that purpose. I would be surprised if we see PGO'd kernels shipped by standard distributions. Maybe they'd add them as specific flavoured kernels you could install (e.g. Ubuntu has generic, lowlatency, etc. kernels; it could be just another set of kernels from there? Feels a bit like clutching at straws, though).
                    Yes, that's what I suspected it was for, too. I feel PGO most benefits programs that follow the UNIX philosophy of "do one thing and do it well". A compressor/decompressor has some clear code paths, and a utility like sed or awk is also doing "one thing". More complex programs probably fall more into the first category you describe (read the profile to find out which additional hints you can add, and where).



                    • #20
                      Originally posted by Zan Lynx View Post
                      I don't know if this project is using it, but there's another option besides compiling the program with profiling support included. You can use "perf" instead. The CPU profiling counters can be recorded and used to generate a profile.

                      It may not be as accurate as an instruction-precise instrumented profile recording, but it has almost no overhead and can be run on any program or the kernel itself.
                      Ah, I was wondering about the overhead of profiling.

