Profile Guided Optimizations (PGO) Likely Coming To Linux 5.14 For Clang

  • Profile Guided Optimizations (PGO) Likely Coming To Linux 5.14 For Clang

    Phoronix: Profile Guided Optimizations (PGO) Likely Coming To Linux 5.14 For Clang

    Recently the mainline Linux kernel has seen a lot of improvements to its feature set when compiling it under LLVM's Clang rather than GCC as traditionally the only supported compiler. The most recent feature being brought to the Linux kernel when using Clang is finally allowing the use of compiler profile guided optimizations (PGO) for squeezing even greater performance out of the system by letting the compiler leverage the real-world profiles/metrics collected to make more informed code generation / optimization decisions...

    https://www.phoronix.com/scan.php?pa...For-Linux-Next

  • #2
    After the update to track the kernel modules, I am curious as to the full impact on performance. I can see where this could provide a big improvement for HPC type systems, but on a standard workstation, I am curious. As an example, my workstation is running 2 VMs as well as the standard complement of applications like an email client and browser. If you look at the history of the CPUs, they don't tick much above 30%, and then only for short periods of time (typically when building in the VMs).



    • #3
      Originally posted by dekernel
      I am curious as to the full impact on performance. I can see where this could provide a big improvement for HPC type systems, but on a standard workstation, I am curious.
      As always, YMWV, but the usual answer is that PGO can yield smaller improvements even in general-purpose workloads (as even general-purpose workloads tend to be biased towards some type of work that can be optimized). Beyond certain classic HPC workloads, certain cloud-based service providers may also be able to benefit significantly when you look at the scale involved. So I would expect the usual suspects to move towards using this upstreamed capability rather than maintaining their own custom kernel patch set(s).

      I wonder if some (especially enterprise targeted) distros will start to offer kernel streams optimized for certain types of workloads. Variants such as "container optimized", or "LAMP optimized" might make sense.



      • #4
        Can't wait. I love PGO.



        • #5
          For DevilutionX I saw 0 difference when applying PGO so it's not always a given that it will help.



          • #6
            When the code is a mess that only compiles with a quirky compiler.



            • #7
              Originally posted by CommunityMember
              I wonder if some (especially enterprise targeted) distros will start to offer kernel streams optimized for certain types of workloads. Variants such as "container optimized", or "LAMP optimized" might make sense.
              Possibly the smaller distros meant for routers and access points will make use of LTO-PGO optimised kernels. I assume projects like OpenWRT, DD-WRT, Hyper-WRT, Tomato, AdvancedTomato, FreshTomato, ... all those that run inside network devices, should see a nice gain from it, especially since these run on lower-spec hardware where the gains should be more noticeable. For these, an optimised kernel should produce higher throughput and lower latency.



              • #8
                Originally posted by sdack
                Possibly the smaller distros meant for routers and access points will make use of LTO-PGO optimised kernels. I assume projects like OpenWRT, DD-WRT, Hyper-WRT, Tomato, AdvancedTomato, FreshTomato, ... all those that run inside network devices, should see a nice gain from it, especially since these run on lower-spec hardware where the gains should be more noticeable. For these, an optimised kernel should produce higher throughput and lower latency.
                By the same token, older and legacy hardware, such as the Motorola 68000 CPUs from the recent story about improved compatibility, could see some benefit from this. The same goes for VMs with lower resources, Raspberry Pis and other SoCs, not to mention controller projects that are generally special-purpose applications.

                I wonder, if this implementation is added, whether the same data can be used by GCC PGO?



                • #9
                  Originally posted by dragorth

                  By the same token, older and legacy hardware, such as the Motorola 68000 CPUs from the recent story about improved compatibility, could see some benefit from this. The same goes for VMs with lower resources, Raspberry Pis and other SoCs, not to mention controller projects that are generally special-purpose applications.

                  I wonder, if this implementation is added, whether the same data can be used by GCC PGO?
                  IIRC, the RPi used ARMv6-optimized binaries for ARMv7 and ARM64. Also a 32-bit kernel on 64-bit hardware, no?



                  • #10
                    Originally posted by caligula
                    IIRC, the RPi used ARMv6-optimized binaries for ARMv7 and ARM64. Also a 32-bit kernel on 64-bit hardware, no?
                    They've had a beta version of a native 64-bit Raspberry Pi OS for a couple of years now. You need to know where to find it, but it seems pretty stable. I have no idea when they're planning to mainstream it. It runs on the Pi v3 and newer.
