Google Updates Patches For AutoFDO+Propeller Optimized Linux Kernel

  • Google Updates Patches For AutoFDO+Propeller Optimized Linux Kernel

    Phoronix: Google Updates Patches For AutoFDO+Propeller Optimized Linux Kernel

    Google engineers have been working on support for building the Linux kernel with AutoFDO feedback-directed optimizations and Propeller optimizations when compiling with LLVM/Clang. In turn this can help Linux systems see 2-10% better performance thanks to the more optimized kernel...

  • #2
    A 2-3% improvement in Unixbench doesn't seem worth it. A 10% improvement in TCP latency is massive though. Surely there must be more room for improvement in the kernel somewhere?


    • #3
      Just stick to GCC and -O2 instead of anything else to show your loyalty to GNU. The true believers don't need extra performance from the evil Google.


      • #4
        Originally posted by ahrs
        A 2-3% improvement in Unixbench doesn't seem worth it. A 10% improvement in TCP latency is massive though. Surely there must be more room for improvement in the kernel somewhere?
        For the hyperscalers, even a 2-3% improvement means massive savings: lower operational costs (power, cooling, etc.) and no need to purchase more equipment. This is real money.


        • #5
          I'd like to see improvements in ease of use to get AutoFDO/Propeller and BOLT into the hands of normal users. Having read the workflow description in the first patch, this doesn't seem to be as seamless as advertised. I was also told by the CachyOS devs that BRS is not available on consumer Zen 3 CPUs. As Zen 4's implementation is said to be buggy, this practically limits this feature to Intel's LBR-capable CPUs at the moment for desktop usage.


          • #6
            A 10% latency reduction is huge. Is this applicable to CachyOS?


            • #7
              Originally posted by Errinwright
              10% latency reduction is huge, is this applicable for CachyOS?
              I have not tried the v2 patchset yet. I tried to apply AutoFDO with the first revision, but when I converted the recorded profile with the advertised command, the resulting profile was empty.

              I've asked in the ClangBuiltLinux community, but no one had an answer for that. Maybe it was just my recorded profile (I ran compression, stress-ng and a kernel compilation), or something odd is going on with AMD's branch sampling. I will work on it further when I find some time.

              I have tried BOLT too. There I was successful, but the kernel instantly crashed when the --split-functions option was used.


              • #8
                Originally posted by ms178
                I'd like to see improvements in the ease-of-use to get AutoFDO/Propeller and BOLT into the hands of normal users.
                While it may be an interesting intellectual exercise, unlike the hyperscalers (who often have a large number of dedicated systems doing just one thing (or just a few things) ideally as fast as possible), most desktops are general purpose, so profiling does not always provide a useful optimization path which would result in equivalent improvements.


                • #9
                  Originally posted by CommunityMember

                  While it may be an interesting intellectual exercise, unlike the hyperscalers (who often have a large number of dedicated systems doing just one thing (or just a few things) ideally as fast as possible), most desktops are general purpose, so profiling does not always provide a useful optimization path which would result in equivalent improvements.
                  I suppose it would be hard for a general-purpose distro to use that feature, as it would be hard to get a profile that fits everyone's needs and hardware perfectly. But for my own local system, I think it still might be helpful for desktop usage if I profile the games and apps I personally use?

                  At least I've seen some improvements while using an older PGO patchset from Google: I noticed faster compile times and lower CPU usage with my 1 Gbit connection. There were even some minor performance gains in games. The invested effort might not be worth the reward, but there are enough people out there who only care about performance and might find that feature useful for their personal needs.
