Mesa Developers Discuss LTO'ing + PGO'ing Builds For Greater Performance


  • #11
    Originally posted by oibaf

    I never tried it. The PPA should be usable by everyone; -march=native is useful only for builds used on your own system.
    Yes, I know; that's why I'm asking whether you saw any gains on your personal system by experimenting with it.
    Sure, providing oibaf builds optimized for different CPUs in the Debian ecosystem would be _a lot_ of work and therefore not feasible.



    • #12
      Originally posted by TemplarGR
      16%-20% faster execution in OpenGL is no joke.
      Note that this is probably true only for CPU-bound workloads.
      Last edited by puleglot; 13 February 2020, 06:23 PM.



      • #13
        I compiled Mesa on Clear Linux with -march=native, -flto (but without PGO), and some other flags, but to my surprise it didn't yield anything, at least in Company of Heroes 2, which I usually use for benchmarking. On the other hand, compiling the kernel with more aggressive compiler flags, custom settings, and BMQ patched in, I saw some nice benefits in the 10-15% range.

        Hence Dieter's result surprised me a little; it implies that PGO does provide a substantial benefit.



        • #14
          Originally posted by ms178
          I compiled Mesa on Clear Linux with -march=native, -flto (but without PGO), and some other flags, but to my surprise it didn't yield anything, at least in Company of Heroes 2, which I usually use for benchmarking. On the other hand, compiling the kernel with more aggressive compiler flags, custom settings, and BMQ patched in, I saw some nice benefits in the 10-15% range.

          Hence Dieter's result surprised me a little; it implies that PGO does provide a substantial benefit.
          Anecdotal evidence: LTO improved Mesa's software-rendered glxgears performance on MIPS32 by almost 50% (going from 8 fps to 12 fps at 800x480).
          So LTO might be beneficial after all when your CPU is underpowered.
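
As a quick check of the arithmetic in the figures above, here is a small Python sketch. The function names are made up for illustration; the only inputs taken from the post are 8 fps and 12 fps.

```python
def speedup_percent(fps_before: float, fps_after: float) -> float:
    """Relative throughput gain in percent when going from fps_before to fps_after."""
    return (fps_after / fps_before - 1.0) * 100.0


def frame_time_ms(fps: float) -> float:
    """Average time spent on one frame, in milliseconds."""
    return 1000.0 / fps


if __name__ == "__main__":
    before, after = 8.0, 12.0  # glxgears figures quoted in the post
    print(f"throughput gain: {speedup_percent(before, after):.1f}%")
    print(f"frame time: {frame_time_ms(before):.1f} ms -> {frame_time_ms(after):.1f} ms")
```

Going from 8 fps to 12 fps is a 50% gain in frames per second, which corresponds to each frame taking roughly a third less time (125 ms down to about 83 ms).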



          • #15
            Mesa is already really fast and for it to have this much of an increase is a really big deal. It'd be stupid not to go for it.



            • #16
              Originally posted by ms178
              I compiled Mesa on Clear Linux with -march=native, -flto (but without PGO), and some other flags, but to my surprise it didn't yield anything, at least in Company of Heroes 2, which I usually use for benchmarking.
              I have found that in some cases (e.g. OpenBLAS) -march=haswell produces better code on my Skylake than -march=native or -march=skylake. Maybe -march=haswell -mtune=skylake would result in faster code (even if only marginally so).

              Originally posted by ms178
              On the other hand, compiling the kernel with more aggressive compiler flags, custom settings, and BMQ patched in, I saw some nice benefits in the 10-15% range.
              I have compiled the XanMod kernel with -march=native on my FX-8350 under Pop!_OS and can support your statement.
              By the way, did you apply the fsync patch to the Clear Linux kernel?

              Originally posted by ms178
              Hence Dieter's result surprised me a little; it implies that PGO does provide a substantial benefit.
              As others have mentioned, maybe OpenGL sees more benefit because it is more CPU-bound? Nonetheless, every percent counts.

              My personal "project dream" would be:

              Clear Linux kernel, modified in the direction of XanMod
              Mesa built with a fusion of oibaf + Clear Linux flags (plus PGO)
              Wine built the Clear Linux way, but with PGO for each game or at least each game engine
              DXVK with PGO (not sure yet how big the difference will be; possibly PGO customized for each game)
              (trying to squeeze in Agner Fog's asmlib to speed up memcpy() etc.; also not sure whether this will gain much)

              We have to compensate for the overhead of wrapping Windows/DX stuff to Linux/Vulkan somehow.
              Last edited by CochainComplex; 13 February 2020, 09:56 AM.



              • #17
                Originally posted by CochainComplex

                I have found that in some cases (e.g. OpenBLAS) -march=haswell produces better code on my Skylake than -march=native or -march=skylake. Maybe -march=haswell -mtune=skylake would result in faster code (even if only marginally so).

                I have compiled the XanMod kernel with -march=native on my FX-8350 under Pop!_OS and can support your statement.
                By the way, did you apply the fsync patch to the Clear Linux kernel?

                As others have mentioned, maybe OpenGL sees more benefit because it is more CPU-bound? Nonetheless, every percent counts.

                My personal "project dream" would be:

                Clear Linux kernel, modified in the direction of XanMod
                Mesa built with a fusion of oibaf + Clear Linux flags (plus PGO)
                Wine built the Clear Linux way, but with PGO for each game or at least each game engine
                DXVK with PGO (not sure yet how big the difference will be; possibly PGO customized for each game)
                (trying to squeeze in Agner Fog's asmlib to speed up memcpy() etc.; also not sure whether this will gain much)

                We have to compensate for the overhead of wrapping Windows/DX stuff to Linux/Vulkan somehow.
                My numbers were from a Ryzen 2600 PC with a Vega 56, and I haven't applied the fsync patches. By the way, I did try out Pop!_OS and XanMod recently. It was decent, but all Ubuntu derivatives feel way too sluggish on my machines; opening programs and menus just takes noticeably longer than on other distros. openSUSE Tumbleweed, for example, feels way more fluid from the start.

                I also had quite a lot more trouble with the autospec build system on Clear Linux than on other distros. I was also playing around with different kernel configurations, and at first the kernel wouldn't compile properly with my custom flags; I had to disable quite a few things to get it to work eventually. Even then, their build script assumed the presence of some modules and would error out when it didn't find them. That was quite painful!

                You have a great wish list there, and I see we share the same dream. I hope we will get a gaming distro at some point which also offers architecture-specific repositories (like OpenMandriva already does), with everything tuned for the best possible performance out of the box for gaming and general desktop computing. Clear Linux isn't well suited for that task today, but I hope we will see this dream come true without having to compile all of it ourselves.



                • #18
                  Originally posted by Tvashtar
                  I've been doing it for years under Gentoo (LTO+PGO) by hand; it would be nice to have it fully automated from upstream, like in Firefox.
                  It'd be nice if there was a central repo containing the PGO profiles to save folk generating them themselves.

                  I've not used PGO before on Gentoo. What's actually produced, and how is it then reused by the build?

                  If it's a directory, would you mind dumping it somewhere? I'm curious to know whether profiles can be partially reused between versions of apps, i.e. does it say a particular function is used a lot and another less so, so that as long as the function names are the same the profile will work? Or is it a lot more complicated than that?

                  Anyway I'm babbling
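
On the question of what PGO produces: with GCC, a build with -fprofile-generate yields an instrumented binary that writes .gcda count files when it exits, and a second build with -fprofile-use reads those files back. The Python sketch below only assembles the two compiler command lines and lists any profile files found. The helper names, the source file name, and the profile directory are hypothetical, and the flags used in a real package build (Gentoo's or Mesa's) may well differ.

```python
from pathlib import Path
from typing import List, Sequence, Tuple


def pgo_commands(sources: Sequence[str], profile_dir: str,
                 output: str = "app", cc: str = "gcc") -> Tuple[List[str], List[str]]:
    """Return the (instrumented build, optimized build) command lines."""
    base = [cc, "-O2", "-flto"]
    # Phase 1: instrumented binary; running it writes *.gcda files under profile_dir.
    generate = base + [f"-fprofile-generate={profile_dir}", *sources,
                       "-o", f"{output}-instrumented"]
    # Phase 2: rebuild, letting the compiler read the collected counts.
    use = base + [f"-fprofile-use={profile_dir}", *sources, "-o", output]
    return generate, use


def collected_profiles(profile_dir: str) -> List[Path]:
    """List the .gcda files left behind by a run of the instrumented binary."""
    root = Path(profile_dir)
    return sorted(root.rglob("*.gcda")) if root.is_dir() else []


if __name__ == "__main__":
    # "main.c" and "/tmp/pgo-data" are placeholder names for this sketch.
    generate, use = pgo_commands(["main.c"], "/tmp/pgo-data")
    print(" ".join(generate))
    print("# ...run the instrumented binary on a representative workload...")
    print(" ".join(use))
    print(collected_profiles("/tmp/pgo-data"))
```

Whether a profile survives a source change is a separate question. GCC has a -Wcoverage-mismatch warning for the case where the profile data no longer matches the code being compiled, so matching function names alone is probably not the whole story.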



                  • #19
                    One would need to merge the results of different PGO runs from different machines with different GPUs. If that's possible, one could ship that data with the source of each release, so there would be no need to do the PGO run on the build host.

                    I've used PGO with embedded Linux devices. It works fine to generate the profile on a different machine.
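
On whether runs from several machines can be merged: GCC ships gcov-tool with a merge subcommand and LLVM has llvm-profdata merge, so the toolchains do provide for combining profiles. As a conceptual sketch only, the Python below models a profile as a mapping from function name to execution count and merges runs by summing weighted counts. Real profile formats carry far more than this (per-edge counters, checksums), and the function names and numbers here are invented for illustration.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence

Profile = Dict[str, int]  # toy model: function name -> execution count


def merge_profiles(profiles: Sequence[Profile],
                   weights: Optional[Sequence[int]] = None) -> Profile:
    """Sum per-function counts across runs, optionally weighting each run."""
    if weights is None:
        weights = [1] * len(profiles)
    if len(weights) != len(profiles):
        raise ValueError("need exactly one weight per profile")
    merged: Counter = Counter()
    for profile, weight in zip(profiles, weights):
        for function, count in profile.items():
            merged[function] += count * weight
    return dict(merged)


def hottest(profile: Profile, n: int = 3) -> List[str]:
    """Names of the n most frequently executed functions."""
    return [name for name, _ in Counter(profile).most_common(n)]


if __name__ == "__main__":
    # Hypothetical runs on two machines with different GPUs.
    machine_a = {"draw_arrays": 900, "validate_state": 400, "upload_texture": 50}
    machine_b = {"draw_arrays": 700, "validate_state": 100, "compile_shader": 300}
    merged = merge_profiles([machine_a, machine_b])
    print(hottest(merged))
```

The weights give a way to let one machine's run count for more than another's, for example when one workload is considered more representative.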



                    • #20
                      Originally posted by oleid
                      One would need to merge the results of different PGO runs from different machines with different GPUs. If that's possible, one could ship that data with the source of each release, so there would be no need to do the PGO run on the build host.

                      I've used PGO with embedded Linux devices. It works fine to generate the profile on a different machine.
                      Sounds like DXVK's shader cache approach.
