GCC 12 Profile Guided Optimization Benchmarks With The AMD Threadripper 3990X

  • #11
    On Gentoo you can easily enable PGO using portage-bashrc-mv [1]. Install and set PGO=1 for a single package or system wide. That's it.
    [1]: https://github.com/vaeth/portage-bashrc-mv



    • #12
      Originally posted by cj.wijtmans View Post
      Is the primary optimization of PGO perhaps branch prediction? C++ supports branch prediction hints now, and C does as well. What other major optimizations would PGO have that can't be done manually?
      This is what I was musing about, above.

      Potentially, PGO could do covariant branch analysis, to provide path-level insight. That should be significantly better, in some cases. Whether it actually does this is the question.

      Originally posted by cj.wijtmans View Post
      I think PGO is DOA and we just need more standardized attributes to give the compiler hints.
      I don't mind providing hints when it really counts, but it can also clutter the code.

      Another issue with explicit hints is that some branching behavior can be very workload-dependent.
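As a rough illustration of the explicit hints being discussed, here is a minimal sketch using GCC's `__builtin_expect`. The `LIKELY`/`UNLIKELY` macros and the `sum_valid` function are hypothetical names made up for this example. The hint only affects how the compiler lays out the code, not the result, which is exactly why a wrong guess about the workload goes unnoticed.

```c
#include <stddef.h>

/* Conventional wrapper macros around the GCC/Clang builtin.
   The macro names are this sketch's own, not from a standard header. */
#define LIKELY(x)   __builtin_expect(!!(x), 1)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)

/* Hypothetical example: sum the non-negative samples and count the
   negative ones, which the author assumes to be rare. */
long sum_valid(const int *samples, size_t n, size_t *invalid)
{
    long total = 0;
    *invalid = 0;
    for (size_t i = 0; i < n; ++i) {
        if (UNLIKELY(samples[i] < 0)) {  /* hinted as the cold path */
            ++*invalid;
            continue;
        }
        total += samples[i];
    }
    return total;
}
```

If a different workload feeds this function mostly negative samples, the hint points the wrong way, and nothing in the source or the output reveals it.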



      • #13
        Originally posted by coder View Post
        This is what I was musing about, above.

        Potentially, PGO could do covariant branch analysis, to provide path-level insight. That should be significantly better, in some cases. Whether it actually does this is the question.


        I don't mind providing hints when it really counts, but it can also clutter the code.

        Another issue with explicit hints is that some branching behavior can be very workload-dependent.
        I think the PGO benefit is too small to justify when likely/unlikely hints provide the bulk of the benefit. The performance gain is enormous in a simple algorithm with the hints in C++.



        • #14
          Originally posted by cj.wijtmans View Post

          I think the PGO benefit is too small to justify when likely/unlikely hints provide the bulk of the benefit. The performance gain is enormous in a simple algorithm with the hints in C++.
          PGO can spot the hotspot and put more effort into optimizing it.

          If a loop is the hotspot, the compiler could apply more aggressive optimizations to it, such as unrolling it even when its heuristics are against unrolling, and inlining the function calls in it even when its heuristics are against inlining.

          It could also rearrange data so that items that are often used together are grouped to improve cache hits, and put cold data somewhere else so that it doesn't take up precious cache space.
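The thread does not establish whether GCC's PGO actually rearranges data this way, but the idea can be sketched by hand. In this minimal example (all type and function names are hypothetical), the fields read on every update live in a small "hot" struct, and rarely used fields are kept in a separate "cold" array indexed the same way.

```c
#include <stddef.h>

/* Hot fields: read and written on every simulation step. */
struct particle_hot {
    float pos;
    float vel;
};

/* Cold fields: only touched occasionally (e.g. when reporting),
   stored in a parallel array so they stay out of the hot cache lines. */
struct particle_cold {
    char label[56];
};

/* The hot loop walks a dense array of small structs. */
void step(struct particle_hot *hot, size_t n, float dt)
{
    for (size_t i = 0; i < n; ++i)
        hot[i].pos += hot[i].vel * dt;
}
```

With the labels embedded in the same struct, each element would span a full 64-byte cache line on typical hardware; split this way, several hot elements share one line.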



          • #15
            Originally posted by NobodyXu View Post
            PGO can spot the hotspot and put more effort into optimizing it.

            If a loop is the hotspot, the compiler could apply more aggressive optimizations to it, such as unrolling it even when its heuristics are against unrolling, and inlining the function calls in it even when its heuristics are against inlining.
            You can hint that a loop has a large number of iterations using GCC builtins like __builtin_expect() in the loop condition, or even __builtin_expect_with_probability().

            GCC also has hot and cold function attributes, for hinting when a function is used frequently or infrequently. However, maybe PGO provides information about specifically how hot or cold it is.
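A minimal sketch of those two attributes, assuming GCC or Clang; the function names are hypothetical. The attributes are binary, which is the limitation noted above: they say "hot" or "cold", not how hot.

```c
#include <stdio.h>

/* Marked cold: GCC treats calls to it as unlikely and may place the
   function in a separate section, away from frequently run code. */
__attribute__((cold, noinline))
static int report_failure(const char *what)
{
    fprintf(stderr, "error: %s\n", what);
    return -1;
}

/* Marked hot: asks GCC to optimize this function more aggressively. */
__attribute__((hot))
int checked_div(int a, int b, int *out)
{
    if (b == 0)
        return report_failure("division by zero");  /* cold path */
    *out = a / b;
    return 0;
}
```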

            See my prior post for links to the relevant GCC manual pages.

            However, these hints have limitations, as I alluded to above. For instance, if you have some code with two disjoint conditional statements that appear to be independent, I'm not sure if you can hint that they're almost mutually exclusive in a way GCC will understand & utilize.

            Code:
            for (int i = 0; __builtin_expect_with_probability(i < num_iter, 1, 1023.0/1024.0); ++i)  // hints that num_iter is typically 1024
            {
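                __builtin_expect_with_probability(condition_A && condition_B, 1, 0.001);  // intended hint: A and B are rarely both true
                __builtin_expect_with_probability(condition_A || condition_B, 1, 0.99);   // intended hint: at least one is almost always true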
                if (condition_A) { /* do stuff */ }
                if (condition_B) { /* do other stuff */ }
            }
            Will GCC take the above hints to emit 3 different loops: one where only A is true, another where only B is true, and a 3rd with A and B? Moreover, will it conclude that it shouldn't waste space unrolling the loop with both A and B, since it should be so rare?
            Last edited by coder; 03 August 2022, 11:12 PM.
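To make the question above concrete: if condition_A and condition_B happened to be loop-invariant (an assumption; the post does not say), the three specialized loops could be written by hand roughly as below. The function name `process` and the arithmetic are placeholders standing in for "do stuff" and "do other stuff". Whether GCC derives something like this from the hints is exactly what remains open.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hand-written version of the three specialized loops, valid only if
   both conditions stay constant for the whole loop (assumed here). */
void process(int *data, size_t n, bool cond_a, bool cond_b)
{
    if (cond_a && !cond_b) {
        /* Common case 1: only A holds. */
        for (size_t i = 0; i < n; ++i)
            data[i] += 1;            /* placeholder for "do stuff" */
    } else if (!cond_a && cond_b) {
        /* Common case 2: only B holds. */
        for (size_t i = 0; i < n; ++i)
            data[i] *= 2;            /* placeholder for "do other stuff" */
    } else if (cond_a && cond_b) {
        /* Rare case: both hold; not worth unrolling. */
        for (size_t i = 0; i < n; ++i)
            data[i] = (data[i] + 1) * 2;
    }
    /* Neither holds: nothing to do. */
}
```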



            • #16
              coder Thanks for the info, though this is only doable for very specialized code, presumably scientific computing where you implement a lot of algorithms tailored to a specific use case.

              Anything that is more generic, e.g. a function that is designed to be reused in a library, cannot have such hints and the only way to infer hot/cold attribute is to use PGO.



              • #17
                Originally posted by NobodyXu View Post
                this is only doable for very specialized code, presumably scientific computing where you implement a lot of algorithms tailored to a specific use case.
                Eh, well, it does make a bit of a mess. And to the extent your control-flow is truly data-dependent, it's not necessarily even possible to use static hints.

                Plus, if you mess up a hint, then you could impair performance rather than improving it! We programmers are sometimes catastrophically wrong in our assumptions about program behavior.

                Originally posted by NobodyXu View Post
                a function that is designed to be reused in a library, cannot have such hints
                Right, another variation: usage-dependent behavior.

                Originally posted by NobodyXu View Post
                the only way to infer hot/cold attribute is to use PGO.
                If you're talking about function hot/cold in a library, then it would either have to be an inline function or you'd have to use static linking + LTO, as well.



                • #18
                  Originally posted by coder View Post
                  Eh, well, it does make a bit of a mess. And to the extent your control-flow is truly data-dependent, it's not necessarily even possible to use static hints.
                  I actually quite like the idea of branch-less computing.

                  It might actually yield much better performance by avoiding branches as much as possible.
                  That would eliminate the mispredictions completely and also make it easier for the compiler to optimize.

                  Even better, branch-less computing can also avoid some timing-based attacks and might mitigate Spectre/Meltdown.
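One common form of branch-less code is a mask-based select, sketched below with hypothetical helper names. Note that it always evaluates both inputs, and that the compiler remains free to turn it back into a branch or a conditional move, so neither the speedup nor constant-time behavior is guaranteed by writing the source this way.

```c
#include <stdint.h>

/* Branchless select: returns a if cond is nonzero, otherwise b.
   The mask is all ones when cond is true and all zeros when false. */
static inline uint32_t select_u32(uint32_t cond, uint32_t a, uint32_t b)
{
    uint32_t mask = (uint32_t)0 - (uint32_t)(cond != 0);
    return (a & mask) | (b & ~mask);
}

/* Minimum of two values without an if statement in the source. */
uint32_t min_u32(uint32_t a, uint32_t b)
{
    return select_u32(a < b, a, b);
}
```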

                  Originally posted by coder View Post
                  Plus, if you mess up a hint, then you could impair performance rather than improving it! We programmers are sometimes catastrophically wrong in our assumptions about program behavior.
                  Yeah, humans are poor at keeping track of the hints they added.
                  When a new change is introduced to the codebase later, nobody will remember the hint there and the unit/integration tests won't print any warning or error.

                  Originally posted by coder View Post
                  If you're talking about function hot/cold in a library, then it would either have to be an inline function or you'd have to use static linking + LTO, as well.
                  Oh, thanks for reminding me of that.
                  They definitely have to be defined in the headers or use LTO to measure that.



                  • #19
                    I don't think performance regression tests are that uncommon? Also, what will PGO do with existing hints? Override them, probably 🤔



                    • #20
                      Originally posted by NobodyXu View Post
                      I actually quite like the idea of branch-less computing.

                      It might actually yield much better performance by avoiding branches as much as possible.
                      That would eliminate the mispredictions completely and also make it easier for the compiler to optimize.

                      Even better, branch-less computing can also avoid some timing-based attacks and might mitigate Spectre/Meltdown.
                      Everything has overheads. I don't know specifically what form of branchlessness you're talking about, but there's usually no such thing as a free lunch.

                      Originally posted by NobodyXu View Post
                      When a new change is introduced to the codebase later, nobody will remember the hint there and the unit/integration tests won't print any warning or error.
                      If some non-local change is made that invalidates a set of hints, you might see a performance regression -- if you're looking carefully enough for them.
