A Fresh Look At The PGO Performance With GCC 8

  • Grinch
    replied
    Originally posted by carewolf:
        The point is that one of PGO's tricks is enabling many of the -O3 optimizations on demand, so it uses them in all hot code, as if you had specified -O3 even if you didn't.

    Hmmm... PGO does not enable any of the -O3 optimizations, namely '-finline-functions', '-fweb', '-frename-registers'.

    PGO enables '-fbranch-probabilities', '-fvpt', '-funroll-loops', '-fpeel-loops' and '-ftracer'.

    If you are right, that is very interesting, but I've seen no such information. Are you sure?
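
Rather than settling this from memory, one hedged way to check is to ask the installed compiler. GCC's `-Q --help=optimizers` option prints each optimization flag as enabled or disabled for the options given alongside it, so diffing a few invocations shows which flags `-fprofile-use` adds on top of `-O2` and which of those `-O3` also adds. This is a sketch that assumes `gcc` is on PATH and that its output uses lines ending in `[enabled]` or `[disabled]`. The answer depends on the GCC version, because the set of flags implied by `-fprofile-use` has changed between releases. Some passes also depend on more than their flag, so treat the result as a first approximation.

```python
from __future__ import annotations

import re
import subprocess

# Matches lines such as "  -funroll-loops      [enabled]" printed by
# `gcc -Q --help=optimizers`. Options reported with a value instead of
# [enabled]/[disabled] (e.g. -fvect-cost-model=...) are skipped.
FLAG_STATE = re.compile(r"^\s*(-f[\w-]+)\s+\[(enabled|disabled)\]\s*$")


def parse_enabled(help_text: str) -> set[str]:
    """Return the optimizer flags that the help text reports as [enabled]."""
    enabled = set()
    for line in help_text.splitlines():
        match = FLAG_STATE.match(line)
        if match and match.group(2) == "enabled":
            enabled.add(match.group(1))
    return enabled


def enabled_flags(*options: str, gcc: str = "gcc") -> set[str]:
    """Ask gcc which optimizer flags are enabled under the given options."""
    result = subprocess.run(
        [gcc, "-Q", "--help=optimizers", *options],
        capture_output=True, text=True, check=True,
    )
    return parse_enabled(result.stdout)


def report(gcc: str = "gcc") -> None:
    """Print flags -fprofile-use adds over -O2, and which of them -O3 adds too."""
    base = enabled_flags("-O2", gcc=gcc)
    added_by_pgo = enabled_flags("-O2", "-fprofile-use", gcc=gcc) - base
    added_by_o3 = enabled_flags("-O3", gcc=gcc) - base
    print("Added by -fprofile-use over -O2:", sorted(added_by_pgo))
    print("Also added by -O3:", sorted(added_by_pgo & added_by_o3))
```

On a machine with GCC installed, calling `report()` prints both lists for that particular compiler version.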


  • eva2000
    replied
    Originally posted by Michael:
        Yes, that snapshot reports 8.1.1.

    Cheers, so it isn't my eyes playing tricks on me.


  • carewolf
    replied
    Originally posted by Grinch:
        I disagree: on the vast majority of code, the binary size differences between -Os, -O2 and -O3 aren't particularly large unless you are using extremely constrained hardware. Also, one of the optimizations that PGO enables is loop unrolling (-funroll-loops), which is one of the optimizations with a very large impact on binary size.

        Besides that, if you go to the trouble of using PGO, you are most likely looking for the best possible performance, which, with very few exceptions, is something you get from -O3 / -Ofast.

    The point is that one of PGO's tricks is enabling many of the -O3 optimizations on demand, so it uses them in all hot code, as if you had specified -O3 even if you didn't.


  • Grinch
    replied
    Originally posted by carewolf:
        Try -O2 or -Os with profile guidance. It can do much of the same as -O3 without blowing up binary size.

    I disagree: on the vast majority of code, the binary size differences between -Os, -O2 and -O3 aren't particularly large unless you are using extremely constrained hardware. Also, one of the optimizations that PGO enables is loop unrolling (-funroll-loops), which is one of the optimizations with a very large impact on binary size.

    Besides that, if you go to the trouble of using PGO, you are most likely looking for the best possible performance, which, with very few exceptions, is something you get from -O3 / -Ofast.
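
Whether the size differences matter is easy to measure for a given codebase. Below is a hedged sketch that compiles one source file at each level and reports each output's size relative to -O2. It assumes `gcc` is on PATH. The output names are arbitrary, and `-s` strips the symbol table so the comparison is closer to code size. A real project would apply the level to its whole build rather than to a single file.

```python
from __future__ import annotations

import os
import subprocess

LEVELS = ["-Os", "-O2", "-O3"]


def build_command(src: str, level: str, out: str, gcc: str = "gcc") -> list[str]:
    """Compile `src` at one -O level; -s strips the symbol table from the output."""
    return [gcc, level, "-s", src, "-o", out]


def measure(src: str, gcc: str = "gcc") -> dict[str, int]:
    """Build `src` once per level in LEVELS and return output sizes in bytes."""
    sizes = {}
    for level in LEVELS:
        out = f"size-test{level}"  # e.g. "size-test-O2"
        subprocess.run(build_command(src, level, out, gcc), check=True)
        sizes[level] = os.path.getsize(out)
    return sizes


def relative_sizes(sizes: dict[str, int], baseline: str = "-O2") -> dict[str, float]:
    """Size difference of each build, in percent, relative to the baseline level."""
    base = sizes[baseline]
    return {level: 100.0 * (size - base) / base for level, size in sizes.items()}
```

For example, `relative_sizes(measure("bench.c"))` gives each level's percentage difference against -O2. Here `bench.c` is a hypothetical file name.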


  • Michael
    replied
    Originally posted by eva2000:
        PGO is nice to have; I do PGO builds of PHP 7 for that extra bit of performance.

        Michael, did you compile GCC 8.2 from the http://www.netgull.com/gcc/snapshots/8.2.0-RC-20180719/ snapshot? I just tried, and the gcc version is still reported as 8.1.1.

    Yes, that snapshot reports 8.1.1.


  • carewolf
    replied
    Try -O2 or -Os with profile guidance. It can do much of the same as -O3 without blowing up binary size.


  • eva2000
    replied
    PGO is nice to have; I do PGO builds of PHP 7 for that extra bit of performance.

    Michael, did you compile GCC 8.2 from the http://www.netgull.com/gcc/snapshots/8.2.0-RC-20180719/ snapshot? I just tried, and the gcc version is still reported as 8.1.1.


  • Grinch
    replied
    Nice benchmark, Michael. A couple of things:

    You don't need to use the '-fprofile-dir=foo/' parameter; you can just use '-fprofile-generate=foo/' and, likewise, '-fprofile-use=foo/'.
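
The usual instrument, train and rebuild cycle using that directory form could be scripted roughly as follows. This is a sketch, not the benchmark's actual setup. The source file, output name, profile directory and training command are hypothetical placeholders. The base level is a parameter, so the same script can produce an -O2 or -Os profile-guided build. All three steps run from the same working directory, and the training run should resemble the real workload, because the profile only reflects what it exercised.

```python
from __future__ import annotations

import subprocess


def pgo_commands(src: str, out: str, profile_dir: str, level: str = "-O2",
                 gcc: str = "gcc") -> tuple[list[str], list[str]]:
    """Return (instrumented build, optimized rebuild) command lines.

    The directory is passed directly to -fprofile-generate= / -fprofile-use=
    instead of through a separate -fprofile-dir= option.
    """
    instrumented = [gcc, level, f"-fprofile-generate={profile_dir}", src, "-o", out]
    optimized = [gcc, level, f"-fprofile-use={profile_dir}", src, "-o", out]
    return instrumented, optimized


def pgo_build(src: str, out: str, profile_dir: str, train: list[str],
              level: str = "-O2") -> None:
    """Instrument, run a training workload, then rebuild using the profile."""
    instrumented, optimized = pgo_commands(src, out, profile_dir, level)
    subprocess.run(instrumented, check=True)
    subprocess.run(train, check=True)  # the instrumented binary writes profile data
    subprocess.run(optimized, check=True)
```

A hypothetical invocation would be `pgo_build("app.c", "./app", "pgo-data/", ["./app", "training-input.txt"], level="-Os")`.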

    In the 'm-queens v1.1' benchmark you listed the following options: -fopenmp -O3 -march=native -O2

    The -O2 at the end overrides the -O3 at the start (GCC uses the last -O option given), which is probably not what you wanted.
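
The GCC manual states that when several -O options are given, the last one takes effect. A bare -O means -O1, and without any -O option the default is -O0. The small helper below applies that rule to a flag list. It is a sketch that treats any argument starting with "-O" as a level option and tracks only the level, not other flags such as -fopenmp or -march=native.

```python
from __future__ import annotations


def effective_opt_level(args: list[str]) -> str:
    """Return the -O option that takes effect: GCC uses the last one given."""
    level = "-O0"  # GCC's default when no -O option is present
    for arg in args:
        if arg.startswith("-O"):
            level = "-O1" if arg == "-O" else arg  # bare -O is the same as -O1
    return level


if __name__ == "__main__":
    flags = "-fopenmp -O3 -march=native -O2".split()
    print(effective_opt_level(flags))  # prints -O2: the trailing -O2 wins over -O3
```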

    Overall, the benefit from PGO depends on how well the compiler manages to guess the right optimization strategies without profile data. One piece of software I've compiled that has benefited greatly is rendering in Blender (CPU rendering, naturally), where I've seen a ~15-20% improvement, so it was interesting to see that the largest benefit in these tests was in the C-Ray renderer, at ~17%.


  • cen1
    replied
    I guess it's useful if you need to squeeze the last few percent of performance out of your program, as in HFT (high-frequency trading). Feeding it realistic workloads is the important part.


  • phoronix
    started a topic A Fresh Look At The PGO Performance With GCC 8

    Phoronix: A Fresh Look At The PGO Performance With GCC 8

    It's been a while since we last ran GCC PGO benchmarks. Profile-guided optimization (PGO), also known as feedback-directed optimization, uses profiling data collected at run time to improve the performance of recompiled binaries. Here are some fresh benchmarks of the GCC PGO impact on a Xeon Scalable server using the newly released GCC 8.2 release candidate.
