Originally posted by binarybanana
Regarding the profiles currently in use: AFAIK, only CPython uses its test suite as a training workload. Both GCC and LLVM use self-compilation as the PGO training workload instead. This is done via multi-stage builds: an instrumented GCC compiles GCC itself to collect the profiles, and an instrumented Clang compiles Clang. To be more precise, this applies to Clang rather than LLVM as a whole, since PGO in LLVM is currently enabled only for Clang; there is an open issue (https://github.com/llvm/llvm-project/issues/63486) for the other LLVM subprojects. Of course, maintainers can decide to choose another training workload.
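To make the instrument/train/rebuild cycle concrete, here is a minimal sketch of what a single round of instrumentation PGO with Clang looks like on an ordinary program. It only builds the command lines and does not run them. The file names (`app.c`, `app`) and the `--bench` workload argument are hypothetical placeholders, and the real GCC/Clang multi-stage builds are driven by their own build systems rather than by anything like this.

```python
import shlex


def instrumented_pgo_steps(source, binary, workload_args):
    """Return the command lines for one instrument/train/rebuild cycle.

    All paths are illustrative assumptions; nothing is executed here.
    """
    instrumented = binary + ".instr"
    raw_profile = binary + ".profraw"
    merged_profile = binary + ".profdata"
    return [
        # 1. Build an instrumented binary that records execution counts.
        ["clang", "-O2", "-fprofile-instr-generate", source, "-o", instrumented],
        # 2. Run the training workload; LLVM_PROFILE_FILE names the raw profile.
        ["env", "LLVM_PROFILE_FILE=" + raw_profile, "./" + instrumented]
        + list(workload_args),
        # 3. Merge the raw profile into the indexed format the compiler reads.
        ["llvm-profdata", "merge", "-output=" + merged_profile, raw_profile],
        # 4. Rebuild with the collected profile guiding optimization.
        ["clang", "-O2", "-fprofile-instr-use=" + merged_profile, source, "-o", binary],
    ]


if __name__ == "__main__":
    for argv in instrumented_pgo_steps("app.c", "app", ["--bench"]):
        print(shlex.join(argv))
```

Step 2 is where the choice of training workload matters: for CPython it is the test suite, for GCC and Clang it is compiling the compiler itself.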
Sampling PGO (also known as AutoFDO: https://github.com/google/autofdo) is an interesting way to do PGO in practice, with the aim of reducing the overhead of PGO instrumentation. It has limitations of its own, though: it requires LBR/BRS support in your hardware, tooling support from the Google side is weak (you can check the AutoFDO issue tracker for examples), etc. Things need to be improved here as well.
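For comparison, a rough sketch of the sampling flow, again only building the command lines rather than running them. It assumes Linux `perf` on hardware with branch-record support, the `create_llvm_prof` converter from the AutoFDO repository, and Clang; the file names and the `--bench` argument are hypothetical, and the exact converter options may differ between AutoFDO versions.

```python
import shlex


def sampling_pgo_steps(source, binary, workload_args):
    """Return the command lines for one sample/convert/rebuild cycle.

    All paths are illustrative assumptions; nothing is executed here.
    """
    perf_data = "perf.data"
    sample_profile = binary + ".prof"
    return [
        # 1. Build a normal optimized binary; line tables map samples to source.
        ["clang", "-O2", "-gline-tables-only", source, "-o", binary],
        # 2. Sample the running binary; -b records branch stacks (needs LBR/BRS).
        ["perf", "record", "-b", "-o", perf_data, "--", "./" + binary]
        + list(workload_args),
        # 3. Convert the perf samples into a profile Clang can consume.
        ["create_llvm_prof", "--binary=./" + binary, "--profile=" + perf_data,
         "--out=" + sample_profile],
        # 4. Rebuild using the sampled profile.
        ["clang", "-O2", "-gline-tables-only",
         "-fprofile-sample-use=" + sample_profile, source, "-o", binary],
    ]


if __name__ == "__main__":
    for argv in sampling_pgo_steps("app.c", "app", ["--bench"]):
        print(shlex.join(argv))
```

Note that the binary profiled in step 2 is an ordinary optimized build, which is where the lower overhead comes from; the hardware requirement shows up in the `-b` flag.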