Fedora's Firefox To Stick With GCC Over Clang, Beefed Up By LTO/PGO Optimizations
-
Originally posted by hubicka View Post
Note that http://hubicka.blogspot.com/2018/12/...lding-and.html has some benchmarks in addition to Speedometer. One remaining issue is that Skia (a graphics library used to render some stuff) needs to be ported to GCC. Currently it has hand-optimized vector rendering code only for Clang. I plan to look into that after finishing some GCC work. Firefox is a very good and interesting real-world LTO benchmark, and there were a number of things to fix/improve for GCC 9 that I noticed while looking into its performance.
- Likes 1
Comment
-
Originally posted by skeevy420 View Post
Being able to get more life and performance out of older hardware means we're not being wasteful consumers by tossing it aside and upgrading as fast as possible.
Things like Clear Linux (and macOS) make this extremely hard.
Comment
-
Originally posted by mlau View Post
I'd start Gen 2 at Haswell (AVX2/FMA3, BMI1/2, ...). Intel CPUs haven't changed much since then (apart from AVX-512, which has only niche uses at the moment).
Originally posted by kpedersen View Post
That's exactly what I was saying. Breaking free from this constant consumer upgrade cycle is "liberating".
Things like Clear Linux (and macOS) make this extremely hard.
It makes sense for Intel to work on what they're doing -- targeted binaries and AVX+Newer optimizations. That work can then be used by Arch, Fedora, or Debian for a multi-generation x86_64 model or expanded to cover more micro-architectures/feature sets like Solus.
We're now dealing with "i486-i586-i686 64-Bit Edition". It'll be interesting to see what approach various distributions go with.
Comment
-
Originally posted by kpedersen View Post
That's exactly what I was saying. Breaking free from this constant consumer upgrade cycle is "liberating".
Things like Clear Linux (and macOS) make this extremely hard.
There were other factors that made the adoption of these newer CPU instructions very slow (e.g. Intel's own product segmentation, or the implementation of these vector extensions, which made their usage dependent on software developers, unlike the new ARM SVE, which scales automatically with increased vector sizes). All of these mistakes are quite a pity, considering that vectorization and parallelization were the major sources of CPU innovation during the last decade.
I want to see all of this goodness used more effectively instead of the brute force approach of higher IPC and frequency!
- Likes 1
Comment
-
Originally posted by ms178 View Post
Thanks a lot for your optimization work! As you can see from all the feedback, it has had quite an impact! Just in case you need another test case for further LTO/PGO tuning, could you have a look at Chromium for some optimization work, too? Or is it already a well-tested target internally at SUSE?
- Likes 3
Comment
-
Originally posted by hubicka View Post
I am currently looking into hhvm and the clang binary for a few more tests. Chromium builds with GCC in SUSE's RPM package, so I can try to look at it, too. I last did so about two or three years ago, since I am not that familiar with its build machinery. Since LTO support was added in the meantime, I guess it is time to try again.
See: https://bugs.chromium.org/p/chromium...tail?id=906037
Last edited by ms178; 10 January 2019, 06:49 PM. Reason: due to wrong interpretation the bug was all about
Comment
-
Originally posted by ms178 View Post
I want to see all of this goodness used more effectively instead of the brute force approach of higher IPC and frequency!
Comment
-
Originally posted by Weasel View Post
I don't think you know what brute force means. Increasing IPC is much more complicated than adding vector instructions or widening the vectors.
Just for comparison, the HSA approach with APUs is, in my (layman's) eyes, way more innovative, albeit not yet fully where it needs to be hardware- and software-wise. Also, AMD has the underdog's disadvantage here in swaying the rest of the market to adopt their framework.
Comment
-
Originally posted by ms178 View Post
You mix separate things I said there, and there are certainly more aspects relevant to IPC than adding vector instructions and widening vectors.
Increasing vector sizes is trivial in comparison, that's why they even go this route, to avoid complexity on the chip. OO scales badly in comparison with vectors. So increasing vector sizes is much closer to "brute force" in this case, because it's the simplest and linear-scaling solution. Brute force generally means the most straightforward approach to something, not clever or complex, which is what widening the vectors is.
Okay, increasing frequency (without doing anything else, aka overclocking) is technically easier but...
Comment