XZ 5.6 Released: Sandboxing Improvements, Prefers -O2 Instead Of -O3


  • Anux
    replied
    Originally posted by chithanh View Post
    The point is that if an optimization is part of both -O3 and -Os then it does not increase code size.
    The gcc manpage lists dozens of such optimizations.
    Yes, that's why I listed all the O3 opts. None of them are in Os or Oz. Or do you have another source?
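
    For reference, you can ask gcc itself which flags each level enables and diff the lists; a quick sketch (output format varies by gcc version):
    Code:
    gcc -Q --help=optimizers -O3 > /tmp/o3.txt
    gcc -Q --help=optimizers -Os > /tmp/os.txt
    diff /tmp/os.txt /tmp/o3.txt   # shows what -O3 enables that -Os does not, and vice versa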



  • chithanh
    replied
    Originally posted by Anux View Post
    Of course, but what does this have to do with the O2/O3 topic? Edit: if there were any considerable size improvements among the O3 options, those options would also be used by Os or Oz.
    The point is that if an optimization is part of both -O3 and -Os then it does not increase code size.
    The gcc manpage lists dozens of such optimizations.

    Which optimizations end up being applied depends, of course, on the code, so how much binaries grow in size (if at all) will vary. Most will grow, but some will not.
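
    One way to see this on a concrete file is gcc's optimization report; a sketch with a hypothetical foo.c (report contents vary by gcc version):
    Code:
    gcc -O2 -c foo.c -fopt-info-optimized=o2.log
    gcc -O3 -c foo.c -fopt-info-optimized=o3.log
    diff o2.log o3.log   # optimizations that actually fired at -O3 but not -O2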



  • Anux
    replied
    Originally posted by chithanh View Post
    Not all optimizations of -O3 necessarily increase code size over -O2, though many of them do.
    • -fgcse-after-reload (not sure, sounds like a slight size reduction?)
    • -fipa-cp-clone (size increase)
    • -floop-interchange (no size effect)
    • -floop-unroll-and-jam (size increase)
    • -fpeel-loops (size increase)
    • -fpredictive-commoning (don't know)
    • -fsplit-loops (size increase)
    • -fsplit-paths (not sure)
    • -ftree-loop-distribution (size increase)
    • -ftree-partial-pre (size increase)
    • -funswitch-loops (size increase)
    • -fvect-cost-model=dynamic (size increase)
    • -fversion-loops-for-strides (size increase)
    Apart from 3 unknowns, everything screams bigger binary size.

    If you check e.g. the gcc manpage, -Os and -O3 have considerable overlap.
    Optimize for size. -Os enables all -O2 optimizations except those that often increase code size:
    Of course, but what does this have to do with the O2/O3 topic? Edit: if there were any considerable size improvements among the O3 options, those options would also be used by Os or Oz.
    Last edited by Anux; 26 February 2024, 09:59 AM.
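
    For anyone who would rather measure than argue, a minimal sketch (hypothetical demo file; the numbers depend on gcc version and target):
    Code:
    cat > demo.c <<'EOF'
    /* a loop nest that -O3 may unroll, peel or interchange */
    void scale(float *a, const float *b, int n)
    {
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                a[i * n + j] = b[j * n + i] * 2.0f;
    }
    EOF
    gcc -O2 -c demo.c -o demo_o2.o
    gcc -O3 -c demo.c -o demo_o3.o
    size demo_o2.o demo_o3.o   # compare the text column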



  • chithanh
    replied
    Originally posted by Anux View Post
    O3 leading to a smaller binary seems wrong to me. The main thing that O3 does is more unrolling, so it should always lead to bigger binaries, or the same size if O2 already unrolled everything.

    Whether O3 is faster depends on your code and the CPU you use. Older or low-end CPUs typically suffer more cache misses the larger your binaries get.
    Not all optimizations of -O3 necessarily increase code size over -O2, though many of them do. If you check e.g. the gcc manpage, -Os and -O3 have considerable overlap.

    When -O3 increases code size, there is also the problem that under multitasking, processes become more likely to evict each other from L2 cache. This kind of problem usually does not show up in single-purpose benchmarks.
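
    A benchmark can be made to show it by adding a competing working set; a rough sketch (hypothetical binary name; stress-ng and the available perf events depend on your system):
    Code:
    perf stat -e cache-references,cache-misses ./prog   # idle system
    stress-ng --cache 4 &                               # cache-hungry competing load
    perf stat -e cache-references,cache-misses ./prog   # under contention
    kill %1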



  • Anux
    replied
    Originally posted by binarybanana View Post
    Anux
    Rust is similar to C++, maybe worse, in terms of code bloat due to monomorphisation of all the generics.
    The only generics I use are Vec and the image crate. And for images I only use u8, so it should compile to just one version and therefore be smaller and faster than polymorphic code.
    I also use only one codegen unit in release/bench builds, which should lead to smaller binaries for monomorphised code.

    I also suspect this might somewhat depend on system load, meaning a large number of concurrent processes would benefit from smaller code vs. a single process hogging the CPU, but I have seen no testing on this yet.
    No need for testing: of course, if your L3 is filled by different programs, there will be more evictions/misses with larger binaries than with smaller ones.



  • binarybanana
    replied
    Anux
    Rust is similar to C++, maybe worse, in terms of code bloat due to monomorphisation of all the generics. Wouldn't be surprised if that causes too much code with O3, filling up caches and making it slower overall. I can see this tradeoff working out differently with plain C. But yes, in general you are correct that it depends on the specific CPU, and there's a threshold after which optimizations that increase code size hurt performance. I also suspect this might somewhat depend on system load, meaning a large number of concurrent processes would benefit from smaller code vs. a single process hogging the CPU, but I have seen no testing on this yet.
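
    One way to check how much the monomorphised generics actually contribute is the third-party cargo-bloat tool; a sketch (the binary name below is hypothetical):
    Code:
    cargo install cargo-bloat
    cargo bloat --release -n 20                              # largest functions in the release binary
    nm --demangle target/release/mybin | grep -c 'image::'   # rough count of instantiated image-crate symbols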



  • Anux
    replied
    Originally posted by V1tol View Post
    During my experiments I found that doing -O3 with -ffunction-sections -fdata-sections and -Wl,--gc-sections produces the same or even smaller binaries than -O2, while being almost always faster. Even Mesa works with those flags; I won't even try kernel builds. Of course I use LTO whenever supported.
    O3 leading to a smaller binary seems wrong to me. The main thing that O3 does is more unrolling, so it should always lead to bigger binaries, or the same size if O2 already unrolled everything.

    Whether O3 is faster depends on your code and the CPU you use. Older or low-end CPUs typically suffer more cache misses the larger your binaries get.

    I'm toying around with some Rust, and my own code runs faster with O2, while the image crate that saves to PNG is faster with O3.
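
    Cargo can give you both at once, since opt-level can be overridden per dependency; a sketch assuming the dependency is literally named image:
    Code:
    cat >> Cargo.toml <<'EOF'   # merge by hand if these sections already exist
    [profile.release]
    opt-level = 2

    [profile.release.package.image]
    opt-level = 3
    EOF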



  • binarybanana
    replied
    Originally posted by V1tol View Post
    During my experiments I found that doing -O3 with -ffunction-sections -fdata-sections and -Wl,--gc-sections produces the same or even smaller binaries than -O2, while being almost always faster. Even Mesa works with those flags; I won't even try kernel builds. Of course I use LTO whenever supported.
    Will try your flags on my system, thanks. Currently I'm using "vanilla" gentooLTO flags + -falign-functions=64 (Intel CPU), the latter of which might interact with -ffunction-sections; need to check that.
    Last edited by binarybanana; 26 February 2024, 05:06 AM.



  • V1tol
    replied
    During my experiments I found that doing -O3 with -ffunction-sections -fdata-sections and -Wl,--gc-sections produces the same or even smaller binaries than -O2, while being almost always faster. Even Mesa works with those flags; I won't even try kernel builds. Of course I use LTO whenever supported.
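
    For anyone unfamiliar with the mechanism, a minimal sketch (hypothetical file names): -ffunction-sections/-fdata-sections give every function and data item its own section, and --gc-sections lets the linker discard the unreferenced ones.
    Code:
    gcc -O3 -ffunction-sections -fdata-sections -c main.c -o main.o
    gcc main.o -o prog -Wl,--gc-sections
    size prog   # compare against a build without the three flags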



  • ahrs
    replied
    Originally posted by avis View Post
    In my experience -O2 -flto produces faster and smaller binaries than -O3. And PGO adds performance on top of that (but PGO is not always viable and some applications don't support it, e.g. Wine).
    I think you could do PGO with Wine; the problem would be how to properly exercise it. It has the same problems as PGO in the Linux kernel, which I think Google has also experimented with but ultimately rejected upstream as not viable. It's not hard to imagine devising a set of profiles that favours performance in some scenarios over others, doesn't properly exercise everything, and sometimes ends up reducing performance as a result.
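
    For context, the generic gcc PGO cycle the "exercise it" problem refers to; a sketch with a hypothetical app.c, where the representative-workload step is exactly the hard part:
    Code:
    gcc -O2 -fprofile-generate app.c -o app
    ./app typical_workload   # must cover the code paths you actually care about
    gcc -O2 -fprofile-use app.c -o app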

