XZ 5.6 Released: Sandboxing Improvements, Prefers -O2 Instead Of -O3


  • #11
    Anux
    Rust is similar to C++, maybe worse, in terms of code bloat due to monomorphisation of all the generics. I wouldn't be surprised if that produces too much code with O3, filling up the caches and making it slower overall. I can see this tradeoff working out differently with plain C. But yes, in general you are correct that it depends on the specific CPU, and there's a threshold after which optimizations that increase code size hurt performance. I also suspect this might somewhat depend on system load, meaning a large number of concurrent processes would benefit from smaller code vs. a single process hogging the CPU, but I have seen no testing on this yet.
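
    (Illustration, not from the thread: a minimal, hypothetical C sketch of the size/speed tradeoff being discussed here. The file and function names are made up; the exact outcome depends on the gcc version and target CPU, but the extra unrolling, peeling and vectorization at -O3 will often grow .text for a hot loop like this.)

    /* hot_loop.c -- hypothetical example; compare the generated code size:
     *   gcc -O2 -c hot_loop.c && size hot_loop.o
     *   gcc -O3 -c hot_loop.c && size hot_loop.o
     * To see exactly which optimizer flags each level enables on a given gcc,
     * compare the output of `gcc -Q -O2 --help=optimizers` with -O3 and -Os. */
    #include <stddef.h>
    #include <stdint.h>

    /* A simple reduction loop: at -O3, gcc typically unrolls and vectorizes it
     * more aggressively than at -O2, which is faster in isolation but adds
     * unrolled copies plus prologue/epilogue code that occupy instruction cache. */
    uint64_t sum_bytes(const uint8_t *buf, size_t len)
    {
        uint64_t sum = 0;
        for (size_t i = 0; i < len; i++)
            sum += buf[i];
        return sum;
    }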



    • #12
      Originally posted by binarybanana View Post
      Anux
      Rust is similar to C++, maybe worse, in terms of code bloat due to monomorphisation of all the generics.
      The only generics I use are Vec and the image crate, and for the image I only use u8, so it should compile to only one version and therefore be smaller and faster than polymorphic code.
      I also use only one codegen unit in release/bench, which should lead to smaller binaries for monomorphized code.

      I also suspect this might somewhat depend on system load, meaning a large number of concurrent processes would benefit from smaller code vs. a single process hogging the CPU, but I have seen no testing on this yet.
      No need for testing: of course, if your L3 is filled with different programs, there will be more evictions/misses with larger binaries than with smaller ones.



      • #13
        Originally posted by Anux View Post
        O3 leading to a smaller binary seems wrong to me. The main thing that O3 does is more unrolling, so it should always lead to bigger binaries, or the same size if O2 already unrolled everything.

        Whether O3 is faster depends on your code and the CPU you use. Older or low-end CPUs typically suffer more from cache misses the larger your binaries get.
        Not all optimizations of -O3 necessarily increase code size over -O2, though many of them do. If you check e.g. the gcc manpage, -Os and -O3 have considerable overlap.

        When -O3 increases code size, there is also the problem that, if you multitask, processes are more likely to evict each other from the L2 cache. This kind of problem usually doesn't show up in single-purpose benchmarks.
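
        (Illustration, not from the thread: a hypothetical C sketch of how one of the passes that -O3 adds can grow code. -funswitch-loops hoists a loop-invariant condition out of a loop by emitting one copy of the loop body per branch: fewer branches inside the loop, but more bytes of code. The file and function names are made up, and whether or how much .text grows depends on the gcc version.)

        /* unswitch.c -- hypothetical example; compare:
         *   gcc -O2 -c unswitch.c && size unswitch.o
         *   gcc -O2 -funswitch-loops -c unswitch.c && size unswitch.o */
        #include <stddef.h>

        void scale_or_clear(float *dst, const float *src, size_t n, int clear)
        {
            for (size_t i = 0; i < n; i++) {
                /* `clear` is loop-invariant: loop unswitching can lift this test
                 * out of the loop and duplicate the loop, once per outcome. */
                if (clear)
                    dst[i] = 0.0f;
                else
                    dst[i] = src[i] * 2.0f;
            }
        }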



        • #14
          Originally posted by chithanh View Post
          Not all optimizations of -O3 necessarily increase code size over -O2, though many of them do.
          • -fgcse-after-reload (not sure, sounds like a slight size reduction?)
          • -fipa-cp-clone (size increase)
          • -floop-interchange (no size effect)
          • -floop-unroll-and-jam (size increase)
          • -fpeel-loops (size increase)
          • -fpredictive-commoning (don't know)
          • -fsplit-loops (size increase)
          • -fsplit-paths (not sure)
          • -ftree-loop-distribution (size increase)
          • -ftree-partial-pre (size increase)
          • -funswitch-loops (size increase)
          • -fvect-cost-model=dynamic (size increase)
          • -fversion-loops-for-strides (size increase)
          Apart from 3 unknowns, everything screams more binary size (see the -fipa-cp-clone sketch after this post for a concrete example).

          If you check e.g. the gcc manpage, -Os and -O3 have considerable overlap.
          Optimize for size. -Os enables all -O2 optimizations except those that often increase code size:
          Of course, but what does this have to do with the O2/O3 topic? Edit: if there were any considerable size improvements with O3, those options would also be used with Os or Oz.
          Last edited by Anux; 26 February 2024, 09:59 AM.
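
          (Illustration, not from the thread: a hypothetical C sketch of why -fipa-cp-clone from the list above is flagged as a size increase. It allows gcc to clone a function per constant argument value, so call sites get specialized copies: faster calls and loops, larger binary. The file and function names are made up, and whether gcc actually clones here depends on its version and cost model.)

          /* ipa_clone.c -- hypothetical example; compare:
           *   gcc -O2 -c ipa_clone.c && size ipa_clone.o && nm ipa_clone.o
           *   gcc -O2 -fipa-cp-clone -c ipa_clone.c && size ipa_clone.o && nm ipa_clone.o
           * Specialized copies, if gcc creates them, appear as scale.constprop.* symbols. */
          #include <stddef.h>

          /* Kept out of line so any cloning shows up as extra symbols instead of
           * being hidden by inlining. */
          __attribute__((noinline))
          static long scale(const long *v, size_t n, long k)
          {
              long s = 0;
              for (size_t i = 0; i < n; i++)
                  s += v[i] * k;
              return s;
          }

          long weighted(const long *v, size_t n)
          {
              /* Two call sites with different constant k: with -fipa-cp-clone,
               * gcc may emit one specialized clone of scale() per constant. */
              return scale(v, n, 3) + scale(v, n, 5);
          }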



          • #15
            Originally posted by Anux View Post
            Of course, but what does this have to do with the O2/O3 topic? Edit: if there were any considerable size improvements with O3, those options would also be used with Os or Oz.
            The point is that if an optimization is part of both -O3 and -Os then it does not increase code size.
            The gcc manpage lists dozens of such optimizations.

            Which optimizations end up being applied depends, of course, on the code, so how much (if at all) binaries increase in size may vary. Most will, but some will not.



            • #16
              Originally posted by chithanh View Post
              The point is that if an optimization is part of both -O3 and -Os then it does not increase code size.
              The gcc manpage lists dozens of such optimizations.
              Yes, that's why I listed all the opts that O3 adds; none of them are in Os or Oz. Or do you have any other resource?
