
Looking At GNU/Linux's Performance Over 2016 With Intel's Clear Linux


  • #11
    Originally posted by indepe View Post

    Well, that much I know already. I think. When you talk about "default compiler flags", those are the ones that you get when you type "gcc helloWorld.c", right?

    And/or are those the same ones that a distribution uses to compile the kernel and the packages? And/or just the ones that Michael uses to compile the test programs?
    the "default compiler flags" by convention are what you get if you do

    gcc $(CFLAGS) helloWorld.c

    (which is a pretty darn global convention likely dating back to the 70's or so)

    On Clear Linux, these are very similar to what we compile the distro packages with (there are some minor differences, mostly due to some policies around security flags we apply to our packages but which are not appropriate to automatically force on everyone else).
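
    To make the convention concrete, a minimal sketch (the flags in the comment are placeholders, not the actual Clear Linux defaults):

    /* hello.c -- built the conventional way, letting the build environment
     * supply CFLAGS, e.g. from a Makefile rule:
     *     $(CC) $(CFLAGS) -o hello hello.c
     * CFLAGS might expand to something like "-O2 -pipe" (placeholder values;
     * the real distro defaults can differ). */
    #include <stdio.h>

    int main(void)
    {
        printf("Hello, world\n");
        return 0;
    }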

    Comment


    • #12
      Originally posted by arjan_intel View Post

      the "default compiler flags" by convention are what you get if you do

      gcc $(CFLAGS) helloWorld.c

      (which is a pretty darn global convention likely dating back to the 70's or so)

      On Clear Linux, these are very similar to what we compile the distro packages with (there are some minor differences, mostly due to some policies around security flags we apply to our packages but which are not appropriate to automatically force on everyone else).
      Thanks for the info! Sounds like you would recommend using $(CFLAGS), more or less, when compiling on Clear Linux (at least).

      Comment


      • #13
        From what I can tell so far Clear Linux's main contribution is testing where Function Multiversioning (FMV) shows the most improvement. Probably many other distros could benefit from this.


        https://clearlinux.org/features/func...versioning-fmv

        I have discussed CPU dispatching often on Stackoverflow. It's a pain to implement for multiple compilers, but FMV in GCC makes it a lot easier. I'm surprised more Linux distros don't take advantage of FMV. Instead of recompiling the kernel for your own hardware (as with Gentoo), a distro could release a kernel which is optimized for a wide range of hardware (SSE2, SSE4.1, AVX, FMA, ...). This also has the advantage that if you want to use your Linux installation on another system which does not support e.g. AVX, you don't have to recompile.
        http://stackoverflow.com/questions/1...11959#25911959
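
        To make that concrete, a minimal sketch of what FMV looks like with GCC's target_clones attribute (the function and target list here are just illustrative):

        /* Minimal FMV sketch: GCC emits one clone of this function per listed
         * target plus the "default" fallback, and an ifunc resolver picks the
         * best clone for the running CPU at load time. Needs GCC 6 or newer
         * on x86. */
        #include <stddef.h>

        __attribute__((target_clones("avx2", "sse4.1", "default")))
        void scale(float *dst, const float *src, float factor, size_t n)
        {
            for (size_t i = 0; i < n; i++)
                dst[i] = src[i] * factor;   /* auto-vectorized differently per clone */
        }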
        Last edited by zboson; 15 December 2016, 05:56 AM.

        Comment


        • #14
          Originally posted by zboson View Post
          From what I can tell so far Clear Linux's main contribution is testing where Function Multiversioning (FMV) shows the most improvement. Probably many other distros could benefit from this.


          https://clearlinux.org/features/func...versioning-fmv

          I have discussed CPU dispatching often on Stackoverflow. It's a pain to implement for multiple compilers, but FMV in GCC makes it a lot easier. I'm surprised more Linux distros don't take advantage of FMV. Instead of recompiling the kernel for your own hardware (as with Gentoo), a distro could release a kernel which is optimized for a wide range of hardware (SSE2, SSE4.1, AVX, FMA, ...). This also has the advantage that if you want to use your Linux installation on another system which does not support e.g. AVX, you don't have to recompile.
          http://stackoverflow.com/questions/1...11959#25911959
          That's a great feature; I'll be able to use it for a few often-used utility functions where I was expecting to have to do the CPUID/function-pointer thing manually (rough sketch below).

          It sounds especially valuable for general-purpose distros that need to run on a variety of hardware including older CPUs. As you say, it requires identifying the locations where it can be applied meaningfully. I've dug a bit into previous phoronix articles on the topic, and it seems Intel is all in favor of those spot optimizations going upstream, even if that requires additional work that Intel thinks is better done by the upstream developers.

          I do think that performance, especially on desktops and servers, is much more important than distro maintainers appear to think, even if not everyone uses it all the time. Such optimizations also save power to the same degree, which is perhaps worth mentioning since the default cpufreq governor is often set to powersave rather than performance or ondemand, and is sometimes impossible to change in the GUI.

          BTW, the execution time table in your first link also does a good job of highlighting how effective -O3 can be even for code that looks like the everyday kind of thing. Who wouldn't want such savings in both performance and power consumption?
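
          For reference, the manual dispatch I was expecting to write looks roughly like this (names are purely illustrative); the FMV attribute replaces all of it:

          /* Hand-rolled CPUID/function-pointer dispatch sketch: one generic and
           * one AVX2 build of the same loop, selected once at startup. The FMV
           * target_clones attribute generates the equivalent automatically. */
          #include <stddef.h>

          static void scale_generic(float *dst, const float *src, float factor, size_t n)
          {
              for (size_t i = 0; i < n; i++)
                  dst[i] = src[i] * factor;
          }

          __attribute__((target("avx2")))
          static void scale_avx2(float *dst, const float *src, float factor, size_t n)
          {
              for (size_t i = 0; i < n; i++)      /* same loop, compiled for AVX2 */
                  dst[i] = src[i] * factor;
          }

          static void (*scale_ptr)(float *, const float *, float, size_t) = scale_generic;

          void scale_init(void)
          {
              __builtin_cpu_init();               /* required before __builtin_cpu_supports */
              if (__builtin_cpu_supports("avx2"))
                  scale_ptr = scale_avx2;
          }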

          Comment


          • #15
            Originally posted by indepe View Post
            BTW, the execution time table in your first link also does a good job of highlighting how effective -O3 can be even for code that looks like the everyday kind of thing. Who wouldn't want such savings in both performance and power consumption?
            Yeah, as far as I have seen, -O2 is still used to compile the kernel for most distros. I would think at this point that many components of the kernel could be compiled with -O3 now. One of the main advantages of -O3 is that it turns on auto-vectorization (you can enable it with -O2 as well). You won't benefit much from FMV without auto-vectorization unless you write hand-tuned code for the SIMD instruction sets you want to target. Hand-tuned code can of course likely perform much better, but why not use the auto-vectorization we have now, which only requires a compiler switch?

            I'm not a kernel dev, but I would like to find out more about why -O3 is not the default by now. I think most distros could benefit a lot from -O3 and FMV. If I were making a distro, that's the main feature I would be pushing for.
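
            As a quick experiment, the difference is easy to see on a trivial loop by asking GCC to report what it vectorized (the file name and flags below are just an example):

            /* saxpy.c -- compare the vectorizer reports:
             *     gcc -O2 -fopt-info-vec -c saxpy.c                  (nothing reported here)
             *     gcc -O3 -fopt-info-vec -c saxpy.c                  (loop reported as vectorized)
             *     gcc -O2 -ftree-vectorize -fopt-info-vec -c saxpy.c (vectorizes at -O2 too)
             * (Behaviour as of the GCC versions I have tried; newer releases may differ.) */
            #include <stddef.h>

            void saxpy(float *y, const float *x, float a, size_t n)
            {
                for (size_t i = 0; i < n; i++)
                    y[i] += a * x[i];   /* straightforward candidate for auto-vectorization */
            }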

            Comment


            • #16
              Originally posted by zboson View Post

              Yeah, as far as I have seen, -O2 is still used to compile the kernel for most distros. I would think at this point that many components of the kernel could be compiled with -O3 now. One of the main advantages of -O3 is that it turns on auto-vectorization (you can enable it with -O2 as well). You won't benefit much from FMV without auto-vectorization unless you write hand-tuned code for the SIMD instruction sets you want to target.
              My own performance tests with a few often-used utility functions, as well as timing of many higher-level functions, show significant differences between -O2 and -O3 in many different areas. In the case of higher-level functions (or the tests on phoronix), I wouldn't know how much of that is due to auto-vectorization; however, in the case of those utility functions it is mostly low-level optimizations and rearranging of expressions, logic, register use, and loops within the common instruction set. However, I can easily imagine that those are also strongly influenced by the CPU "arch". So, in any case, I would think that your point is correct that FMV helps the most when it is done on top of -O3.

              Originally posted by zboson View Post
              Hand-tuned code can of course likely perform much better, but why not use the auto-vectorization we have now, which only requires a compiler switch?
              Yep. Although, of course, some algorithms are designed around specific SSE4 instructions (for example), so alternatives that do not use these instructions need to be organized differently.

              Originally posted by zboson View Post
              I'm not a kernel dev, but I would like to find out more about why -O3 is not the default by now. I think most distros could benefit a lot from -O3 and FMV. If I were making a distro, that's the main feature I would be pushing for.
              So far I haven't read any good reason to dismiss -O3, other than concerns that could be resolved within a single development cycle if distros started using -O3. My understanding is that Clear Linux uses -O3 (or some other set of more aggressive compiler settings), and that appears to work well. Are distros waiting for "-Odistro"? Do they want to avoid longer build times? Or maybe there was a reason in the past, and they just haven't gotten around to changing things yet?

              Comment


              • #17
                Originally posted by indepe View Post
                So far I haven't read any good reason to dismiss -O3, other than concerns that could be resolved within a single development cycle if distros started using -O3. My understanding is that Clear Linux uses -O3 (or some other set of more aggressive compiler settings), and that appears to work well. Are distros waiting for "-Odistro"? Do they want to avoid longer build times? Or maybe there was a reason in the past, and they just haven't gotten around to changing things yet?
                O3 binaries are generally bigger than O2 binaries, and size has a (performance) cost as well in the aggregate (bigger downloads, less effective use of the disk cache, etc.).

                In Clear Linux, we have basically 3 modes, where the package owner picks one of them for his/her package: 1) Normal (O2), 2) Performance sensitive (O3), or 3) Size sensitive (Os).
                The compiler flags I just listed are more simplistic than what is actually done (Size sensitive also turns on function-sections, etc.), but basically we abstracted the detailed compiler flags away behind a simple setting (to avoid a wild west of custom flags).
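
                Purely to illustrate the idea (this is not our actual build tooling), the abstraction boils down to one declared mode per package, mapped centrally to a vetted flag set:

                /* Illustration only, not the real Clear Linux machinery: each package
                 * declares one abstract mode; the build system owns the flag mapping. */
                struct build_mode {
                    const char *name;
                    const char *cflags;
                };

                static const struct build_mode modes[] = {
                    { "normal",      "-O2" },                      /* the default */
                    { "performance", "-O3" },                      /* perf-sensitive packages */
                    { "size",        "-Os -ffunction-sections" },  /* size-sensitive packages */
                };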

                Comment


                • #18
                  Originally posted by arjan_intel View Post

                  O3 binaries are generally bigger than O2 binaries, and size has a (performance) cost as well in the aggregate (bigger downloads, less effective use of the disk cache, etc.).

                  In Clear Linux, we have basically 3 modes, where the package owner picks one of them for his/her package: 1) Normal (O2), 2) Performance sensitive (O3), or 3) Size sensitive (Os).
                  The compiler flags I just listed are more simplistic than what is actually done (Size sensitive also turns on function-sections, etc.), but basically we abstracted the detailed compiler flags away behind a simple setting (to avoid a wild west of custom flags).
                  Sounds very reasonable, especially considering the positive outcome.

                  I've seen many O3 optimizations that actually reduce size, or only affect it minimally, and are simply more efficient code, yet are not part of O2 (perhaps because of the required build time). I guess that observation is already part of the choice of the actual detailed compiler flags which you are using.

                  Comment
