Announcement

**milkylainen** · 27 August 2020, 04:59 AM

Overdue...

**CochainComplex** · 27 August 2020, 08:00 AM

Why wasn't this built in together with -march from the start? That would have been my naive approach to implementing these features.

**KaoDome** · 27 August 2020, 12:19 PM

Could anyone tell me what the practical implications of this are?

To be honest, I barely understand what -mtune does in GCC, if at all. My understanding is that -march sets the supported instructions to use when generating code, and that when native is specified even cache sizes are taken into account. But what does the other option tune?

Is it meaningful? I mean, performance-wise or otherwise, does it play an important part in creating more efficient binaries?

**airminer** · 27 August 2020, 05:45 PM

Originally posted by KaoDome:

> Could anyone tell me what the practical implications of this are?
>
> To be honest, I barely understand what -mtune does in GCC, if at all. My understanding is that -march sets the supported instructions to use when generating code, and that when native is specified even cache sizes are taken into account. But what does the other option tune?
>
> Is it meaningful? I mean, performance-wise or otherwise, does it play an important part in creating more efficient binaries?

CPUs with different microarchitectures may support the same instructions, but different instructions perform better on different microarchitectures. For example, some instructions may be microcoded (and thus slower) on a small core, while they are implemented in silicon on big cores. Other commonly microcoded instructions are old, rarely used ones that are included for compatibility's sake. In these cases, it may be faster to execute one or more different, in-silicon instructions that accomplish the same goal as the microcoded instruction. For this reason, GCC includes cost tables for each CPU it can tune for, which describe how fast each of the supported instructions is.

Basically, -march defines the set of instructions the compiler is allowed to use, and -mtune selects the cost tables the compiler uses to determine which of those instructions perform best, and thus which ones it should prefer.
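
To make the cost-table idea concrete, here is a toy Python sketch. It is not how GCC or LLVM are implemented; the instruction names, CPU names, and cycle counts are all made up for illustration. It only shows the principle: the set of allowed instructions stays the same, and the chosen tuning target decides which equivalent sequence looks cheapest.

```python
# Toy model only: every name and number here is hypothetical and is NOT
# taken from the real GCC or LLVM cost tables.

# Two ways to compute the same result, both allowed by the same -march.
CANDIDATES = {
    "single_complex_insn": ["complex_op"],
    "simple_sequence": ["simple_op", "simple_op", "simple_op"],
}

# Hypothetical per-CPU cost tables (cost in cycles per instruction).
COST_TABLES = {
    "small_core": {"complex_op": 12, "simple_op": 1},  # complex_op is microcoded
    "big_core": {"complex_op": 2, "simple_op": 1},     # complex_op is in silicon
}


def sequence_cost(sequence, table):
    """Sum the per-instruction costs of a sequence under one cost table."""
    return sum(table[insn] for insn in sequence)


def pick_sequence(tune):
    """Return the name of the cheapest candidate for the given tuning target."""
    table = COST_TABLES[tune]
    return min(CANDIDATES, key=lambda name: sequence_cost(CANDIDATES[name], table))


for cpu in COST_TABLES:
    print(cpu, "->", pick_sequence(cpu))
```

With these made-up numbers, the small core prefers the three simple instructions (cost 3 versus 12), while the big core prefers the single complex instruction (cost 2 versus 3).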

**Guest** · 28 August 2020, 02:11 AM

Nice! Maybe I can finally use it as a replacement for GCC for optimising binaries for my computers.

**jabl** · 28 August 2020, 03:29 AM

Originally posted by airminer:

> CPUs with different microarchitectures may support the same instructions, but different instructions perform better on different microarchitectures. For example, some instructions may be microcoded (and thus slower) on a small core, while they are implemented in silicon on big cores. Other commonly microcoded instructions are old, rarely used ones that are included for compatibility's sake. In these cases, it may be faster to execute one or more different, in-silicon instructions that accomplish the same goal as the microcoded instruction. For this reason, GCC includes cost tables for each CPU it can tune for, which describe how fast each of the supported instructions is.
>
> Basically, -march defines the set of instructions the compiler is allowed to use, and -mtune selects the cost tables the compiler uses to determine which of those instructions perform best, and thus which ones it should prefer.

Yes, pretty much. As a minor nit, -march=foo also implies -mtune=foo. Being able to specify -mtune= separately (which overrides the implicit -mtune= set by -march=) is useful when producing binaries for deployment on many different computers (e.g. a Linux distro): one can set -march= to the lowest common denominator one wants to support, and then use -mtune= to specify a somewhat newer and more common CPU model.
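
The override rule can be written down as a tiny Python sketch. The helper names `effective_tuning` and `compiler_flags` are hypothetical and exist only for illustration; the CPU names are example values, and the model simply encodes the rule as stated here (an explicit -mtune= wins, otherwise the -march= value is used for tuning too).

```python
def effective_tuning(march, mtune=None):
    """CPU the compiler tunes for: explicit -mtune= wins, else follow -march=."""
    return mtune if mtune is not None else march


def compiler_flags(march, mtune=None):
    """Build an example flag list for a build script (hypothetical helper)."""
    flags = ["-O2", f"-march={march}"]
    if mtune is not None:
        flags.append(f"-mtune={mtune}")
    return flags


# Baseline instruction set for older machines, tuning for a newer, more common CPU.
print(compiler_flags("haswell", "skylake"))
print(effective_tuning("haswell"))             # no -mtune=: tuning follows -march=
print(effective_tuning("haswell", "skylake"))  # explicit -mtune= overrides it
```

A distro build script could pass the resulting list straight to the compiler invocation, keeping the instruction-set baseline and the tuning target as two separate knobs.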

**jabl** · 28 August 2020, 03:34 AM

Originally posted by KaoDome:

> Could anyone tell me what the practical implications of this are?
>
> To be honest, I barely understand what -mtune does in GCC, if at all. My understanding is that -march sets the supported instructions to use when generating code, and that when native is specified even cache sizes are taken into account. But what does the other option tune?

What -march=native does is this: instead of the user specifying the CPU family on the command line, the compiler checks the CPU model (via the CPUID instruction on x86) and automatically selects the CPU family based on that.
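
If you want to see the kind of information this detection is based on, here is a rough Python sketch. It does not execute CPUID and is not what the compiler does internally; it assumes a Linux x86 system, where /proc/cpuinfo exposes the model name and feature flags that the kernel gathered. The function name `parse_cpuinfo` and the sample text are made up for illustration.

```python
def parse_cpuinfo(text):
    """Extract the model name and feature flags of the first CPU listed."""
    info = {"model name": None, "flags": set()}
    for line in text.splitlines():
        if not line.strip():
            if info["model name"] or info["flags"]:
                break  # a blank line ends the first processor's entry
            continue
        key, _, value = line.partition(":")
        key = key.strip()
        if key == "model name":
            info["model name"] = value.strip()
        elif key == "flags":
            info["flags"] = set(value.split())
    return info


# Made-up sample in the layout /proc/cpuinfo uses on x86 Linux.
SAMPLE = """\
processor   : 0
model name  : Example CPU
flags       : fpu sse2 avx2

processor   : 1
model name  : Other CPU
flags       : fpu
"""

info = parse_cpuinfo(SAMPLE)
print(info["model name"], sorted(info["flags"]))
```

On a real Linux machine you could call `parse_cpuinfo(open("/proc/cpuinfo").read())` and compare the reported flags with what a given -march= value requires.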

**carewolf** · 28 August 2020, 06:30 AM

Originally posted by jabl:

> Yes, pretty much. As a minor nit, -march=foo also implies -mtune=foo. Being able to specify -mtune= separately (which overrides the implicit -mtune= set by -march=) is useful when producing binaries for deployment on many different computers (e.g. a Linux distro): one can set -march= to the lowest common denominator one wants to support, and then use -mtune= to specify a somewhat newer and more common CPU model.

Yeah, you might want to optimize for Haswell features but not have the tuning workarounds for the inefficiencies/brokenness of Haswell. In that case you would use either -mtune=generic or -mtune=skylake.

**KaoDome** · 28 August 2020, 10:35 AM

Thanks airminer and jabl, I understand the implications of -mtune now. It's good news that the Clang devs are looking at honoring that switch in a newer release.

LLVM Clang Will Finally Honor "-mtune=" On x86/x86_64 CPUs
