Clear Linux's make-fmv-patch Eases The Creation Of GCC FMV-Enabled Code Paths


  • #11
    Originally posted by skeevy420

    I was thinking more on processor cache and less if/thens to detect the correct code to use. Disk space wasn't even a factor. I was thinking about people who want every last oomph from their hardware -- people like that don't want to waste cycles for the code to determine if it should use the AVX or AVX2 parts.



    I'm not assuming FMV requires a new processor. Technically it can be for SSE and SSE2 or for various ARMs. Not positive, but I think the Westmere cutoff line is due to AES.



    Which is why I used march=generic as the base and picked an mtune from newer CPUs like Skylake and Ryzen. It was just a quick example that would cover damn near everyone and include optimized code for newer processors. IMHO, if one compares CPU features it makes sense to lump similar x86_64 architectures together and to create some generational divides similar to i386, i486, i686, etc.
    1.) libelf and glibc are pretty damn smart here: the detection is done by the dynamic loader (which you have to run regardless), only once, and it's so fast that I'm honestly not sure it's even measurable.

    2.) I can't say 100% whether that's because of AES, since I don't work for the Clear Linux team, but I do use their kernel with Arch Linux (yeah baby) and it does have a ton of patches to enable/use certain features that are effectively Westmere+.

    3.) This is a very bad idea, like baaaaad idea. The best use for FMV is to require only the bare extensions, because CPU architectures, however similar, have differences, and you'll end up with a binary that works on Skylake with AVX2 but falls back on Skylake and SIGSEGVs on Icelake, for example, because mtune picked a very specific extension outside AVX2 (again, just an example; I'm not talking specifically about those three, it's a fictional case).

    Something like this happened to me on certain AMD families and AArch64 CPUs, and it was a full-expense two-week trip through all the levels of Dante's hell to debug.

    In my experience, something like this is almost 100% safe to use in generic binaries (i.e. distro-usable):
    __attribute__((target_clones("avx2","sse4.2","sse2","default"))) void myVecFunc(...) { /* my code */ }

    Comment


    • #12
      Originally posted by skeevy420
      I was thinking more on processor cache and less if/thens to detect the correct code to use.
      That's only done at load time, so it's "who cares" territory. BTW it's not truly like C++ polymorphism: the dispatch happens once at load time and is then fixed by patching in the chosen implementation, whereas in C++ every virtual call stays indirect (so it needs a branch predictor on every call, etc.).

      Comment
