Linux 4.17 Kernel Patch Brings -march=native Support
A Gentoo user has revised his kernel patch that allows the mainline Linux kernel to be built with GCC's "-march=native" compiler optimizations, tuning the kernel build for your particular CPU.
While -march=native on modern compilers is popular with developers and enthusiasts for building optimized packages that target a specific CPU micro-architecture, the mainline Linux kernel still does not support this functionality. Gentoo user Alexey Dobriyan has been working on a patch to change that.
The 400+ line patch, dubbed 4.17-ad1, allows the kernel to be compiled with the "-march=native" compiler flag and have it honored.
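For readers curious what "-march=native" actually resolves to on their own machine, GCC can report it via "-Q --help=target". The following is a minimal sketch, not part of the patch: the helper names resolved_march and query_gcc are hypothetical, and the parsing assumes GCC's usual "  -march=   <value>" output layout.

```python
import re
import subprocess


def resolved_march(help_target_output):
    """Return the value GCC reports for -march=, or None if it is absent."""
    # Assumed layout: an indented option name followed by its value,
    # for example "  -march=        skylake".
    match = re.search(r"^[ \t]*-march=[ \t]+(\S+)", help_target_output, re.MULTILINE)
    return match.group(1) if match else None


def query_gcc(cc="gcc"):
    """Ask the compiler what -march=native expands to on this machine."""
    result = subprocess.run(
        [cc, "-march=native", "-Q", "--help=target"],
        capture_output=True,
        text=True,
        check=True,
    )
    return resolved_march(result.stdout)


if __name__ == "__main__":
    try:
        print("-march=native resolves to:", query_gcc())
    except (OSError, subprocess.CalledProcessError) as exc:
        # gcc may be missing or may not support this target
        print("could not query gcc:", exc)
```

This only shows what the compiler would target; it says nothing about whether a given kernel tree accepts the flag.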
Those interested can find the patch here, though it has received only minimal testing on Intel hardware and still has some open TODO items.
The developer hasn't supplied any benchmarks, though on his original version of the patch he noted, "Random microbenchmarking indicates that a) SHLX et al enabled SHA-1 can be ~10% faster than regular one as there are no carry flags dependencies and b) REP STOSB clear_page() can be ~15% faster then REP STOSQ one where fast REP STOSB is advertised. This is actually important because clear_page()/copy_page() are regularly seen on top of kernel profiles."
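The two features in that quote map to CPU flags that Linux exposes in /proc/cpuinfo: SHLX and its siblings belong to BMI2 ("bmi2"), and fast REP STOSB is advertised as "erms". As a rough sketch for checking whether a given machine would even be a candidate for those gains (the function names here are hypothetical and unrelated to the patch itself):

```python
def cpu_flags(cpuinfo_text):
    """Return the set of feature flags from the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def relevant_features(flags):
    """Report the two features mentioned in the quote above."""
    return {
        "bmi2 (SHLX/SHRX/SARX)": "bmi2" in flags,
        "erms (fast REP MOVSB/STOSB)": "erms" in flags,
    }


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as handle:
            text = handle.read()
    except OSError as exc:
        # Not on Linux, or /proc is unavailable
        print("could not read /proc/cpuinfo:", exc)
    else:
        for name, present in relevant_features(cpu_flags(text)).items():
            print(name, "yes" if present else "no")
```

A "yes" here only means the CPU advertises the feature; the quoted speedups are the developer's own microbenchmark figures.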
When time allows I'll probably give the kernel build a whirl and see how it does in some real-world Linux benchmarks.