GCC 6/7 Gets A Performance-Sensitive Fix
Written by Michael Larabel in Compiler on 9 May 2017 at 09:06 AM EDT.
A Phoronix reader pointed out a performance regression fix, now available for GCC 6 and GCC 7, that can make some rather trivial C code run much faster.

A GCC bug report was opened about poor code generation for this snippet, a pattern that can show up in distance measurements and elsewhere:
x_x = (x * x) / 200;
y_y = (y * y) / 200;

The instructions generated by GCC came down to:
movl %edi, %r13d
imull %edi, %r13d
movl %r13d, %eax
sarl $31, %r13d
imull %ebx
sarl $6, %edx
movl %edx, %ecx
subl %r13d, %ecx
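The GCC sequence above is the usual "magic number" replacement for signed division by a constant. The snippet does not show where %ebx is loaded; assuming it holds 0x51EB851F (the constant Clang's output names explicitly), the steps can be sketched in C roughly as follows. The names MAGIC_200, div200_signed_magic and check_div200_signed are hypothetical, and the sketch assumes that right-shifting a negative value is an arithmetic shift, as GCC and Clang implement it on x86-64.

```c
#include <assert.h>
#include <stdint.h>

/* ceil(2^38 / 200) == 1374389535 == 0x51EB851F, the constant Clang prints. */
#define MAGIC_200 UINT64_C(0x51EB851F)

/* Hypothetical C rendering of GCC's signed n / 200 sequence.
 * Assumes >> on negative values is an arithmetic shift (GCC/Clang on x86-64). */
static int32_t div200_signed_magic(int32_t n)
{
    /* imull %ebx: signed 32x32 -> 64-bit product (high half ends up in %edx). */
    int64_t prod = (int64_t)n * (int64_t)MAGIC_200;
    /* sarl $6 on the high half == arithmetic shift of the product by 32 + 6 = 38.
     * This rounds down: for n >= 0 it is already n / 200, for n < 0 it is
     * one below C's truncated quotient. */
    int32_t q = (int32_t)(prod >> 38);
    /* sarl $31: 0 for n >= 0, -1 for n < 0. */
    int32_t sign_mask = n >> 31;
    /* subl: adds 1 for negative n, giving C's round-toward-zero result. */
    return q - sign_mask;
}

/* Compare the sketch against the compiler's own n / 200 over [lo, hi]. */
static void check_div200_signed(int32_t lo, int32_t hi)
{
    for (int64_t n = lo; n <= hi; n++)
        assert(div200_signed_magic((int32_t)n) == (int32_t)n / 200);
}
```

The sign-mask steps only matter for negative dividends; since the dividend here is a square, they presumably contribute nothing but latency.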

Clang, by contrast, generated a more efficient sequence:
movl %edx, %ebp
imull %ebp, %ebp
imulq $1374389535, %rbp, %rbp # imm = 0x51EB851F
shrq $38, %rbp
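Clang's shorter output appears to rely on the square being non-negative: signed overflow is undefined behavior in C, so a compiler may assume x * x never overflows and therefore never goes negative. That lets it zero-extend the square, multiply by 0x51EB851F in 64 bits and shift right by 38 with no sign correction. A minimal sketch of that path, assuming non-negative input; the names MAGIC_200, div200_nonneg_magic and square_div200 are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* ceil(2^38 / 200) == 0x51EB851F; note 200 * MAGIC_200 - 2^38 == 56. */
#define MAGIC_200 UINT64_C(0x51EB851F)

/* Hypothetical C rendering of Clang's imulq/shrq pair: p / 200 for p >= 0.
 * Exact for every 32-bit unsigned p, because the excess term
 * 56 * p / 2^38 stays below 1 when p < 2^32. */
static uint32_t div200_nonneg_magic(uint32_t p)
{
    uint64_t prod = p * MAGIC_200;   /* imulq: fits easily in 64 bits */
    return (uint32_t)(prod >> 38);   /* shrq $38: logical shift, no sign fix-up */
}

/* The article's (x * x) / 200. Signed overflow is undefined, so a compiler
 * may assume the square is non-negative and use the unsigned path above. */
static int32_t square_div200(int32_t x)
{
    assert(x >= -46340 && x <= 46340);   /* keeps x * x within int32_t */
    int32_t sq = x * x;
    return (int32_t)div200_nonneg_magic((uint32_t)sq);
}
```

Relative to the GCC output shown earlier, the saving is the sign mask (sarl $31 / subl), which is always zero for a square that does not overflow.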

This case was fixed in GCC this week, and the better-generated code is approximately 15% faster. Thanks to Sven for pointing it out; more details are available in this GCC bug report.

There will be fresh GCC/Clang Linux compiler benchmarks coming up on Phoronix in the next few weeks.
