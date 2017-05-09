A Phoronix reader pointed out a performance regression fix now available for GCC 6 and GCC 7 that could help some rather trivial C code perform much better.
A GCC bug report was opened over the compiler's poor code generation for this snippet, a pattern that turns up in distance calculations and other areas:
x_x = (x * x) / 200;
y_y = (y * y) / 200;
The instructions generated by GCC came down to:
movl %edi, %r13d
imull %edi, %r13d
movl %r13d, %eax
sarl $31, %r13d
imull %ebx
sarl $6, %edx
movl %edx, %ecx
subl %r13d, %ecx
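For readers who do not read x86 assembly daily, here is a rough C rendering of what that sequence appears to compute. It is only a sketch under assumptions: the listing does not show what %ebx holds, so the constant 1374389535 (the one visible in the Clang output in this article) is assumed, the names `div200_signed` and `GCC_MAGIC_200` are made up for illustration, and the code relies on `>>` being an arithmetic shift for negative values, which is implementation-defined in C but is what GCC and Clang do.

```c
#include <stdint.h>

/* Assumed contents of %ebx; the excerpt does not show where it is loaded. */
#define GCC_MAGIC_200 INT64_C(1374389535)

/* Hypothetical helper mirroring the GCC listing: n / 200, truncating toward zero. */
int32_t div200_signed(int32_t n)
{
    int64_t wide = (int64_t)n * GCC_MAGIC_200; /* imull %ebx: 64-bit product in %edx:%eax */
    int32_t high = (int32_t)(wide >> 32);      /* %edx, the upper 32 bits of the product */
    int32_t sign = n >> 31;                    /* sarl $31: 0 for n >= 0, -1 for n < 0 */
    return (high >> 6) - sign;                 /* sarl $6, then subl to fix up negative n */
}
```

The `sign` fix-up only matters for negative inputs. Since a square such as `x * x` cannot be negative unless the multiplication overflows, those extra instructions look like wasted work in this particular case.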
Clang, meanwhile, was generating a more efficient alternative:
movl %edx, %ebp
imull %ebp, %ebp
imulq $1374389535, %rbp, %rbp # imm = 0x51EB851F
shrq $38, %rbp
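The trick in the Clang output is the usual replacement of a division by a constant with a multiply and a shift: 1374389535 is 2^38 / 200 rounded up, so multiplying by it and shifting right by 38 bits yields the quotient for non-negative 32-bit inputs. A minimal C sketch of the idea follows; the function and macro names are hypothetical and this is not taken from either compiler's source.

```c
#include <assert.h>
#include <stdint.h>

/* 0x51EB851F == 1374389535, the immediate in the Clang listing. */
#define CLANG_MAGIC_200 UINT64_C(0x51EB851F)

/* Hypothetical helper: n / 200 for non-negative n via one multiply and one shift. */
int32_t div200_unsigned(int32_t n)
{
    assert(n >= 0); /* the plain unsigned shift is only valid for non-negative input */
    return (int32_t)(((uint64_t)n * CLANG_MAGIC_200) >> 38); /* imulq, then shrq $38 */
}

/* Hypothetical self-check: count values in [0, limit] where the shortcut
 * disagrees with ordinary integer division. */
long count_mismatches(int32_t limit)
{
    long bad = 0;
    for (int64_t n = 0; n <= limit; n++) {
        if (div200_unsigned((int32_t)n) != (int32_t)n / 200) {
            bad++;
        }
    }
    return bad;
}
```

Calling `count_mismatches` with a non-negative limit should return 0 if the shortcut holds over that range, and `div200_unsigned(x * x)` gives the `x_x` value from the snippet as long as `x * x` does not overflow.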
This case has now been fixed in GCC's code as of this week. The improved generated code is approximately 15% faster. Thanks to Sven for pointing it out; more details are in this GCC bug report.
There will be fresh GCC/Clang Linux compiler benchmarks coming up on Phoronix in the next few weeks.
