Glibc Picks Up Some More FMA Performance Optimizations

Written by Michael Larabel in GNU on 22 October 2017 at 12:41 PM EDT.
The GNU C Library, glibc, has picked up FMA-optimized versions of some additional functions.

The newest functions now getting fused multiply-add (FMA) support are powf(), logf(), exp2f(), and log2f(). FMA instructions have been available since the Intel Haswell and AMD Piledriver generations and, as with past FMA optimizations, the benefits can be quite noticeable.

The FMA-optimized powf() function on Intel Skylake hardware is yielding a 29% improvement in reciprocal throughput and 24% lower latency for SPEC2017. The log2f() call meanwhile is seeing a 17% throughput improvement and an 18% improvement in latency. The logf() function is seeing a 16% throughput improvement and a 22% reduction in latency. Lastly, exp2f() is 16% faster with an 18% improvement in latency.

These optimizations were done by H.J. Lu and are available via Git ahead of the upcoming glibc 2.27 release.

H.J. Lu also replaced some Assembly versions of functions with generic C code and found that the C code outperforms the older Assembly. Some of those performance improvements are even more profound than the FMA optimizations.