Optimized memchr() Implementation For The Linux Kernel Up To ~4x Faster
The latest memchr() patches, sent out this Sunday for the Linux kernel, aim to speed up this function and can make it around 4x faster for long strings. Yu-Jen Chang, who sent out the patches, explained:
The original version of memchr() is implemented with the byte-wise comparing technique, which does not fully use 64-bits or 32-bits registers in CPU. We use word-wide comparing so that 8 characters can be compared at the same time on CPU. This code is base on David Laight's implementation.
We create two files to measure the performance. The first file contains on average 10 characters ahead the target character. The second file contains at least 1000 characters ahead the target character. Our implementation of “memchr()” is slightly better in the first test and nearly 4x faster than the original implementation in the second test.
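To illustrate the word-wide comparison described in the quote, here is a minimal sketch in C. It is not the code from the patches: the function name memchr_wordwise is hypothetical, the zero-byte test is the well-known bit trick rather than necessarily what the patch uses, and memcpy() stands in for whatever aligned or unaligned load the kernel code would actually perform.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Illustrative sketch only (hypothetical name, not the patch's code).
 * Scans 8 bytes per iteration, then falls back to a byte loop to
 * pinpoint the match or to handle the tail.
 */
static void *memchr_wordwise(const void *s, int c, size_t n)
{
    const unsigned char *p = s;
    const unsigned char target = (unsigned char)c;
    const uint64_t ones  = 0x0101010101010101ULL;
    const uint64_t highs = 0x8080808080808080ULL;
    const uint64_t pattern = ones * target; /* target copied into every byte */

    while (n >= sizeof(uint64_t)) {
        uint64_t word;

        memcpy(&word, p, sizeof(word));     /* load 8 bytes, alignment-safe */
        uint64_t x = word ^ pattern;        /* matching bytes become 0x00 */
        if ((x - ones) & ~x & highs)        /* nonzero iff some byte of x is 0 */
            break;                          /* a match is inside this word */
        p += sizeof(word);
        n -= sizeof(word);
    }

    /* Byte-wise scan: finds the match in the flagged word, or covers the tail. */
    while (n--) {
        if (*p == target)
            return (void *)p;
        p++;
    }
    return NULL;
}
```

The fast path pays off only when many words can be skipped before the target, which is consistent with the quoted results: a small gain when the target sits about 10 characters in, and a much larger one when it is at least 1000 characters in.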
This may be nice for micro-benchmarking memchr(), but in the grand scheme it probably won't equate to any material benefit for end-users, especially as the biggest performance gains come with very long strings. Checking the Linux kernel source tree today, memchr() is used around 129 times (or ~323 times counting the different variations of memchr), ranging from the Linux kernel tools to file-system code and various drivers.
For those interested, the patches can be found on the Linux kernel mailing list for review.