Originally posted by GreenReaper
The reason I think it's alright here is that the benchmarks are more of a sanity test/due diligence to check that nothing is horribly wrong. The AVX and SSE4.2 code have exactly the same logic and instruction order. The only difference is that the AVX version uses the VEX prefix on the SIMD instructions, so any issues can only be in the frontend. That can be fixed explicitly for the SSE4.2 code if someone wants to submit a patch. If the AVX code was faster, it's really only by accident.
Intuitively I would also expect the SSE4.2 version to be better for almost any application, since its smaller code size adds less I-cache pressure. This is hard to test because `str{n}casecmp` is not hot enough for these small percentage changes to be noticeable in any real application. Probably the way to do it would be to write a toy application that is bottlenecked on `qsort` with `str{n}casecmp` as the comparison function. I didn't think it was needed given the exact match in logic, and unfortunately I don't own the hardware.
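For anyone who does have the hardware, here is a rough sketch of what such a toy application's core could look like. Everything in it is an assumption for illustration: the names `cmp_casecmp`, `fill_string` and `bench_qsort_casecmp` are made up, the string shape (a shared mixed-case prefix followed by random letters) is just one way to keep `strcasecmp` busy past the first byte, and `clock()` is a coarse timer. It is not the benchmark used for the patch.

```c
#include <assert.h>
#include <stdlib.h>
#include <strings.h>   /* strcasecmp */
#include <time.h>

/* qsort passes pointers to the array elements; the elements are char *. */
static int cmp_casecmp(const void *a, const void *b)
{
    const char *const *sa = a;
    const char *const *sb = b;
    return strcasecmp(*sa, *sb);
}

/* First half: the letter 'x' in random case (shared prefix).
   Second half: random letters in random case. */
static void fill_string(char *buf, size_t len, unsigned *state)
{
    for (size_t i = 0; i + 1 < len; i++) {
        *state = *state * 1103515245u + 12345u;   /* small LCG, reproducible */
        unsigned r = *state >> 16;
        char base = (r & 1u) ? 'a' : 'A';
        unsigned letter = (i < len / 2) ? 23u : (r >> 1) % 26u;   /* 23 is 'x' */
        buf[i] = (char)(base + letter);
    }
    buf[len - 1] = '\0';
}

/* Sorts n strings of len bytes each (including the terminator) and returns
   the CPU seconds spent inside qsort, or -1.0 on bad input or allocation
   failure. */
double bench_qsort_casecmp(size_t n, size_t len, unsigned seed)
{
    if (n == 0 || len == 0)
        return -1.0;

    char *pool = malloc(n * len);
    char **strs = malloc(n * sizeof *strs);
    if (!pool || !strs) {
        free(pool);
        free(strs);
        return -1.0;
    }

    unsigned state = seed;
    for (size_t i = 0; i < n; i++) {
        strs[i] = pool + i * len;
        fill_string(strs[i], len, &state);
    }

    clock_t t0 = clock();
    qsort(strs, n, sizeof *strs, cmp_casecmp);
    clock_t t1 = clock();

    /* Sanity check: the result must be ordered case-insensitively. */
    for (size_t i = 1; i < n; i++)
        assert(strcasecmp(strs[i - 1], strs[i]) <= 0);

    free(strs);
    free(pool);
    return (double)(t1 - t0) / CLOCKS_PER_SEC;
}
```

Calling something like `bench_qsort_casecmp(1 << 20, 64, 1u)` from `main` a few times and comparing the results between the two implementations would be the idea. How to force one implementation or the other is left open here; glibc's `glibc.cpu.hwcaps` tunable might be one way, assuming the installed glibc honors it for these functions. Note that a loop this tight probably keeps everything resident in the I-cache, so it would mostly show the frontend difference rather than the code-size effect.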