Linux 5.0 To Linux 5.9 Kernel Benchmarking Was A Bumpy Ride With New Regressions
The Apache web server regression wasn't caused by the I/O changes made in Linux 5.9, so it was back to bisecting on the 64-core AMD Linux server running Ubuntu 20.04.
That Apache regression was tracked down to a Linux 5.9 change by Linus Torvalds himself, in which he rewrote the wait_on_page_bit_common() logic. That patch was the first offending commit, the point at which the significant drop in Apache web server performance appeared.
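For readers unfamiliar with bisecting, the idea is a binary search over the commit history. The following is only a minimal sketch of that search, not the tooling used for this article: the function name first_bad_commit and the is_bad callback (which would stand in for "build this kernel, boot it, run the Apache benchmark, and judge the result") are hypothetical, and the sketch assumes the history is linear and that every commit after the first bad one is also bad.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest commit for which is_bad() is true.

    Assumes commits are ordered oldest to newest, the first one is known
    good, the last one is known bad, and once a commit is bad all later
    commits are bad too.
    """
    lo, hi = 0, len(commits) - 1  # lo is known good, hi is known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # the regression was introduced at mid or earlier
        else:
            lo = mid  # the regression was introduced after mid
    return commits[hi]
```

Each step halves the remaining range, which is why even a large merge window can be narrowed to one commit in a modest number of kernel builds, though every step still costs a full build, reboot, and benchmark run.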
Linus explained with that change, "It turns out that wait_on_page_bit_common() had several problems, ranging from just unfair behavior due to re-queueing at the end of the wait queue when re-trying, and an outright bug that could result in missed wakeups (but probably never happened in practice). This rewrites the whole logic to avoid both issues, by simply moving the logic to check (and possibly take) the bit lock into the wakeup path instead. That makes everything much more straightforward, and means that we never need to re-queue the wait entry: if we get woken up, we'll be notified through WQ_FLAG_WOKEN, and the wait queue entry will have been removed, and everything will have been done for us."
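The fairness point in that description can be illustrated with a toy model. This is not kernel code and makes no attempt to reproduce the performance regression; it is a single-threaded sketch under simplifying assumptions, with hypothetical names (acquisition_order, stolen_rounds, handoff). It only shows why re-queueing a woken waiter at the tail changes the order in which waiters get the lock, while taking the lock in the wakeup path does not.

```python
from collections import deque
from typing import Iterable, List, Set


def acquisition_order(waiters: Iterable[str], stolen_rounds: Set[int], handoff: bool) -> List[str]:
    """Toy model: return the order in which waiters obtain the bit lock.

    Each round the lock is released once and the head waiter is woken.
    In rounds listed in stolen_rounds, some other (hypothetical) task grabs
    the lock between the wakeup and the moment the woken waiter runs.
    """
    queue = deque(waiters)
    order: List[str] = []
    round_no = 0
    while queue:
        woken = queue.popleft()  # the entry is removed from the wait queue
        if handoff:
            # Rewritten logic: the wakeup path checks and takes the lock on
            # the waiter's behalf, so a late arrival cannot slip in first.
            order.append(woken)
        elif round_no in stolen_rounds:
            # Old logic: the waiter retries, finds the lock already taken,
            # and re-queues at the tail, behind later arrivals.
            queue.append(woken)
        else:
            order.append(woken)
        round_no += 1
    return order
```

With waiters A, B, C and the lock stolen in the first round, the re-queueing variant hands out the lock as B, C, A, while the handoff variant keeps the original A, B, C order.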
Unfortunately, the bisect indicates that Apache performance is wrecked by that change. To confirm, I rebuilt the Linux 5.8 kernel with only that patch (2a9127fcf2) applied on top. Indeed, Apache performance dives as the concurrent user / process count increases.
So it appears that the wait_on_page_bit_common patch is solely responsible for this active regression. Repeating the test on Linux Git master showed the lower Apache performance still present as of 8 September. I was also curious whether this patch caused the Redis slowdown in Linux 5.9, so I ran some tests there with Linux 5.8 and then with the sole patch on top.
It appears the Redis regression on Linux 5.9 is also due at least in part to this patch from Linus. Given this active regression, I am now running more workloads on Linux 5.8 vs. 5.9 to see if any other real-world slowdowns can be found from this change.
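When comparing a baseline kernel against the same kernel with a single patch applied, a per-concurrency percent change makes it easy to see where a slowdown begins. The helper below is a minimal sketch with a hypothetical name (percent_change); it assumes each result set maps a concurrency level to requests per second, which is an assumption about the data layout and not the format of any particular benchmark tool.

```python
from typing import Dict


def percent_change(baseline: Dict[int, float], patched: Dict[int, float]) -> Dict[int, float]:
    """Percent change in requests/sec per concurrency level.

    Negative values mean the patched kernel is slower than the baseline.
    Levels missing from either result set, or with a non-positive
    baseline, are skipped.
    """
    return {
        level: (patched[level] - baseline[level]) / baseline[level] * 100.0
        for level in sorted(baseline)
        if level in patched and baseline[level] > 0
    }
```

A regression that only shows up under contention would appear as values near zero at low concurrency and increasingly negative values as the level rises.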
That's all for now given the time involved. If there is enough reader interest/support, I may still end up exploring some of the older regressions found in this 5.0 to 5.9 kernel comparison, such as TensorFlow-Lite performance, which receded several releases ago and remains that way on 5.9 Git.