Announcement

**hernil** · 17 January 2021, 07:14 PM

Are there any plans for any sort of CI for the kernel? It seems these kinds of things happen frequently enough that running some automated benchmarks (like you do, Michael ;-) ) before tagging releases would be reasonable. Surely any one of the big cloud players could easily donate the compute power for this through the Linux Foundation, and scaffolding the whole thing should not demand that much, considering how many tools exist for this these days. It could have a lot of neat features, like bisecting to find the culprit commit, or even the reverse: highlighting commits with noticeable performance improvements.
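The bisecting idea boils down to a binary search over an ordered list of commits. Below is a minimal Python sketch, assuming the regression is monotonic (every commit after the culprit is also slow), which is the same assumption `git bisect` makes. `first_bad_commit`, the commit names, and the timings are hypothetical placeholders, not measurements and not the API of any real CI system.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Binary search for the first commit where is_bad() becomes True.

    Assumes commits are ordered oldest -> newest, commits[0] is known good,
    commits[-1] is known bad, and badness is monotonic (once bad, stays bad).
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] is good, commits[hi] is bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # culprit is at mid or earlier
        else:
            lo = mid  # culprit is after mid
    return commits[hi]


if __name__ == "__main__":
    # Hypothetical per-commit benchmark timings in seconds (made-up numbers).
    timings = {"c0": 15.1, "c1": 14.9, "c2": 15.3, "c3": 290.0, "c4": 295.2}
    commits = list(timings)
    baseline = timings[commits[0]]
    # Treat anything more than 20% slower than the known-good baseline as bad.
    culprit = first_bad_commit(commits, lambda c: timings[c] > baseline * 1.2)
    print(culprit)  # c3
```

With n commits between a known-good and a known-bad release, this needs only about log2(n) benchmark runs, which is what makes automated bisection cheap enough to run whenever a regression is flagged.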

**geearf** · 17 January 2021, 08:15 PM

Originally posted by hernil

Are there any plans for any sort of CI for the kernel? It seems these kinds of things happen frequently enough that running some automated benchmarks (like you do, Michael ;-) ) before tagging releases would be reasonable. Surely any one of the big cloud players could easily donate the compute power for this through the Linux Foundation, and scaffolding the whole thing should not demand that much, considering how many tools exist for this these days. It could have a lot of neat features, like bisecting to find the culprit commit, or even the reverse: highlighting commits with noticeable performance improvements.

I agree, it's astonishing that a lot of these things aren't properly tested before they hit mainstream.

**piorunz** · 17 January 2021, 09:16 PM

Originally posted by geearf

I agree, it's astonishing that a lot of these things aren't properly tested before they hit mainstream.

5.10 is not yet LTS. If you want things very stable, use LTS (5.4). Once 5.10 settles and 5.11 is released, most of the bugs in 5.10 will be fixed and you can safely switch.
I wouldn't moan about quality and suggest slowing down innovation when you can shield yourself from bugs that easily.

**RichieFrame** · 17 January 2021, 09:56 PM

No other pressing regressions for this LTS kernel come to mind at this point.

That is not the case: there is a very, VERY bad regression in the Intel WiFi driver. Here is the open bug report:

209913 – Low upload speed

https://bugzilla.kernel.org/show_bug.cgi?id=209913

Here is an example of just how badly this bug affects access to an NFS share over WiFi:

Code:

richie@gram: ~ $ nfsiostat

192.168.0.3:/mnt/raid6/Video mounted on /mnt/raid6/Video:

ops/s   rpc bklog
2.613   0.000

read:         ops/s      kB/s      kB/op      retrans      avg RTT (ms)      avg exe (ms)
              0.275      88.816    323.445    0 (0.0%)     18.515            18.667
write:        ops/s      kB/s      kB/op      retrans      avg RTT (ms)      avg exe (ms)
              0.300      306.794   1023.436   0 (0.0%)     3214.208          271796.795

Look at the RTT and execution times: the write RTT is 173 times longer than the read RTT, and that is what the performance drop feels like. Something that should take 5 seconds takes 15 minutes.
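The 173x figure comes from dividing the write RTT by the read RTT in the nfsiostat output above. A quick Python check of that arithmetic, using only the numbers shown there (the variable names are illustrative):

```python
# Figures copied from the nfsiostat output above (avg RTT and avg exe, in ms).
read_rtt_ms, read_exe_ms = 18.515, 18.667
write_rtt_ms, write_exe_ms = 3214.208, 271796.795

# How many times slower a write is than a read, by each metric.
rtt_ratio = write_rtt_ms / read_rtt_ms
exe_ratio = write_exe_ms / read_exe_ms

print(f"write/read RTT ratio: {rtt_ratio:.1f}x")  # 173.6x
print(f"write/read exe ratio: {exe_ratio:.0f}x")
print(f"avg write exe time: {write_exe_ms / 1000 / 60:.1f} minutes")
```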

**geearf** · 17 January 2021, 10:04 PM

Originally posted by piorunz

5.10 is not yet LTS. If you want things very stable, use LTS (5.4). Once 5.10 settles and 5.11 is released, most of the bugs in 5.10 will be fixed and you can safely switch.
I wouldn't moan about quality and suggest slowing down innovation when you can shield yourself from bugs that easily.

Running local tests/CI before pushing a badly performing commit has nothing to do with slowing down innovation.
And based on what Michael wrote, it's not like this regression needed a very specific system to show up:

A simple test case like extracting a large .tar.zst file could go from taking just around 15 seconds to nearly five minutes, or in other cases from 5 seconds to over 30 seconds.

At any job I've had, I'd be in real trouble if I declared code acceptable and then people found that sort of result.
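A pre-merge performance check of the kind described here can be as simple as comparing median run times against a baseline. This Python sketch is only an assumption about how such a gate might look, not an existing kernel CI: `is_regression` and the 10% threshold are hypothetical, and the sample timings merely echo the "around 15 seconds to nearly five minutes" figures quoted above.

```python
import statistics
from typing import Sequence


def is_regression(baseline_s: Sequence[float], candidate_s: Sequence[float],
                  max_slowdown: float = 1.10) -> bool:
    """Return True if the candidate's median run time is more than
    max_slowdown times the baseline's median (1.10 = 10% slower)."""
    return statistics.median(candidate_s) > statistics.median(baseline_s) * max_slowdown


if __name__ == "__main__":
    # Hypothetical repeated timings (seconds) for extracting the same .tar.zst archive.
    baseline = [15.2, 14.8, 15.0]
    candidate = [290.0, 301.5, 296.3]
    verdict = "REGRESSION" if is_regression(baseline, candidate) else "ok"
    print(verdict)
```

Using the median of several runs keeps one noisy run from deciding the result; in a real CI job the script would exit non-zero on a regression so the pipeline fails.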

**mroche** · 17 January 2021, 10:31 PM

Originally posted by piorunz

I wouldn't moan about quality and suggest slowing down innovation when you can shield yourself from bugs that easily.

Having a properly structured, automated, functioning CI and testing infrastructure, or at least some kind of local testing procedure, doesn't slow down innovation, and neither of the preceding comments is calling for that. I'm pretty sure most end users would prefer that features, new and old, get tested during development before release, not after. Having basic checks and/or proper testing != slowing innovation. There are weeks of release candidates that can be checked, not to mention the subsystem trees, where anything wonky can be caught before patches are merged.

For something as critical as a kernel, there should be confidence in X.0 releases being ready for use. Security patches and small bug fixes, sure. But not serious performance regressions.

Cheers,
Mike

**geearf** · 17 January 2021, 11:40 PM

Originally posted by mroche

There are weeks of release candidates that can be checked, not to mention the subsystem trees, where anything wonky can be caught before patches are merged.

I strongly think testing at the RC stage is already way too late for performance (though of course it's better than nothing).
Why? Simply because one enhancement may cancel out a regression, when you could have done better (or maybe not, of course).
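To illustrate how an enhancement can hide a regression when only the end result is measured: per-commit relative changes multiply together, so a net change near zero can still contain one commit that is 20% slower. A minimal Python sketch with made-up deltas; `net_change` and `hidden_regressions` are hypothetical helpers, not part of any real tool.

```python
import math
from typing import Dict, List


def net_change(deltas: Dict[str, float]) -> float:
    """Combine per-commit relative run-time changes (+0.20 = 20% slower)
    into the overall change seen when only the final kernel is tested."""
    return math.prod(1.0 + d for d in deltas.values()) - 1.0


def hidden_regressions(deltas: Dict[str, float], tolerance: float = 0.05) -> List[str]:
    """Return the commits that individually slow things down by more than tolerance."""
    return [commit for commit, d in deltas.items() if d > tolerance]


if __name__ == "__main__":
    # Made-up example: one speedup, one regression, one small tweak.
    deltas = {"speedup": -0.18, "regression": 0.20, "tweak": -0.01}
    print(f"net change: {net_change(deltas):+.1%}")  # slightly faster overall
    print("regressing commits:", hidden_regressions(deltas))
```

Testing only the release candidate would see a small net speedup here; testing per commit would still flag the 20% regression.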

Originally posted by mroche

For something as critical as a kernel, there should be confidence in X.0 releases being ready for use. Security patches and small bug fixes, sure. But not serious performance regressions.

They could always say not to trust .0 releases and to start with .1 instead, and I think that'd be fine too. Of course, that would be meant more for weirder bugs that only happen in certain configurations.
That's what I've been doing lately: I now wait for .1 or .2 before moving on (unless there's a new feature I really want, but that's rare).

**mroche** · 18 January 2021, 12:39 AM

Originally posted by geearf

I strongly think testing at the RC stage is already way too late for performance (though of course it's better than nothing).
Why? Simply because one enhancement may cancel out a regression, when you could have done better (or maybe not, of course).

I absolutely agree. To me it's a worst-case suggestion, not an actual goal. Ideally the testing would be much more extensive and comprehensive than just RCs. I lost my post several times while writing it (I don't recommend writing posts on an aging phone, even with Phoronix caching), and I thought I still had the "At the very least ..." from my first draft.

Cheers,
Mike

**piorunz** · 18 January 2021, 01:21 AM

Guys, do you realize these regressions were invisible in testing environments and VMs and only showed up on bare-metal machines? It's been explained in one of the previous posts. Testing is being done, but this isn't critical software; it's not Bitcoin Core or anything.
Stable setups are meant to run on an LTS, battle-tested kernel. Everything else goes through the normal development process.
I am thankful we have the Linux kernel. Even if I spent 10 entire lifetimes, I wouldn't come close to designing and coding something like Btrfs. If it has bugs, I'll wait. It's worth it.

Linux 5.10.8 Kernel Released - Finally Fixes That Btrfs Performance Regression