Linux Pipe Code Again Sees Patch To Restore Buggy/Improper User-Space Behavior

  • #11
    Originally posted by Volta View Post

    A huge user base relying on stupidly broken Android libraries.
    Yes, and your point is?
    Are those libraries broken in a way that has security implications? Not that I've heard of. So why is it reasonable that they should suddenly break when running against a newer kernel?
    Besides, "stupidly broken" is a subjective claim. They relied on some subtle behavior of the library that was not part of the documented behavior. Sure, you can always claim that "it's their fault" and that they are responsible for updating their libraries to work on a newer kernel. However, that is not a pragmatic approach, and thankfully Linus also realizes this. Besides, what happens if you have applications using the library that also rely on the old behavior in some subtle way? Then you are in a "lock step" situation where you have to maintain two versions of your application: one that works against the older kernel and one that works against the newer kernel.

    Comment


    • #12
      Originally posted by tomas View Post

      Yes, and your point is?
      Are those libraries broken in a way that has security implications? Not that I've heard of. So why is it reasonable that they should suddenly break when running against a newer kernel?
      There wasn't any specific point, except that the huge Android user base has less meaning in this case, because Android isn't the most prestigious market for Linux. Everyone will be able to move to the fixed syscall version 2. Nobody needs to break anything, so you're right of course.

      Comment


      • #13
        Originally posted by tomas View Post
        The code base yes. The syscall ABI no, it's a contract that should not be broken. Applications that used to work should continue to work, unless they rely on some security flaw of a syscall.

        Except that those "stupid Android libraries" constitute a HUGE user base. Much bigger than for example desktop Linux.

        Yes, of course. By using a new version of said syscall, transitioning at their own pace, at a point in time of their choice. Perhaps some of these "stupid Android libraries" will never change, and that is fine too. It's not like the old behavior was problematic from a security perspective or severe in some other way. Its behavior was non-optimal but that is hardly a reason for breaking backwards-compatibility.

        Besides, do you know of any other operating system that does not take the same approach as Linux? From what I know both the BSDs and other proprietary operating systems take a similar stance when it comes to not breaking user space.
        The problem with the Android example is that the phones affected will likely never actually update the kernel to receive this update that breaks their userspace. Android very, very rarely updates the major kernel version... with some devices still on (heavily modified) 3.x. Why hold back a fix because a platform that will never get updated has a regression? Because they don't want to update their userspace before shipping the next product? They'll likely have to do more work to fix their code when GCC/LLVM updates.

        Also, IMHO, Android should be the most important reason to fix this. Unoptimized kernel behavior on a battery powered platform. I shouldn't have to go into that in any detail whatsoever.

        I don't get the problem. If you're using an unintended, undocumented behavior and it gets fixed, then you should update your stuff accordingly. If the kernel was programmed wrong to have an unintended, undocumented behavior, that gets fixed, and that breaks something in userspace, then userspace should be updated accordingly. How is that any different than a GCC fix that requires cat to have some fixes to compile? How is that any different than a bug fix in Qt5 that makes downstream users have to update their code?

        This is what happens when protocol is followed rigidly and inflexibly. If you can't fix something that was done wrong to begin with, you should reevaluate your protocols.

        IMHO, I wish this was a flag on boot or something to be set at compile time. Let those with 15 year old docker images use "pipe=legacy" if they really need the old way.

        Comment


        • #14
          Originally posted by skeevy420 View Post
          Why hold back a fix...
          Nothing is being held back. A new version of the syscall with a more optimal behavior is being made available. New versions of libraries that wish to use the new more optimal syscall with slightly changed behavior will be released. All is well.

          If you can't fix something that was done wrong..
          This is the problem right here, definition of "fixed" and "wrong". This is not about fixing a bug in the syscall implementation that would cause a kernel panic or lead to a security issue.
          This is about changing the behavior of an existing syscall in such a way that existing users of said syscall break. It does not matter if the behavior was undocumented or "whose fault it is". If you can optimize a syscall implementation without changing ANY externally visible behavior, then that is fine, and I'm sure it has been done on many occasions in the kernel. But you do not break existing users of a syscall unless you really, really have to, for example due to security reasons. Linus understands this, as do the developers of other operating systems. I ask again, can anyone point to other operating systems which have a different policy when it comes to keeping backwards compatibility for syscalls?

          EDIT:

          "How is that any different than a GCC fix that requires cat to have some fixes to compile?"
          That is vastly different, it's like night and day.
          One is about changing an ABI, causing existing binaries (libraries and/or programs) to break. The other is about requiring source code changes to use a new compiler. The difference in impact between the two examples is that the first one potentially affects end users, while the second only affects developers, or users who compile from source with a compiler that is not yet supported for building the library or program.
          Last edited by tomas; 26 August 2021, 10:18 AM.

          Comment


          • #15
            Originally posted by milkylainen View Post
            Don't agree. There have to be borders for every stupid call. You don't go throw yourself off a cliff because "everybody else does it".
            This is a living codebase. If literally NOBODY beside stupid Android libraries cared for several years, then it's not an issue.
            And it's not like they can't fix the behavior into something sane?

            This is just Linus being compelled to following his own rules because he told people off so many times,
            he can't go breaking the rules himself.
            Not really in this particular case though. See, the change to how pipes worked back in v5.5, while not technically breaking the promised EPOLL behaviour, did make pipes a special case that suddenly worked completely differently from everything else that you can monitor with EPOLL. Linus's change in v5.14 makes pipes behave like everything else again, which not only un-breaks any userspace code that depended on the old behaviour, but also makes EPOLL behave consistently across different types of descriptors.
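            To make the distinction concrete, here is a minimal sketch (my own example, not from the thread; the variable names and the 100 ms timeouts are illustrative) of watching a pipe's read end with edge-triggered epoll. The second wait is exactly where the kernels under discussion disagree, so the sketch deliberately prints that result instead of asserting it:

            ```c
            #include <assert.h>
            #include <stdio.h>
            #include <sys/epoll.h>
            #include <unistd.h>

            int main(void) {
                int fds[2];                       /* fds[0] = read end, fds[1] = write end */
                assert(pipe(fds) == 0);

                int ep = epoll_create1(0);
                assert(ep >= 0);

                /* Watch the read end in edge-triggered mode. */
                struct epoll_event ev = { .events = EPOLLIN | EPOLLET, .data.fd = fds[0] };
                assert(epoll_ctl(ep, EPOLL_CTL_ADD, fds[0], &ev) == 0);

                /* First write: buffer goes empty -> non-empty; every kernel reports an edge. */
                assert(write(fds[1], "a", 1) == 1);
                struct epoll_event out;
                printf("after first write: %d event(s)\n", epoll_wait(ep, &out, 1, 100));

                /* Second write WITHOUT draining the pipe: whether this counts as a new
                 * edge is the behaviour that changed in v5.5 and changed back in v5.14. */
                assert(write(fds[1], "b", 1) == 1);
                printf("after second write: %d event(s)\n", epoll_wait(ep, &out, 1, 100));
                return 0;
            }
            ```

            On a kernel with the v5.5 behaviour, the second wait times out (0 events) because the pipe buffer never went empty; before v5.5, and again after v5.14, a write wakes the watcher like it does for sockets and other descriptors.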

            Comment


            • #16
              Originally posted by skeevy420 View Post
              I don't get the problem. If you're using an unintended, undocumented behavior and it gets fixed then you should update your stuff accordingly. If the kernel was programmed wrong to have an unintended, undocumented behavior, that gets fixed, and that breaks something userspace then userspace should be updated accordingly. How is that any different than a GCC fix that requires cat to have some fixes to compile? How is that any different than a bug fix in Qt5 that makes downstream users have to update their code?
              This philosophy makes it impossible for software to ever be finished. If there is continuous churn in underlying libraries and interfaces that requires ongoing maintenance, the only way to have an ever-growing library of useful software is to have an ever-growing army of maintainers.

              Comment


              • #17
                Originally posted by bachchain View Post
                I can just imagine a few years from now someone trying to do a rewrite of epoll that makes it 10x faster, but being rejected because it breaks this one buggy version of this one library that nobody uses anymore
                And that's still the right way to deal with it. As others have pointed out, we already have -Ex or -2 etc versions of a TON of syscalls specifically for breaking changes.
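                The versioned-syscall pattern is easy to see in existing pairs like dup2/dup3, renameat/renameat2, or pipe/pipe2. A minimal sketch (my own example, not from the thread) of how the pattern looks from userspace: the old entry point keeps its frozen contract, and callers who want the new semantics opt in via the new one:

                ```c
                #define _GNU_SOURCE          /* for pipe2() on glibc */
                #include <assert.h>
                #include <fcntl.h>
                #include <stdio.h>
                #include <unistd.h>

                int main(void) {
                    int oldway[2], newway[2];

                    /* Original syscall: behavior unchanged since it was introduced. */
                    assert(pipe(oldway) == 0);

                    /* Newer variant: same job plus a flags argument for new semantics,
                     * here creating the descriptors non-blocking and close-on-exec atomically. */
                    assert(pipe2(newway, O_NONBLOCK | O_CLOEXEC) == 0);

                    printf("old pipe nonblock: %d, new pipe nonblock: %d\n",
                           (fcntl(oldway[0], F_GETFL) & O_NONBLOCK) != 0,
                           (fcntl(newway[0], F_GETFL) & O_NONBLOCK) != 0);
                    return 0;
                }
                ```

                Old binaries keep calling pipe() and see no change; new code gets the improved behavior without anyone's ABI breaking.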

                This naive mentality of "but everybody should just fix all the code ever written" isn't just naive: it's a shitty, narcissistic way of operating that makes *everybody else* responsible for *your* mistakes.

                To take a simple example that might be more familiar to you, consider Gnome Shell. Every release, a bunch of extensions break. (IDK how large the number is each time, but it's enough that people here have been complaining about it constantly for years). The users aren't at fault, but they suffer for it, because now something they use doesn't work any more. The extension developers aren't at fault either, but they suffer even more as they have to deal with both the breakage and the users complaining. Now imagine that instead of some toy widget on a desktop, the impact is thousands of programs running on millions of servers.

                Alternatively, try thinking of the cumulative impact of having to constantly keep doing those repairs. Instead of working on the "real" parts of your project, you're always wasting time dealing with the fallout of *someone else's* bugs, design failures, and other mistakes. Over time, you either get tired of cleaning up their mess and walk away by choice, or the productivity loss means you're forced to.

                If your "One True Kernel" ever happened, it would make Hurd look popular by comparison - which is probably not the end result you were hoping for.

                Comment


                • #18
                  Originally posted by tomas View Post

                  Nothing is being held back. A new version of the syscall with a more optimal behavior is being made available. New versions of libraries that wish to use the new more optimal syscall with slightly changed behavior will be released. All is well.
                  Kind of mixed up my day one rant and now


                  This is the problem right here, definition of "fixed" and "wrong". This is not about fixing a bug in the syscall implementation that would cause a kernel panic or lead to a security issue.
                  This is about changing the behavior of an existing syscall in such a way that existing users of said syscall break. It does not matter if the behavior was undocumented or "whose fault it is". If you can optimize a syscall implementation without changing ANY externally visible behavior, then that is fine, and I'm sure it has been done on many occasions in the kernel. But you do not break existing users of a syscall unless you really, really have to, for example due to security reasons. Linus understands this, as do the developers of other operating systems. I ask again, can anyone point to other operating systems which have a different policy when it comes to keeping backwards compatibility for syscalls?
                  That's what makes this such a grey area. Working but incorrect. Normally we'd do out with the old, in with the new (just humans in general), but protocol makes it so it's keep the old, in with the new. I don't think it should necessarily be that way....maybe keep it until 6.0 and drop it.

                  EDIT:


                  That is vastly different, it's like night and day.
                  One is about changing an ABI, causing existing binaries (libraries and/or programs) to break. The other is about requiring source code changes to use a new compiler. The difference in impact between the two examples is that the first one potentially affects end users, while the second only affects developers, or users who compile from source with a compiler that is not yet supported for building the library or program.
                  I'm aware of the differences, just seems a bit odd to me that all the rest of the Linux software stack is allowed to "break" each other while the kernel isn't allowed to even if it is to fix bad code and increase performance. I still think a compromise of a grace period, like when 6.0 comes out, would be a good way for Linux to "break" userspace. If software can't have major changes between major versions then y'all need to come up with a new way of doing things.

                  It just seems dumb that we can drop IDE and floppy support entirely but we can't fix pipe. The worst part is the broken userspace code has been fixed....so we're debating about leaving in a bad implementation in the kernel even though the userspace side that the kernel fix broke has been fixed.

                  Originally posted by yump View Post

                  This philosophy makes it impossible for software to ever be finished. If there is continuous churn in underlying libraries and interfaces that requires ongoing maintenance, the only way to have an ever-growing library of useful software is to have an ever-growing army of maintainers.
                  Which is exactly how it works now.

                  GTK/Qt updated and all the GUI programs adapt.
                  GCC/LLVM updated and all the programs adapt.
                  PulseAudio starts to be favored over ALSA and all the programs adapt.
                  Python updated and all the scripts adapt.
                  A physics library updates and all the game engines adapt.
                  X updates and Y related things adapt.

                  There's a reason that IBM Hat can push what they want. They pay the ever-growing army of maintainers.
                  There's a reason that Ubuntu stands out from the crowd. They pay an ever-growing army of maintainers.
                  There's a reason that Arch and Gentoo are so popular with the community. Because we're expected to be the ever-growing army of maintainers.

                  Comment


                  • #19
                    Originally posted by skeevy420 View Post

                    Kind of mixed up my day one rant and now




                    That's what makes this such a grey area. Working but incorrect. Normally we'd do out with the old, in with the new (just humans in general), but protocol makes it so it's keep the old, in with the new. I don't think it should necessarily be that way....maybe keep it until 6.0 and drop it.



                    I'm aware of the differences, just seems a bit odd to me that all the rest of the Linux software stack is allowed to "break" each other while the kernel isn't allowed to even if it is to fix bad code and increase performance. I still think a compromise of a grace period, like when 6.0 comes out, would be a good way for Linux to "break" userspace. If software can't have major changes between major versions then y'all need to come up with a new way of doing things.

                    It just seems dumb that we can drop IDE and floppy support entirely but we can't fix pipe. The worst part is the broken userspace code has been fixed....so we're debating about leaving in a bad implementation in the kernel even though the userspace side that the kernel fix broke has been fixed.



                    Which is exactly how it works now.

                    GTK/Qt updated and all the GUI programs adapt.
                    GCC/LLVM updated and all the programs adapt.
                    PulseAudio starts to be favored over ALSA and all the programs adapt.
                    Python updated and all the scripts adapt.
                    A physics library updates and all the game engines adapt.
                    X updates and Y related things adapt.

                    There's a reason that IBM Hat can push what they want. They pay the ever-growing army of maintainers.
                    There's a reason that Ubuntu stands out from the crowd. They pay an ever-growing army of maintainers.
                    There's a reason that Arch and Gentoo are so popular with the community. Because we're expected to be the ever-growing army of maintainers.
                    GTK/Qt also don't break backward compatibility unless necessary; they only break it on a major version update, and every major update takes a long time.
                    For example, Qt 6 has been released for a long time, but because it changes so many APIs and some functionality is missing, a lot of software is still using Qt 5.

                    Changes to the GCC/LLVM command line interface are so minor between major updates that most programmers won't notice, and for those who need to cope with them, there are build systems like CMake which enable using flags only when the compiler supports them.

                    BTW, CMake is also versioned.
                    So in order to enable LTO in CMake, you first need to assert a minimum CMake version.

                    Updates to Python also only add new APIs; the only breaking change was upgrading from Python 2 to Python 3, and it was a pain.
                    Even now, there is a lot of software out there still supporting Python 2, and some that only supports Python 2.

                    Even big companies like Red Hat and Ubuntu need to maintain backward compatibility.

                    Red Hat used to offer 10 years of free support for CentOS. Why? Because they didn't want to break anything.
                    This year, they broke that convention and suddenly announced EOL for CentOS 8 in favor of their new product line, CentOS Stream.
                    Guess what? People set up their own Linux distributions, like Rocky Linux.

                    Ubuntu and Debian are the same.
                    Debian guarantees at least 5 years of support.
                    Ubuntu LTS is the same (they usually release after Debian), and they also provide an extended 10-year support service.

                    You think industry likes frequently updating to the newest changes that break their implementations, so they can spend money rewriting them?
                    Hell no.

                    There's a reason why rolling distributions like Arch or Gentoo are never used in server environments.

                    Almost every Docker (container) image uses Debian/Ubuntu/CentOS/Alpine (all of which also do point releases).

                    Nobody wants their software to suddenly break because of an unanticipated update.

                    AND, BTW, GENTOO IS CALLING FOR MORE MAINTAINERS.

                    Comment


                    • #20
                      NobodyXu

                      You're unintentionally confirming what I said with your argument -- both that the kernel could use a way to break itself during an anticipated update and that if you don't pay for maintainers the community is expected to step up and become maintainers.

                      You basically listed example after example of why anticipated updates with breaks/changes are necessary. You even gave an example of a build system that copes with all the versions and changes over the years and decades... and gave a better name than "grace period". There is no reason the kernel can't do the same with a major version change. Turn the last 5.x into ELTS and let 6.0 drop any fixes, duplicated efforts, etc. because of things like this (not like I have a running tally here).

                      While potentially a pain, I can't think of another way for the kernel to truly advance itself...maybe only doing that once every 10 or 20 years to reduce the burden.

                      Comment
