Linux Changes Pipe Behavior After Breaking Problematic Android Apps On Recent Kernels

F.Ultra replied

01 August 2021, 07:26 PM
Originally posted by indepe View Post

If "waking readers" refers to callers of epoll_wait(), then thanks for the clarification, however looking at the documentation of epoll, I would (still) expect the previous and now reverted-to behavior. Or at least consider it more likely, and then confirm by testing it. Of course, I may be missing something.

This is the documentation I am looking at, dated "2021-03-22": https://man7.org/linux/man-pages/man7/epoll.7.html

Specifically this sentence:

I would expect "changes occur on the monitored file descriptor" to include any write, not just writes on an empty file or pipe.

The doc does mention reading-until-EAGAIN as a "suggested way" of use, but does not say that this is required for correct functioning. It even starts a sentence like this:

and it sounds like the only case when that doesn't happen is if the EPOLLONESHOT option is used. (Though after reading the latest commit comment, I get the impression that "chunk of data" might refer to something different than I would espect.)

So my perhaps naive interpretation is that the original behavior (now re-created) was the one that also better corresponds to the documentation, and that changing it should require a change of documentation. Also, I don't see why it would be obviously non-sensical, given the variety of possible use cases.

The text that you quoted from the epoll man just states that you are not guaranteed to get just a single event even if edge-triggered under special circumstances. I.e lets say that you receive 2k of data but only do one read of 1k, you now have 1k left but instead of calling read() again until EAGAIN you call epoll_wait, since you are edge triggered you will now not receive a new event for the remaining 1k of data and your application might be stuck waiting for an event that will never happen (this is the bug in the Android library). However if you receive 1k more data then epoll_wait will now return again, and this is what the text in the man page describes a bit badly.

edit: Ok so I simplified the text above a bit too much, when you receive 1K more data you will normally not get a new edge-trigger from epoll. If this new 1K however means that the kernel read buffer for your descriptor is exhausted, then this will lead to a second edge-trigger.

edit2: This can also happen if you use the same epoll fd in more than one thread. You get 1K of data, so epoll_wait wakes up thread A who does a read (10240), this exhausts the kernel read buffer since you read all of the 1K that was coming in, but now you get 1K more data so epoll_wait wakes up thread B, but now thread A calls read again to see if there where more data originally or if it will receive EAGAIN while thread B also does a read to get the data from what was triggered, now it's random which one of the two that will get the new 1K and which one will get the EAGAIN. And this is the reason behind the EPOLLONESHOT flag since that tells epoil to only wake thread A, after tread A have read EAGAIN it rearms the fd and first after this epoll can wake up A or B when new data arrives.

Last edited by F.Ultra; 01 August 2021, 07:44 PM.
Leave a comment:
indepe replied

01 August 2021, 04:12 PM
Originally posted by F.Ultra View Post

Well if the pipe isn't empty then there are no blocked readers. I.e read() on such a pipe would not block, but the realm library in particular didn't read from the pipe until EAGAIN when it got edge triggered by EPOLL that there where data and then expected to get a new edge trigger again when some one did a write to the pipe even though it wasn't cleared in between.

If "waking readers" refers to callers of epoll_wait(), then thanks for the clarification, however looking at the documentation of epoll, I would (still) expect the previous and now reverted-to behavior. Or at least consider it more likely, and then confirm by testing it. Of course, I may be missing something.

This is the documentation I am looking at, dated "2021-03-22": https://man7.org/linux/man-pages/man7/epoll.7.html

Specifically this sentence:

The reason for this is that edge-triggered mode delivers events only when changes occur on the monitored file descriptor.

I would expect "changes occur on the monitored file descriptor" to include any write, not just writes on an empty file or pipe.

The doc does mention reading-until-EAGAIN as a "suggested way" of use, but does not say that this is required for correct functioning. It even starts a sentence like this:

Since even with edge-triggered epoll, multiple events can be generated upon receipt of multiple chunks of data,

and it sounds like the only case when that doesn't happen is if the EPOLLONESHOT option is used. (Though after reading the latest commit comment, I get the impression that "chunk of data" might refer to something different than I would espect.)

So my perhaps naive interpretation is that the original behavior (now re-created) was the one that also better corresponds to the documentation, and that changing it should require a change of documentation. Also, I don't see why it would be obviously non-sensical, given the variety of possible use cases.

Last edited by indepe; 01 August 2021, 04:14 PM.
Leave a comment:
F.Ultra replied

01 August 2021, 09:46 AM
Originally posted by indepe View Post

As far as I understand, only a small part of the original improvement was reverted (a probably minor optimization that avoids checking for readers).

Without knowing the API or its use, it seems the strange thing is that the API allows blocked readers even when the pipe isn't empty anymore. However as long as the API allows that situation, then what else should wake those readers? So I guess the correct behavior is to wake blocked readers on any write. And I guess an application which doesn't want that behavior would not have any blocked readers waiting for a non-empty pipe, in the first place.

Again, I am saying this without knowing the API or its use, just from the description here and in the links. However if I understood it correctly, this partial revert shouldn't be a problem.

Well if the pipe isn't empty then there are no blocked readers. I.e read() on such a pipe would not block, but the realm library in particular didn't read from the pipe until EAGAIN when it got edge triggered by EPOLL that there where data and then expected to get a new edge trigger again when some one did a write to the pipe even though it wasn't cleared in between.
Leave a comment:
andreano replied

01 August 2021, 02:43 AM
Originally posted by ed31337 View Post

Microsoft would constantly change things (…) drop, deprecate, etc

Anecdotally, they are also trying ridiculously hard not to break userspace, to the point where it hurts their API:

How Microsoft Lost the API War

https://www.joelonsoftware.com/2004/06/13/how-microsoft-lost-the-api-war/

Here’s a theory you hear a lot these days: “Microsoft is finished. As soon as Linux makes some inroads on the desktop and web applications replace desktop applications, the mighty empir…

That's an old post though, so I may be outdated on that.
Likes 3
Leave a comment:
ed31337 replied

01 August 2021, 01:09 AM
This is fantastic. One of the big problems with developing on Windows was that Microsoft would constantly change things in subtle ways and mysteriously break stuff people downstream are using. Then there were all the things that were overtly dropped, deprecated, etc (a problem Google commits so often too).

Linux is so much better to work with because code written today will continue working for a long, long time thanks to Linus's excellent stewardship.
Likes 4
Leave a comment:
curfew replied

31 July 2021, 11:40 PM
To my understanding Android phones are usually more or less locked into a specific kernel version (or two) even between major Android upgrades? So this looming kernel upgrade was never going to break any (apps on) existing phones anyway? Given that Linux is on its way out soon anyway, Google is ramping up with its Fuchsia OS, why would you bend over backwards to avoid upsetting some crappy software vendors on a platform that's barely yours anyway?

What upsets me is that it reeks of submission to commercialism. What happens when KDE or Gnome devs come and say that we did something stupid, can you lock your kernel to this ridiculous setting until the end of time because we are busy doing more fun things? Now something that was previously undocumented is suddenly a documented, officially-sanctioned feature, and they will have to honour that commitment forever.
Likes 2
Leave a comment:
curfew replied

31 July 2021, 11:33 PM
Originally posted by sophisticles View Post

There was certain functionality built into the API and libraries where coded to use it.

How is this the applications' or libraries' fault?

The documentation (supposedly) clearly stated that a signal has purpose X, but a bunch of developers used it for another purpose Y, because by consequence it seemed to work how they wanted it to work. But then the code changed, still remaining fully compatible with the intended/documented/allowed use cases, but it became useless for the purpose Y.

Nowhere was there ever any "permission" to do Y with this signal so there is also no permission to whine that is has been broken. Usually when you do stuff that goes against the intention of the said feature, then it's your problem if the feature changes in a way that doesn't allow for your crazy monkey business anymore.

Last edited by curfew; 31 July 2021, 11:41 PM.
Likes 2
Leave a comment:
bachchain replied

31 July 2021, 11:26 PM
Originally posted by sophisticles View Post

How is this the applications' or libraries' fault?

Because it was undefined behavior, and relying on undefined behavior is wrong.
Likes 3
Leave a comment:
sophisticles replied

31 July 2021, 09:21 PM
Can someone explain to me what this means:

some Android libraries abused the functionality

There was certain functionality built into the API and libraries where coded to use it.

How is this the applications' or libraries' fault?
Likes 1
Leave a comment:
Charlie68 replied

31 July 2021, 06:17 PM
Originally posted by xeekei View Post

Wait, so technically anyone who has found a bug that needs to be adressed, can write some dumbass code that uses it, and then effectively hold the entire Linux kernel hostage?
Surprising that it hasn't happened before.

This is just an extreme concept, the software is much more complicated and some changes can lead to breakages, if these breakages concern a large number of devices and systems, if it is not a security problem it is reasonable to give time to modify this behavior. Time has actually been there, but being Android (in this case) using an old Lts kernel, only now they have noticed.
Linus simply made a reasonable decision, as he does not compromise the security of any devices and gives time to troubleshoot.
Likes 2
Leave a comment:

Announcement

Linux Changes Pipe Behavior After Breaking Problematic Android Apps On Recent Kernels

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment: