Linux Changes Pipe Behavior After Breaking Problematic Android Apps On Recent Kernels


  • #41
    Originally posted by F.Ultra View Post
    Actually, I would have preferred it if stdin and sockets had been changed to behave like pipes did before this change; that should lessen the internal strain on epoll if it is only woken on empty buffers.
    I understand the thought; however, consider this: if it were that way, applications would generally read until the buffer is empty before calling epoll_wait again. If they do that anyway, there is no additional strain, so nothing is lost.

    However, if epoll_wait can also be used to check for a new write even when the buffer isn't empty yet, additional behavior becomes possible. For example, an application might, as an optimization, read the whole buffer only once per display frame (so at 60 Hz, once every 16.6 ms). It might still want to do some other processing whenever there is an update, more often than once per 16.6 ms: perhaps to keep a timestamp of the latest update, to be able to say "there have been at least 5 updates", or to more quickly send back a receipt for received updates.

    Or an application usually needs to read the data only if a certain time has passed since the last read, because otherwise the data won't have changed enough. Nevertheless, it needs to know that new data exists even if it arrived earlier. Say that after some data is read, the next write comes 2 ms later, but the data doesn't change enough within 2 ms to justify reading it that often, so the application reads only if writes are at least 100 ms apart. However, if there is no additional write for a whole second, it will want to read the pending data anyway, because it might contain some special request that needs to be processed within 1 second, or simply because it doesn't want to postpone processing that long.

    Or data usually arrives every second (perhaps because it contains a time message every second), yet needs to be processed (read) only once a minute. However, if no new data arrives for 3 seconds, the application needs to restart the connection, or check whether the data contains an error message.



    • #42
      Originally posted by indepe View Post

      I understand the thought; however, consider this: if it were that way, applications would generally read until the buffer is empty before calling epoll_wait again. [...]
      Well, for those cases there are always level-triggered events from epoll (which are the default). For edge-triggered you really should read until EAGAIN, since there is no guarantee that you will ever get another write (or at least not for a very long time). To be honest I don't know the inner details of the kernel here, but my take on the strain is that waking the client on every single write takes some resources inside the kernel. So for edge-triggered, where you read until EAGAIN anyway, there should still be some potential performance gain inside the kernel from not waking epoll; or there isn't, but the people who changed the pipe code back in 5.5 thought there was.



      • #43
        Originally posted by F.Ultra View Post
        Well, for those cases there are always level-triggered events from epoll (which are the default).
        According to the documentation, level-triggered goes in the opposite direction: it doesn't tell you when there is a new write while the buffer is non-empty, since in that case it returns right away anyway.

        Originally posted by F.Ultra View Post
        For edge-triggered you really should read until EAGAIN, since there is no guarantee that you will ever get another write (or at least not for a very long time).
        The newest behavior is somewhat different from the oldest behavior; see Linus' commit comment:

        [ I say "approximate", because the exact old behavior was to do a wakeup not for each write(), but for each pipe buffer chunk that was filled in. The behavior introduced by this change is not that - this is just "every write will cause a wakeup, whether necessary or not", which seems to be sufficient for the broken library use. ]
        This appears to give that guarantee. And I hope not only for pipes, since in my honest opinion, that is what the man page suggests in the first place. And to me, that makes perfect sense. It can be taken advantage of, so to speak, in a very good way.

        Originally posted by F.Ultra View Post
        To be honest I don't know the inner details of the kernel here, but my take on the strain is that waking the client on every single write takes some resources inside the kernel. So for edge-triggered, where you read until EAGAIN anyway, there should still be some potential performance gain inside the kernel from not waking epoll; or there isn't, but the people who changed the pipe code back in 5.5 thought there was.
        If you always read until EAGAIN before calling epoll_wait, which is apparently what you want to do, then epoll_wait never has to do any additional wake up. The "potential performance gain" was that in a situation where the buffer isn't empty, epoll_wait wouldn't have to check if there are any waiters. But if the application does what you want it to do, then epoll_wait isn't even running in that situation. Which means it doesn't have to check for waiters either. Unless I am missing something: there may be more complications than I can see at the moment.

        EDIT: In the second part above, epoll_wait would be used with a timeout, and the thread would already know that there is (still) something to be read. Or the read would happen on a different thread. (As in the examples that I gave in the previous post.)
        Last edited by indepe; 04 August 2021, 09:49 PM.



        • #44
          Originally posted by indepe View Post

          [...] If you always read until EAGAIN before calling epoll_wait, which is apparently what you want to do, then epoll_wait never has to do any additional wake up. [...]
          Yes, epoll_wait does not have to do anything more, but I was thinking more about the code in the kernel that handles "data comes in from socket/pipe/whatever", since it has to call some code internally to notify, among other things, epoll. Now this might just be it setting a flag, and if so there are no real gains to be made, but it could also be some more complicated code being called (I just don't know, since I have not checked what the kernel does here).

          Also, you can use the edge-triggered case to signal which fds have data, read only some of it, and then call epoll_wait with a zero timeout if you have outstanding reads. That way you give epoll the chance to hand you more fds while still being able to quickly read more data from your list of fds with outstanding data. There is the ONESHOT flag for this, but then you have to waste a syscall after each EAGAIN, which of course could be a drop in the bucket since you have already called both read and epoll_wait; but then again, 3 syscalls are more than 2.

          You are correct that level-triggered would return as long as there is data to be read, and not only on each write. It's just that I have a hard time understanding the real need to be notified every time a write is done while at the same time not being interested in all the data that write wrote: you get notified that the writer wrote some more, only to then read just what the writer wrote last time (or parts of it, or it plus parts of the new write).



          • #45
            Originally posted by F.Ultra View Post
            Yes, epoll_wait does not have to do anything more, but I was thinking more about the code in the kernel that handles "data comes in from socket/pipe/whatever", since it has to call some code internally to notify, among other things, epoll. Now this might just be it setting a flag, and if so there are no real gains to be made, but it could also be some more complicated code being called (I just don't know, since I have not checked what the kernel does here).
            Right, I was just thinking about that. epoll_create will trigger some code that gets executed even in the absence of epoll_wait. But as you say, I would expect that to be a minor optimization since checking for a waiter could potentially be as easy as checking a flag. It's difficult to tell from the small diff in the commit, since one can't see the part of the code that it has an effect on.

            Originally posted by F.Ultra View Post
            Also, you can use the edge-triggered case to signal which fds have data, read only some of it, and then call epoll_wait with a zero timeout if you have outstanding reads. That way you give epoll the chance to hand you more fds while still being able to quickly read more data from your list of fds with outstanding data. There is the ONESHOT flag for this, but then you have to waste a syscall after each EAGAIN, which of course could be a drop in the bucket since you have already called both read and epoll_wait; but then again, 3 syscalls are more than 2.
            That's where it gets complex...

            Originally posted by F.Ultra View Post
            You are correct that level-triggered would return as long as there is data to be read, and not only on each write. It's just that I have a hard time understanding the real need to be notified every time a write is done while at the same time not being interested in all the data that write wrote: you get notified that the writer wrote some more, only to then read just what the writer wrote last time (or parts of it, or it plus parts of the new write).
            If you don't know such situations, it may be a bit difficult to imagine. I know such situations (generally speaking) where I send receipts immediately in order to keep the handshaking up-to-date, yet postpone processing the data until certain other conditions are met, and then batch-process it in one go. It probably just means that the fact that data has arrived, and the data itself, are considered two separate pieces of information that are handled separately.



            • #46
              Originally posted by F.Ultra View Post
              You are correct in that level trigger would return as long as there is data to be read and not only on each write, [...]
              Additional response:

              However, if edge-triggered epoll_wait returned only for a write into an empty buffer, how would edge-triggered be different from level-triggered?
              Is it just me, or did the previous version conflate the two?

              EDIT: Perhaps some use edge-triggered to restart epoll_wait immediately while another thread is reading the data, such that epoll_wait will return only once that other thread has finished reading, and then more data comes in.

              EDIT 2: But I guess that would have worked as intended only with the intermediate version.
              Last edited by indepe; 04 August 2021, 11:38 PM.



              • #47
                By the way, why doesn't the jobserver use a shared counting semaphore?



                • #48
                  Originally posted by indepe View Post

                  However, if edge-triggered epoll_wait returned only for a write into an empty buffer, how would edge-triggered be different from level-triggered? [...]
                  Level-triggered will return from epoll_wait as long as there is data to read: as long as the kernel buffer has data, the level is set. So the difference is that if you don't empty the buffer until EAGAIN, edge-triggered would not provide further events while level-triggered would return immediately. This is of course more prominent for EPOLLOUT events, where the level is almost always set (a write would not block) while an edge is a rare event (going from "write would block" to "would not block").

                  Also, accept() has a thundering-herd problem in multithreaded code with level-triggered events. And since you can only have one instance per fd in the epoll, you might have to add it edge-triggered for reads even if you only really needed the edge-trigger mechanism for, say, writes.



                  • #49
                    Originally posted by indepe View Post
                    By the way, why doesn't the jobserver use a shared counting semaphore?
                    I guess it's because there are multiple processes involved, and it was written to run on systems without named semaphores?



                    • #50
                      Originally posted by indepe View Post
                      If you don't know such situations, it may be a bit difficult to imagine. I know such situations (generally speaking) where I send receipts immediately in order to keep the handshaking up-to-date, yet postpone processing the data until certain other conditions are met, and then batch-process it in one go. It probably just means that the fact that data has arrived, and the data itself, are considered two separate pieces of information that are handled separately.
                      Now it sounds like you control both the reader and the writer here, but wouldn't such a setup make your application end up in a situation where the processing never happens, because some of the write events get merged into a single one and you hang on epoll_wait? Or do you process the data on a timeout as well and don't have much of a latency requirement?

