Samsung Dealing With Wayland "Zombie Apocalypse" Bug

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • #11
    Why change the protocol? Just have all 20 different compositors implement different solutions for this, just like everything else they do differently.



    • #12
      Mir Master Race.



      • #13
        I still don't see why the problem is supposed to be in the protocol. I've read the spec (both the docs on the website and protocol/wayland.xml in the repository), and none of it mentions zombie objects at all (I've just re-checked), and nothing says that deletion should be handled in a specific way. (Actually, while reading, I assumed objects would just live on in the client, inaccessibly, until the delete_id event comes in, which is pretty similar to what this guy at Samsung has now implemented.)
        All I see is someone making a mistake in the reference implementation, which is being fixed right now.
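A minimal sketch of the client-side handling described in this post, written in Python purely for illustration (it is not libwayland's code; `Proxy`, `ObjectMap` and `ZOMBIE` are hypothetical names): a destroyed object's ID stays reserved as a zombie, late events addressed to it are dropped, and the ID only becomes reusable once `delete_id` arrives. It deliberately ignores FDs carried by events.

```python
from dataclasses import dataclass, field


@dataclass
class Proxy:
    """A live client-side object (hypothetical stand-in for a wl_proxy)."""
    object_id: int
    interface: str
    events: list = field(default_factory=list)


class ObjectMap:
    """Hypothetical client object map that keeps destroyed IDs as zombies
    until the server confirms the deletion with delete_id."""

    ZOMBIE = object()  # sentinel: destroyed locally, not yet confirmed

    def __init__(self):
        self.entries = {}

    def create(self, interface):
        # Assumption: pick the smallest ID from 2 up that is neither live
        # nor a zombie (ID 1 is the wl_display).
        object_id = 2
        while object_id in self.entries:
            object_id += 1
        proxy = Proxy(object_id, interface)
        self.entries[object_id] = proxy
        return proxy

    def destroy(self, proxy):
        # A destructor request was sent, but events for this object may
        # already be in flight, so the ID stays reserved.
        self.entries[proxy.object_id] = self.ZOMBIE

    def dispatch(self, object_id, event):
        target = self.entries.get(object_id)
        if target is self.ZOMBIE:
            return "dropped"  # late event for an object the client destroyed
        if target is None:
            raise KeyError(f"event for unknown object {object_id}")
        target.events.append(event)
        return "delivered"

    def delete_id(self, object_id):
        # The server promises nothing more will arrive for this ID.
        if self.entries.get(object_id) is self.ZOMBIE:
            del self.entries[object_id]


objects = ObjectMap()
surface = objects.create("wl_surface")
objects.destroy(surface)
print(objects.dispatch(surface.object_id, "enter"))  # dropped
print(objects.create("wl_buffer").object_id)  # 3, because ID 2 is still a zombie
objects.delete_id(surface.object_id)
print(objects.create("wl_callback").object_id)  # 2, reusable after delete_id
```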



        • #14
          Originally posted by CrystalGamma
          I still don't see why the problem is supposed to be in the protocol. I've read the spec (both the docs on the website and protocol/wayland.xml in the repository), and none of it mentions zombie objects at all (I've just re-checked), and nothing says that deletion should be handled in a specific way. (Actually, while reading, I assumed objects would just live on in the client, inaccessibly, until the delete_id event comes in, which is pretty similar to what this guy at Samsung has now implemented.)
          All I see is someone making a mistake in the reference implementation, which is being fixed right now.

          The zombies are the solution to bad API design, not a part of the API. The Wayland API is asynchronous, and it can happen that an object with a file descriptor (FD) is destroyed but that FD still receives events. The zombies "eat" these events until everything is in sync and events are no longer sent to the FD. At that point the zombie can be destroyed as well. At least, that is what I understood from the blog post when I read it.



          • #15
            Originally posted by Tomin

            The zombies are the solution to bad API design, not a part of the API. The Wayland API is asynchronous, and it can happen that an object with a file descriptor (FD) is destroyed but that FD still receives events. The zombies "eat" these events until everything is in sync and events are no longer sent to the FD. At that point the zombie can be destroyed as well. At least, that is what I understood from the blog post when I read it.

            Yes, I know what the bug is about (actually it's about FDs sent in events, not events sent over FDs), but the spec prescribes nothing that contradicts this kind of handling. So there is nothing to change in the protocol and definitely nothing wrong with it, just a bug in libwayland.
            No other Wayland implementation needs to care (well, they have to check whether they handle this case correctly, but they might already do …).
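A simplified model of why FDs in events are the crux: FDs travel out of band (as ancillary data on the Unix socket), so the receiver only knows how many to take off its FD queue by looking up the event's signature. This is hypothetical Python, not libwayland code: integers stand in for real FDs, `KEYBOARD` is a made-up signature table (with "h" marking an FD argument, as in libwayland's signature strings), and opcode 0 stands for a keymap-style event carrying one FD. It compares a client that forgot a destroyed object with one that kept a zombie holding the signature.

```python
from collections import deque

# Made-up signature table for a keyboard-like interface: opcode -> signature.
# As in libwayland's signature strings, "h" marks an FD argument.
KEYBOARD = {0: "uhu", 3: "uuuu"}


def dispatch(messages, fd_queue, objects):
    """Deliver (object_id, opcode) messages, taking from fd_queue as many FDs
    as each event's signature declares.

    objects maps object_id -> (signature_table, is_zombie); an absent ID
    models a client that kept no record of a destroyed object.
    Integers stand in for real FDs."""
    delivered, discarded = [], []
    for object_id, opcode in messages:
        entry = objects.get(object_id)
        if entry is None:
            # No signature, so there is no way to know how many FDs this
            # event carried; they stay at the head of the queue.
            continue
        table, is_zombie = entry
        fds = [fd_queue.popleft() for _ in range(table[opcode].count("h"))]
        if is_zombie:
            discarded.extend(fds)  # a real client would close() these
        else:
            delivered.append((object_id, fds))
    return delivered, discarded


messages = [(5, 0), (7, 0)]  # FD-carrying event for destroyed object 5, then live 7

# No record of object 5: object 7 is handed the FD that belonged to object 5.
print(dispatch(messages, deque([100, 101]), {7: (KEYBOARD, False)}))
# ([(7, [100])], [])

# Zombie keeps the signature: every FD stays matched to its own event.
print(dispatch(messages, deque([100, 101]), {5: (KEYBOARD, True), 7: (KEYBOARD, False)}))
# ([(7, [101])], [100])
```

In the first run the live object receives the FD intended for the destroyed one and FD 101 is left behind in the queue; in the second run the zombie consumes and discards its own FD.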



            • #16
              Originally posted by Zoll
              Why change the protocol? Just have all 20 different compositors implement different solutions for this, just like everything else they do differently.

              Absolutely not the case here. The change is made in libwayland-server and libwayland-client, so all clients and compositors using these libraries get the fix. There is still a problem with the Wayland implementations in Rust and the like. This FD garbage collection does need to go into the specification to keep them on the same page.

              Originally posted by Tomin
              The zombies are the solution to bad API design, not a part of the API. The Wayland API is asynchronous, and it can happen that an object with a file descriptor (FD) is destroyed but that FD still receives events. The zombies "eat" these events until everything is in sync and events are no longer sent to the FD. At that point the zombie can be destroyed as well. At least, that is what I understood from the blog post when I read it.

              Zombies are sometimes part of protocol design, and sometimes they make a lot of sense. RCU in the Linux kernel works along similar lines: old data is only reclaimed once no reader can still be referencing it. With asynchronous protocols it is nothing strange to require zombie handling in particular places. A client or server can have a different view of the state than everything else, so it may need to send something to an object that should already be gone because it has not yet read the notice that it is gone. This is one of the tricky problems of multithreading on multiple processors: not everything may be 100 percent in sync, and allowing zombies and garbage-collecting them can be a quite acceptable way to deal with the out-of-sync state.

              Even altering the protocol to carry the number of FDs does not mean you could not end up in an out-of-sync case. So from my point of view, this garbage collection needs to be written into the protocol documents so that those implementing the libwayland bits in Rust and the like are on the same page.
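To make the "number of FDs in the protocol" idea concrete: the real Wayland wire header is two 32-bit words in host byte order (object ID, then message size in the upper 16 bits and opcode in the lower 16). The third header word in this sketch, carrying an FD count, is purely hypothetical and not part of the protocol. With it, a receiver could discard FDs for objects it no longer knows without any signature information; it does not by itself settle what should happen to objects that are out of sync.

```python
import struct
from collections import deque

# Real Wayland header: object ID, then (size << 16) | opcode, in host byte order.
# HYPOTHETICAL extension: a third word carrying the number of FDs.
HEADER = struct.Struct("=III")


def encode(object_id, opcode, payload, n_fds):
    size = HEADER.size + len(payload)
    return HEADER.pack(object_id, (size << 16) | opcode, n_fds) + payload


def drain(buffer, fd_queue, known_ids):
    """Walk the byte stream; FDs for unknown objects are discarded using the
    header's FD count instead of any per-interface signature."""
    offset, delivered, discarded = 0, [], []
    while offset < len(buffer):
        object_id, size_opcode, n_fds = HEADER.unpack_from(buffer, offset)
        size, opcode = size_opcode >> 16, size_opcode & 0xFFFF
        fds = [fd_queue.popleft() for _ in range(n_fds)]
        if object_id in known_ids:
            delivered.append((object_id, opcode, fds))
        else:
            discarded.extend(fds)  # a real implementation would close() these
        offset += size
    return delivered, discarded


# Two 8-byte payloads (e.g. two uint arguments; FD arguments take no bytes
# in the stream because the FDs travel as ancillary data).
stream = encode(5, 0, b"\x00" * 8, 1) + encode(7, 0, b"\x00" * 8, 1)
print(drain(stream, deque([100, 101]), known_ids={7}))
# ([(7, 0, [101])], [100])
```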



              • #17
                Originally posted by oiaohm
                Even altering the protocol to carry the number of FDs does not mean you could not end up in an out-of-sync case. So from my point of view, this garbage collection needs to be written into the protocol documents so that those implementing the libwayland bits in Rust and the like are on the same page.

                I would have said it is pretty obvious that the client has to expect objects it has sent a delete request for to continue receiving events until the server confirms the deletion with the 'delete_id' event, including all that entails, like FDs in the message.
                There is, however, another edge case I'm curious about (but too lazy to read up on again): what about events carrying FDs from a protocol extension that the client doesn't support? I don't remember how feature negotiation/mismatch is handled in the protocol …



                • #18
                  Originally posted by CrystalGamma
                  (actually it's about FDs sent in events, not events sent over FDs)

                  Okay, I got that mixed up. (In other words, I had read the post a few hours earlier and had already forgotten that.)



                  • #19
                    The blog post talks about the "object type" indicating the signature. Then why not one zombie per "object type" instead of one per object? But then, one of the emails talks about replacing the proxy with the signature info itself. So it probably sounds much more alarming than it actually is.
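A sketch of the "one zombie per object type" idea from this post, under the assumption that all a zombie must know is how many FDs each event opcode carries. Since that depends only on the interface (ignoring interface versions here), destroyed objects of the same type can share one record. `example_keyboard`, its two signatures, and all helper names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InterfaceZombie:
    """Hypothetical shared zombie: just the FD count for each event opcode."""
    interface: str
    fds_per_opcode: tuple


def make_zombie(interface, event_signatures):
    # "h" marks an FD argument in the (simplified) signature strings.
    return InterfaceZombie(interface, tuple(sig.count("h") for sig in event_signatures))


# Built once per interface; example_keyboard and its signatures are made up.
ZOMBIES = {"example_keyboard": make_zombie("example_keyboard", ["uhu", "uuuu"])}
destroyed = {}  # object_id -> shared zombie for that object's interface


def destroy(object_id, interface):
    destroyed[object_id] = ZOMBIES[interface]


def fds_to_discard(object_id, opcode):
    return destroyed[object_id].fds_per_opcode[opcode]


for object_id in (5, 6, 9):
    destroy(object_id, "example_keyboard")

print(fds_to_discard(6, 0))  # 1
print(len({id(z) for z in destroyed.values()}))  # 1: all three share one zombie
```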



                    • #20
                      Originally posted by CrystalGamma
                      I would have said it is pretty obvious that the client has to expect objects it has sent a delete request for to continue receiving events until the server confirms the deletion with the 'delete_id' event, including all that entails, like FDs in the message.

                      That is the problem: it's not written in the protocol. Under Unix/Linux/POSIX, once you have transferred an FD to another process, you are in fact free to close it. This brings about one of those "what the..." moments, where the server sends an event, the client is no longer receiving at all, and the server does not get any response.
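The POSIX behaviour referred to here can be checked directly: an FD passed over a Unix socket (SCM_RIGHTS, wrapped by Python's `socket.send_fds`/`socket.recv_fds`, available on Unix since Python 3.9) gives the receiver its own descriptor for the same open file, so the sender may close its copy right away. A minimal sketch using a pipe; `hand_over_pipe` is a hypothetical helper name.

```python
import os
import socket


def hand_over_pipe(payload: bytes) -> bytes:
    """Send a pipe's read end over a Unix socket, close the sender's copy,
    and read the payload through the receiver's copy."""
    server, client = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        read_end, write_end = os.pipe()
        os.write(write_end, payload)
        os.close(write_end)
        # Sender: pass the FD as ancillary data, then drop its own copy.
        socket.send_fds(server, [b"x"], [read_end])
        os.close(read_end)
        # Receiver: its duplicate stays valid after the sender closed its copy.
        _msg, fds, _flags, _addr = socket.recv_fds(client, 1, 1)
        try:
            return os.read(fds[0], len(payload))
        finally:
            os.close(fds[0])  # the receiver owns this FD and must close it
    finally:
        server.close()
        client.close()


print(hand_over_pipe(b"hello"))  # b'hello'
```

The `finally` around the received FD reflects the receiving side's duty to close what it was handed; an FD that arrives in a message nobody processes simply stays open in the receiver.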

                      Posts about memfd written by David Rheinsberg


                      Sealing on FDs exists for exactly the case where you send an FD over and never, ever want to see it again. The protocol does not state whether sealing should or should not be done, or that FDs have to remain open.

                      Basically, in POSIX IPC you cannot assume that anything is still there when you have received an FD unless the protocol you are using states that it has to be.

                      One cause of the leaks and issues was client developers not believing they had to clean up their FDs, even though that is a client-side responsibility. Other issues happened because client code was cleaning up and recycling an FD, so an FD that had not been closed ended up being used more than once. Read the protocol again: nothing says you are not allowed to do this.

                      So your "pretty obvious" turns out to be wrong, given how people broke it. It's like the saying that common sense is not that common. If you want something "pretty obvious" or common-sense to be the way it is, write it in the documentation, or out of the million monkeys out there someone will stuff it up.
