Samsung Dealing With Wayland "Zombie Apocalypse" Bug


  • Tomin
    replied
    Originally posted by CrystalGamma View Post
    (actually it's about FDs sent in events, not events sent over FDs)
    Okay, I got that mixed up. (In other words, I read the post a few hours earlier and had already forgotten the details.)


  • CrystalGamma
    replied
    Originally posted by oiaohm View Post
    Even altering the protocol to provide the number of FDs does not mean you could not end up in an out-of-sync case. So from my point of view this garbage collection needs to be written into the protocol documents so that those implementing the libwayland bits in Rust and the like are on the same page.
    I would have said it is pretty obvious that the client has to expect objects it has sent a delete request for to continue receiving events until the server confirms the deletion with the 'delete_id' event, including all that entails, like FDs in the message.
    There is, however, another edge case I'm curious about (but too lazy to read up on again): What about events from a protocol extension that the client doesn't support, which contain FDs? I don't remember how they deal with feature negotiation/mismatch in the protocol …


  • oiaohm
    replied
    Originally posted by Zoll View Post
    Why change the protocol, just have all the 20 different compositors implement different solutions for this, just like everything else they do differently.
    Absolutely not the case here. The change is made in libwayland-server and libwayland-client, so all clients and compositors using these libraries get the fix from the update. There is still a problem with the Wayland implementations in Rust and the like. This FD garbage collection does need to go into the specification to keep them on the same page.

    Originally posted by Tomin View Post
    The zombies are the solution to bad API design, not a part of the API. Wayland API is asynchronous and it can happen that an object with file descriptor (FD) is destroyed, but that FD will still receive events. The zombies will "eat" these events until everything is in sync and the events are not sent anymore to the FD. At this point the zombie can be destroyed as well. At least this is what I understood from the blog text when I read it.
    Zombies are sometimes part of protocol design. Sometimes they make lots of sense. RCU in the Linux kernel does something similar: it defers freeing what is no longer connected until no reader can still be using it. When you are talking about asynchronous protocols, it's nothing strange to require zombie handling in particular places. Why? A client or server can have a different view of the state than its peer, so it may send something to an object that should already be gone, because it has not yet read the message saying that it is gone. This is one of the tricky problems of multithreading on multiple processors: everything may not be 100 percent in sync, so allowing zombies and garbage-collecting them can be a quite acceptable way to deal with the out-of-sync state.

    Even altering the protocol to provide the number of FDs does not mean you could not end up in an out-of-sync case. So from my point of view this garbage collection needs to be written into the protocol documents so that those implementing the libwayland bits in Rust and the like are on the same page.
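
The out-of-sync case described in this post can be illustrated with a toy simulation. This is only a sketch: the two queues, the message tuples, and the `simulate` function are made up for illustration and are not libwayland code. It assumes each side reads its incoming queue in order, and that the server answers a destroy request with a `delete_id` message.

```python
from collections import deque


def simulate():
    """Toy model of a client and server exchanging messages asynchronously."""
    to_server = deque()   # requests in flight, client -> server
    to_client = deque()   # events in flight, server -> client
    zombies = set()       # ids the client destroyed but the server has not confirmed
    log = []

    # The server queues an event for object 5 before it has seen any destroy request.
    to_client.append(("event", 5))

    # Around the same time, the client destroys object 5 and keeps a zombie for it.
    to_server.append(("destroy", 5))
    zombies.add(5)

    # The server queues one more event before it gets around to reading requests.
    to_client.append(("event", 5))

    # The server now reads the destroy request and confirms it.
    while to_server:
        kind, obj_id = to_server.popleft()
        if kind == "destroy":
            to_client.append(("delete_id", obj_id))

    # The client reads everything in order: two late events, then the confirmation.
    while to_client:
        kind, obj_id = to_client.popleft()
        if kind == "event" and obj_id in zombies:
            log.append(f"late event for destroyed object {obj_id}")
        elif kind == "delete_id":
            zombies.discard(obj_id)
            log.append(f"delete_id {obj_id}: zombie reaped")
    return log


if __name__ == "__main__":
    for line in simulate():
        print(line)
```

Neither side did anything wrong in this run; the late events exist simply because both queues were filled before either side had read the other's message.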


  • CrystalGamma
    replied
    Originally posted by Tomin View Post

    The zombies are the solution to bad API design, not a part of the API. Wayland API is asynchronous and it can happen that an object with file descriptor (FD) is destroyed, but that FD will still receive events. The zombies will "eat" these events until everything is in sync and the events are not sent anymore to the FD. At this point the zombie can be destroyed as well. At least this is what I understood from the blog text when I read it.
    Yes, I know what the bug is about (actually it's about FDs sent in events, not events sent over FDs), but the spec prescribes nothing that contradicts this kind of handling. So there is nothing to change in and definitely nothing wrong with the protocol, just a bug in libwayland.
    Every other wayland implementation doesn't need to care (well, they have to check if they handle this case correctly, but they might already do …).
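
The handling discussed here (FDs arriving in events for an object the client has already destroyed) could look roughly like the sketch below. It is not the actual libwayland patch: the `Client` class, the `FDS_PER_EVENT` table, and the `example_iface` interface are hypothetical names chosen for illustration. The one assumption taken from the thread is that a destroyed object stays around as a zombie until the `delete_id` event arrives, and that the zombie still knows how many FDs each event carries so they can be closed instead of leaking.

```python
import os

# Hypothetical table: number of FDs carried by each event opcode of an interface.
# The interface name and counts are illustrative, not taken from wayland.xml.
FDS_PER_EVENT = {"example_iface": {0: 0, 1: 1}}


class Client:
    def __init__(self):
        self.objects = {}    # id -> interface name, live objects
        self.zombies = {}    # id -> interface name, destroyed but not yet confirmed
        self.fd_queue = []   # FDs received alongside messages, in arrival order

    def create(self, obj_id, interface):
        self.objects[obj_id] = interface

    def destroy(self, obj_id):
        # The destroy request has been sent; keep a zombie until delete_id arrives.
        self.zombies[obj_id] = self.objects.pop(obj_id)

    def receive_fds(self, fds):
        self.fd_queue.extend(fds)

    def dispatch(self, obj_id, opcode):
        """Handle one event; return its FDs, or None if a zombie swallowed it."""
        interface = self.objects.get(obj_id) or self.zombies.get(obj_id)
        if interface is None:
            raise KeyError(f"event for unknown object {obj_id}")
        count = FDS_PER_EVENT[interface][opcode]
        fds = [self.fd_queue.pop(0) for _ in range(count)]
        if obj_id in self.zombies:
            # Nobody will ever use these FDs, so close them to keep the
            # FD queue in step with the message stream and avoid a leak.
            for fd in fds:
                os.close(fd)
            return None
        return fds

    def delete_id(self, obj_id):
        # The server confirmed the deletion; no more events can arrive for this id.
        self.zombies.pop(obj_id, None)
```

The point of consuming the right number of FDs even for a zombie is that the next live object would otherwise pick up an FD that was meant for the destroyed one.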


  • Tomin
    replied
    Originally posted by CrystalGamma View Post
    I still don't see why the problem is supposed to be in the protocol. I've read the spec (both the docs on the website and the protocol/wayland.xml in the repository), and neither of them mentions zombie objects at all (I've re-checked right now), and nothing says that deletion should be handled in a specific way. (Actually, while reading, I assumed objects would just live on in the client, inaccessibly, until the delete_id event comes in, which is pretty similar to what this guy at Samsung has now implemented.)
    All I see is someone making a mistake in the reference implementation, which is being fixed right now.
    The zombies are the solution to bad API design, not a part of the API. Wayland API is asynchronous and it can happen that an object with file descriptor (FD) is destroyed, but that FD will still receive events. The zombies will "eat" these events until everything is in sync and the events are not sent anymore to the FD. At this point the zombie can be destroyed as well. At least this is what I understood from the blog text when I read it.


  • CrystalGamma
    replied
    I still don't see why the problem is supposed to be in the protocol. I've read the spec (both the docs on the website and the protocol/wayland.xml in the repository), and neither of them mentions zombie objects at all (I've re-checked right now), and nothing says that deletion should be handled in a specific way. (Actually, while reading, I assumed objects would just live on in the client, inaccessibly, until the delete_id event comes in, which is pretty similar to what this guy at Samsung has now implemented.)
    All I see is someone making a mistake in the reference implementation, which is being fixed right now.


  • Cerberus
    replied
    Mir Master Race.


  • Zoll
    replied
    Why change the protocol, just have all the 20 different compositors implement different solutions for this, just like everything else they do differently.


  • DrYak
    replied
    Originally posted by boxie View Post
    I would have thought that the solution would be to fix the protocol and force an update
    Keep in mind that Wayland is currently deployed in the wild. It's not only something developed for some upcoming Fedora beta; there are devices running it in production right now:
    - I think that Samsung has been using it in their Tizen-powered phones.
    - Jolla has been using it in the Mer core of their Sailfish OS, all the way back to their first smartphone.

    It takes some time until an update is approved downstream for an embedded system. (Will the completely new Wayland version with the updated protocol continue to work as expected with all the components currently deployed in production? Including 3rd-party apps? Including binary-only 3rd-party apps, whose source code can't simply be recompiled by some instance of SUSE's Open Build Service?)

    Not everyone is Apple and is completely happy with breaking all the apps by completely changing the API every now and then. (Of course I'm exaggerating a bit; Apple still made the effort of creating the Carbon API upgrade path.)

    (On the other hand, not everyone is Microsoft, with an unstable, about-to-explode (and actually exploding quite regularly) giant katamari of duct tape for an API, containing more unmaintained legacy cruft than actually sanctioned APIs.
    That's why X11 was eventually replaced with Wayland in Linux, and that's why eventually an updated protocol will be rolled out in an upcoming Wayland major upgrade. Just not today.)


  • TMM_
    replied
    Given the problem, this is actually a rather elegant solution. It's a little hacky for sure, but it's an ELEGANT hack given the problem at hand. Serious kudos to whoever came up with this.
