Windows NT Sync Driver Proposed For The Linux Kernel - Better Wine Performance


  • Originally posted by indepe View Post
    So far I don't have much experience with cross-process thread communication (though I do with single-process multi-threading). I just measured an atomic-variable round trip using shared memory: process A sets a value that B is waiting for (by spinning on it), and B responds by setting a value that A is waiting for. On a Zen 2 CPU the whole round trip takes just 35 ns or a bit less, so about 17 ns for one direction. By comparison, a single syscall like getpid takes 179 ns, and a syscall with a function number that doesn't exist (an empty syscall, so to speak) takes a bit more than 150 ns (though I think it used to be less on older systems). Everything was measured in a loop. So that's good news for lock-free queues.

    Task switching can surely be reduced on the server side by giving the server a higher priority, but that's something I haven't measured between processes, and wineserver already uses a higher priority, so that's not necessarily an area of improvement in comparison. If that's what you mean by task switching.

    I wouldn't really want to give more details about things that I haven't fully implemented and haven't yet seen work out in practice.
    Well, that makes sense and is what I'd expect. You have no task switches, though, because you're busy-waiting on the server side, so it's already running (even if it's uselessly spinning). That works fine if you don't mind dedicating an entire CPU core to it, but in practice you usually do. In practice you'd alert the server to wake up with at least a syscall, and usually that results in a task switch on its side (which is far worse).

    FUTEX_SWAP could mitigate this by effectively reducing it to the overhead of just a syscall, since it would use the exact same thread that issued the alert to do the server's work. But as far as I know it never made it into the kernel (correct me if I'm wrong; I've been out of the loop and honestly gave up on it, even though I wanted it badly for my own stuff).

    However, even with FUTEX_SWAP you run into a synchronization problem, since now you need one server thread for each "userspace" thread that talks to it. The thread that switches to the server thread (and keeps all the caches and most of the context, which is what makes FUTEX_SWAP about ten times faster than without it) would no longer be the only one on the server; it's not single-threaded anymore. That's why I said they'd first have to make wineserver multi-threaded, but they're so stubborn against that idea... :/

    Note that the overhead of one syscall is unavoidable, but that's OK. ntsync, esync, fsync and all the other "userspace" syncs actually use syscalls too, so they're no faster in this regard. That's not the problem.
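    For reference, a minimal sketch of the kind of shared-memory round trip indepe describes above, using C11 atomics over a POSIX shared-memory mapping. The /pingpong name, the round count, and the omission of error handling are all illustrative; on older glibc, link with -lrt:

Code:
/* Cross-process round-trip sketch: run "./pingpong A" in one terminal
 * and "./pingpong B" in another. One loop iteration = one round trip. */
#include <fcntl.h>
#include <stdatomic.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

#define ROUNDS 1000000L

int main(int argc, char **argv)
{
    int is_a = argc > 1 && strcmp(argv[1], "A") == 0;

    /* a single atomic counter in shared memory */
    int fd = shm_open("/pingpong", O_CREAT | O_RDWR, 0600);
    ftruncate(fd, sizeof(_Atomic long));
    _Atomic long *turn = mmap(NULL, sizeof(*turn),
                              PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);

    /* A publishes odd values and spins for the even reply; B mirrors it.
     * The initial value 0 is never a target, so startup order is safe. */
    for (long i = 0; i < ROUNDS; i++) {
        if (is_a) {
            atomic_store_explicit(turn, 2 * i + 1, memory_order_release);
            while (atomic_load_explicit(turn, memory_order_acquire) != 2 * i + 2)
                ; /* spin */
        } else {
            while (atomic_load_explicit(turn, memory_order_acquire) != 2 * i + 1)
                ; /* spin */
            atomic_store_explicit(turn, 2 * i + 2, memory_order_release);
        }
    }

    clock_gettime(CLOCK_MONOTONIC, &t1);
    long ns = (t1.tv_sec - t0.tv_sec) * 1000000000L + (t1.tv_nsec - t0.tv_nsec);
    printf("%.1f ns per round trip\n", (double)ns / ROUNDS);
    return 0;
}

    Pinning A and B to different cores of the same CCX would be the realistic way to approach a figure like 35 ns; cross-CCX or cross-socket round trips cost noticeably more.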



    • Originally posted by Weasel View Post
      Well, that makes sense and is what I'd expect. You have no task switches, though, because you're busy-waiting on the server side, so it's already running (even if it's uselessly spinning). That works fine if you don't mind dedicating an entire CPU core to it, but in practice you usually do. In practice you'd alert the server to wake up with at least a syscall, and usually that results in a task switch on its side (which is far worse).

      FUTEX_SWAP could mitigate this by effectively reducing it to the overhead of just a syscall, since it would use the exact same thread that issued the alert to do the server's work. But as far as I know it never made it into the kernel (correct me if I'm wrong; I've been out of the loop and honestly gave up on it, even though I wanted it badly for my own stuff).

      However, even with FUTEX_SWAP you run into a synchronization problem, since now you need one server thread for each "userspace" thread that talks to it. The thread that switches to the server thread (and keeps all the caches and most of the context, which is what makes FUTEX_SWAP about ten times faster than without it) would no longer be the only one on the server; it's not single-threaded anymore. That's why I said they'd first have to make wineserver multi-threaded, but they're so stubborn against that idea... :/

      Note that the overhead of one syscall is unavoidable, but that's OK. ntsync, esync, fsync and all the other "userspace" syncs actually use syscalls too, so they're no faster in this regard. That's not the problem.
      That's of course a valid challenge. In the case of a server process, the other questions have simpler, more straightforward answers, so this is the area where such a solution might spend the larger optimization effort. FUTEX_SWAP might indeed be useful here, and/or perhaps something like an option for FUTEX_WAKE that says: "donate the remainder of my time slice to the woken thread, if possible". The versions of the FUTEX_SWAP proposals that I have seen appeared to hand the core and/or time slice to a different thread waiting on a futex, so the (separate) event server could still be single-threaded if that's what it wants to be. Maybe you have something different in mind.

      The first step might be to make local events (events where both sender and waiter are in the same process) resolvable by the process itself. As a second step, the server would, like an adaptive mutex, spin for a while before going into a waiting state, so that it can respond to calls that arrive in bursts without needing a wake-up each time. The server would then alternate between longer stretches of being awake and entering a wait state only after a longer period of inactivity.
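      A rough sketch of that spin-then-sleep idle loop, with a raw futex as the sleep fallback (the spin limit and all names are illustrative, not Wine code):

Code:
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

#define SPIN_LIMIT 4000 /* tuning knob: how long to stay "awake" */

static long futex_wait(_Atomic int *addr, int expected)
{
    /* blocks only while *addr still equals expected */
    return syscall(SYS_futex, addr, FUTEX_WAIT, expected, NULL, NULL, 0);
}

/* Returns once *pending becomes nonzero. */
static void server_wait_for_work(_Atomic int *pending)
{
    /* Spin phase: bursts of calls are picked up here without any
     * wake-up syscall being needed on the sender side. */
    for (int i = 0; i < SPIN_LIMIT; i++)
        if (atomic_load_explicit(pending, memory_order_acquire))
            return;

    /* Longer quiet period: actually go to sleep until a sender
     * stores to *pending and issues a FUTEX_WAKE. */
    while (!atomic_load_explicit(pending, memory_order_acquire))
        futex_wait(pending, 0);
}

      As sketched, a sender stores 1 to *pending and issues a FUTEX_WAKE; with one extra "server is asleep" flag, senders could skip the wake syscall entirely while the server is still in its spin phase, which is where the savings on bursts would come from.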

      Some games (or other apps) that use cross-process events might use them so infrequently that the wake-up cost (which isn't that large to begin with) doesn't make much of a difference.

      That leaves games (and apps) that use cross-process events frequently. One thing I would say is that, in the past (and I don't know whether this has changed), games seemed to make full use of only a few cores, so this might actually be a way to put an additional core to good use; an efficiency core would probably suffice (and a game might otherwise not want to use one).

      Since everything up to this point is probably relatively straightforward (except for the need to support outdated, complicating APIs in Wine's case), one could spend the extra effort on optimizations such as connecting event sender and event waiter directly. For example, if the server receives a wait call for an event before going into a wait state itself, it could leave a marker on the event that says: "if this event occurs, wake the waiter's futex directly, instead of my futex as an intermediate". That is of course a simplified description. In this context one might also come up with new generic kernel features that would be useful in a general way.
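      To make the marker idea concrete, a deliberately simplified sketch (it ignores the races a real implementation would need to close, just as the simplified description above does; all names are made up):

Code:
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

struct event {
    _Atomic int signaled;
    /* futex word of a parked waiter, or NULL if nobody is parked */
    _Atomic(_Atomic int *) direct_waiter;
};

static void futex_wake(_Atomic int *addr)
{
    syscall(SYS_futex, addr, FUTEX_WAKE, 1, NULL, NULL, 0);
}

/* Server side: before going to sleep itself, leave the marker. */
static void event_park_waiter(struct event *ev, _Atomic int *waiter_futex)
{
    atomic_store_explicit(&ev->direct_waiter, waiter_futex,
                          memory_order_release);
}

/* Sender side: set the event, then wake the parked waiter directly,
 * bypassing the server's futex as an intermediate. */
static void event_signal(struct event *ev)
{
    atomic_store_explicit(&ev->signaled, 1, memory_order_release);
    _Atomic int *w = atomic_exchange_explicit(&ev->direct_waiter, NULL,
                                              memory_order_acq_rel);
    if (w) {
        atomic_store_explicit(w, 1, memory_order_release);
        futex_wake(w);
    }
}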

      Obviously this is all said before having gained any serious experience in cross-process software. Pardon me.



      • Originally posted by Weasel View Post
        No, if access is blocked it simply continues to the next queued task.

        You don't really understand how "single-threaded" apps that "appear" multi-threaded in terms of locks work, do you? AutoHotkey works the same way, by the way: it has "threads" and all the waiting machinery, but internally it's literally just one thread.
        This is where someone incorrectly presumes how things go wrong.

        Does wineserver always proceed to the next queued task when one is blocked? The answer, horribly, is no, it does not.

        There are a stack of different ways wineserver's single thread can stop dead.

        Yes, I understand how single-threaded can appear multi-threaded, Weasel. But I also understand that, due to issues like running out of file handles, Wine's single thread can stop moving forward. With no deadlock detection in wineserver, that means everything stops when one of these events happens. Yes, it can end in a complete system lock-up, as the Windows application tries method after method to keep functioning, gets zero response from wineserver, and slowly but surely consumes all system resources.

        A single thread emulating multiple threads has a low lock-up rate, but when it does lock up, it is horribly bad. A wineserver thread stopped dead for any reason makes wineserver a possible deadlock.

        There are cases where, like it or not, wineserver's single thread stops. Yes, these are bugs. If wineserver had a watchdog, the watchdog would notice that the server is no longer processing and terminate wineserver, along with the applications that are now most likely deadlocked because wineserver has stopped processing.

        Remember, I said wineserver can deadlock; it's rare, and you are correct that it's because of bugs. Bugs like using too many file handles so the single thread stops, and there are other bugs of this class in the Wine bug list.

        Yes, some of these would bring down multi-threaded event processing as well.

        Watchdog code is important so that when something unexpected happens, the damage is limited. Wine lacks watchdogs around wineserver, and this means the events where wineserver deadlocks (because something stopped its thread from processing) turn into horrible messes, as they become a cause for other parts of Wine and for Windows applications to deadlock. Yes, some applications dig themselves in deeper as they attempt to work around the deadlock.

        Comment


        • Originally posted by indepe View Post
          Obviously this is all said before having gained any serious experience in cross-process software. Pardon me.
          Actually, I should say: without serious experience in developing synchronization primitives for cross-process communication. I do have experience both with synchronization primitives (including lock-free ones) in multi-threaded but single-process settings, and, for example, with client-server software. So I guess what I say does carry a bit of weight.



          • Originally posted by indepe View Post
            That's of course a valid challenge. In the case of a server process, the other questions have simpler, more straightforward answers, so this is the area where such a solution might spend the larger optimization effort. FUTEX_SWAP might indeed be useful here, and/or perhaps something like an option for FUTEX_WAKE that says: "donate the remainder of my time slice to the woken thread, if possible". The versions of the FUTEX_SWAP proposals that I have seen appeared to hand the core and/or time slice to a different thread waiting on a futex, so the (separate) event server could still be single-threaded if that's what it wants to be. Maybe you have something different in mind.
            Yeah, I'm just unsure what happens when it's contended and the server isn't waiting because it's busy doing something else at the moment (single-thread issues, you know). Does it switch to that core later, when it goes to wait?



            • Originally posted by oiaohm View Post
              This is where someone incorrectly presumes how things go wrong.

              Does wineserver always proceed to the next queued task when one is blocked? The answer, horribly, is no, it does not.

              There are a stack of different ways wineserver's single thread can stop dead.
              That's a bug. I never said it's bug-free, but by design it shouldn't happen. Bugs are obviously not intended.



              • Originally posted by Weasel View Post
                Yeah, I'm just unsure what happens when it's contended and the server isn't waiting because it's busy doing something else at the moment (single-thread issues, you know). Does it switch to that core later, when it goes to wait?
                If/while the server is doing something else, a work item would get added to a lock-free queue. If you want to protect against rogue applications (which in Wine's case you would), it would be a separate queue for each such app, for example. These are implementation-dependent details, though, which would also depend on the experience gained during prototyping and testing.
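                For illustration, such a per-client queue could be as small as an intrusive list that client threads push onto with a CAS loop and that the single server thread drains with one atomic exchange (a sketch with made-up names, not Wine code):

Code:
#include <stdatomic.h>
#include <stddef.h>

struct work_item {
    struct work_item *next;
    int opcode;                 /* whatever the request encodes */
};

struct client_queue {
    _Atomic(struct work_item *) head;   /* LIFO push side */
};

/* Any thread of the owning client may call this concurrently. */
static void client_submit(struct client_queue *q, struct work_item *item)
{
    struct work_item *old =
        atomic_load_explicit(&q->head, memory_order_relaxed);
    do {
        item->next = old;       /* 'old' is refreshed by a failed CAS */
    } while (!atomic_compare_exchange_weak_explicit(
                 &q->head, &old, item,
                 memory_order_release, memory_order_relaxed));
}

/* Only the server thread calls this: detach everything pushed so far. */
static struct work_item *server_drain(struct client_queue *q)
{
    return atomic_exchange_explicit(&q->head, NULL, memory_order_acquire);
}

                The list drains newest-first, so the server reverses it if FIFO order matters; and since each client only ever touches its own queue, a rogue client can only damage itself.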



                • Originally posted by Weasel View Post
                  That's a bug. I never said it's bug-free, but by design it shouldn't happen. Bugs are obviously not intended.
                  This is your problem: "by design it should not".

                  Weasel, there is more than one form of common deadlock.

                  1) Multi-thread deadlock: the application's own internal locking is wrong and causes a deadlock. Wineserver's design does counter this one.
                  2) Multi-process deadlock: this can come from hidden locks on resources. Wine has had cases of people putting Wine prefixes on network storage and running into a deadlock while accessing the registry because of hidden file-system locking, so yes, it is possible for two running wineservers to deadlock against each other. If the user has done something odd to the Wine prefixes, there are also cases where wineserver runs into another application holding a lock on something it needs, again deadlocking. So wineserver's design does not fully prevent this one; some parts of the design only limit how it can happen.
                  3) Process-to-kernel deadlock: the syscall the application made ends up deadlocked inside the kernel, so the wineserver thread/process stops because the syscall never returns. This is rare with Linux kernels and more common with the BSD/macOS kernels.

                  Wineserver has documented deadlocks of types 2 and 3, and the current design of wineserver neither prevents nor contains them.

                  Wine support's recommendation of one application per Wine prefix exists partly as mitigation against type 2 and 3 deadlocks, so that the number of affected applications is kept down.

                  Weasel, the simple thing to forget is that operating systems today are multi-threaded, so even if your program is not multi-threaded it can still be deadlocked, because the multi-threaded OS can deadlock on operations it is performing for your application.

                  There is a serious reason why watchdogs are good even for applications of single-threaded design.

                  The reality here is that wineserver's code can be 100 percent bug-free and yet, lacking a watchdog, end up deadlocked because something else went wrong.

                  The "by design" argument is commonly a way of not taking a broad enough view of the problem space. You said straight up that wineserver does not deadlock; the reality is that wineserver does deadlock, just through the rarer forms of deadlock.

                  Yes, there is a myth that making a program single-threaded makes it immune to deadlocks, when in reality that prevents only one form of deadlock. There are more ways to end up deadlocked than the three I listed, and over Wine's history the project has seen them all happen to wineserver.

                  Wineserver's current design does deadlock, rarely, through the more exotic forms of deadlock mixed with a percentage of problem-exists-between-keyboard-and-chair cases attempting things the documentation says not to do. One of the big issues is that some wineserver deadlocks cause applications to go completely stupid and start attempting to delete millions of files, consume all RAM, and so on.

                  Wineserver deadlocks are rare, so most people are not aware of how much of a disaster they can be.

                  Weasel, a lot of developers add try/catch statements to programs so that an unintended event does not cause absolute disaster. A watchdog on wineserver would do the same thing: an unintended event happens, wineserver stalls out, and the watchdog steps in and kills everything running under Wine before it spins completely out of control.

                  Good design should be able to handle a decent percentage of unintended events, particularly the ones that can be highly harmful to the end user, like Windows applications going completely nuts because wineserver has stopped processing.
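                  For illustration, the smallest version of such a watchdog is a heartbeat that the server's main loop bumps and a watcher that checks it keeps moving. A sketch with illustrative names and timeout; a real one would likely live in a separate process, so that a kernel-side hang cannot take the watchdog down with the server:

Code:
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static _Atomic unsigned long heartbeat;

/* The server's main loop calls this once per iteration. */
static void server_tick(void) { atomic_fetch_add(&heartbeat, 1); }

/* Started once at startup: pthread_create(&tid, NULL, watchdog, NULL). */
static void *watchdog(void *arg)
{
    (void)arg;
    unsigned long last = atomic_load(&heartbeat);
    for (;;) {
        sleep(5);                       /* illustrative timeout */
        unsigned long now = atomic_load(&heartbeat);
        if (now == last) {
            /* Main loop stopped advancing: contain the damage instead
             * of letting clients pile up against a dead server. */
            fprintf(stderr, "server stalled, shutting down\n");
            exit(EXIT_FAILURE);         /* real code would also stop the clients */
        }
        last = now;
    }
}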



                  • Originally posted by oiaohm View Post
                    This is your problem: "by design it should not".

                    Weasel, there is more than one form of common deadlock.

                    1) Multi-thread deadlock: the application's own internal locking is wrong and causes a deadlock. Wineserver's design does counter this one.
                    2) Multi-process deadlock: this can come from hidden locks on resources. Wine has had cases of people putting Wine prefixes on network storage and running into a deadlock while accessing the registry because of hidden file-system locking, so yes, it is possible for two running wineservers to deadlock against each other. If the user has done something odd to the Wine prefixes, there are also cases where wineserver runs into another application holding a lock on something it needs, again deadlocking. So wineserver's design does not fully prevent this one; some parts of the design only limit how it can happen.
                    3) Process-to-kernel deadlock: the syscall the application made ends up deadlocked inside the kernel, so the wineserver thread/process stops because the syscall never returns. This is rare with Linux kernels and more common with the BSD/macOS kernels.
                    For (3), it's a bug because they didn't expect the kernel to reach a limit and block itself. It was not accounted for, not intended.

                    It's a mistake that is hard to fix, but still a mistake. A bug. It's not a design flaw; it's the kernel's "quirk" (imposing limits that aren't part of the API itself).



                    • Originally posted by Weasel View Post
                      For (3), it's a bug because they didn't expect the kernel to reach a limit and block itself. It was not accounted for, not intended.

                      It's a mistake that is hard to fix, but still a mistake. A bug. It's not a design flaw; it's the kernel's "quirk" (imposing limits that aren't part of the API itself).
                      Not an internal-locking problem, then. A completely single-threaded server wouldn't even need internal locks (insofar as really all operations are carried out on that thread only).

                      Locks would come into play only if and insofar as some operations are carried out by clients or async callbacks or the like (which might be an optimization in an event server of a different design than wineserver's), unless those use so-called lock-free programming.

                      ntsync needs locks because those operations within the kernel are multi-threaded (and don't use lock-free programming, which would most likely be very difficult in that design).
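                      A toy illustration of that point: when a single thread performs every operation, the server's state can be plain data, with no locks or atomics anywhere (the wire format and names are made up):

Code:
#include <poll.h>
#include <unistd.h>

struct event_state {
    int signaled;           /* ordinary field: only this thread touches it */
};

static struct event_state events[256];

static void handle_request(int fd)
{
    unsigned char req[2];   /* illustrative wire format: opcode + event id */
    if (read(fd, req, sizeof(req)) != sizeof(req))
        return;
    if (req[0] == 1) events[req[1]].signaled = 1;   /* "set event" */
    if (req[0] == 2) events[req[1]].signaled = 0;   /* "reset event" */
}

/* Single-threaded main loop over client sockets (events = POLLIN set by
 * the caller): every mutation happens here, so there is nothing to lock. */
static void serve(struct pollfd *clients, int nclients)
{
    for (;;) {
        poll(clients, nclients, -1);
        for (int i = 0; i < nclients; i++)
            if (clients[i].revents & POLLIN)
                handle_request(clients[i].fd);
    }
}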
                      Last edited by indepe; 28 January 2024, 06:19 PM.

