Wine Developers Are Working On A New Linux Kernel Sync API To Succeed ESYNC/FSYNC

  • #21
    Originally posted by oiaohm

    Really, no. pthreads under Linux is not just a library on most Linux distributions. NPTL (Native POSIX Threads Library) requires a Linux 2.6 kernel or newer to have the syscalls it needs to function. The shared memory Wine is using is very much like what pthreads used before NPTL, so this is not a good workaround for pthreads either.

    For thread locking you really do need the kernel scheduler to know about it.
    Obviously thread locking will use low-level kernel functions. The pthread library uses the FUTEX syscalls to implement semaphores and locks. So could this library. (It doesn't have to implement the whole of pthreads, of course.)

    As I have written in previous discussions of this subject, waiting for multiple events can be implemented on top of the existing FUTEX syscalls. For performance, you might want to add a WAKE-multiple, which is very simple to implement (just to avoid multiple syscalls, one for each WAKE), but you don't need WAIT-multiple, which would place an unnecessary burden on the kernel in the absence of contention (as one of the kernel engineers has partly pointed out).
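
To make the claim above concrete, here is a minimal sketch of a "wait for any of several events" built only on the existing FUTEX_WAIT and FUTEX_WAKE operations. The names `event_group`, `event_signal` and `event_wait_any` are hypothetical, and so is the layout (all events in a group share one futex word). It covers only the wait-any case with auto-reset events inside one process; it is not how Wine implements anything, and it does not address wait-all or cross-process objects.

```c
#define _GNU_SOURCE
#include <limits.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

#define NEVENTS 4

/* Hypothetical layout: every event in the group shares one futex word. */
struct event_group {
    _Atomic uint32_t seq;               /* bumped on every signal; the futex word */
    _Atomic uint32_t signaled[NEVENTS]; /* 1 = event is set */
};

static long futex(_Atomic uint32_t *addr, int op, uint32_t val)
{
    return syscall(SYS_futex, addr, op, val, NULL, NULL, 0);
}

/* Set one event, then wake every thread sleeping on the group. */
static void event_signal(struct event_group *g, int i)
{
    atomic_store(&g->signaled[i], 1);
    atomic_fetch_add(&g->seq, 1);
    futex(&g->seq, FUTEX_WAKE, INT_MAX);
}

/* Return the index of the first set event and reset it (auto-reset). */
static int event_wait_any(struct event_group *g)
{
    for (;;) {
        uint32_t seq = atomic_load(&g->seq);
        for (int i = 0; i < NEVENTS; i++) {
            uint32_t expected = 1;
            if (atomic_compare_exchange_strong(&g->signaled[i], &expected, 0))
                return i; /* fast path: an event was already set, no syscall */
        }
        /* Sleeps only if seq is unchanged, i.e. no signal arrived since the scan. */
        futex(&g->seq, FUTEX_WAIT, seq);
    }
}
```

Because the waiter reads `seq` before scanning and the kernel re-checks it inside FUTEX_WAIT, a signal that arrives between the scan and the sleep is not lost. The cost of this simple design is that every signal wakes all waiters of the group.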

    Originally posted by oiaohm
    Yes, there are particular behaviours in the Windows kernel where, if a thread/process is stopped on a lock, its time slice goes to the process/thread holding the lock; this causes some major performance differences. A software library in userspace does not fix these issues, because you need the kernel scheduler to be aware of what the lock states are and to allocate CPU resources according to that information. Once this information about the locking has got to the scheduler, most of it no longer needs to be shared through the workaround memory mappings.
    Interesting that you mention this, since it is not an obvious part of this discussion. Yet in a previous discussion here I have already proposed a syscall that allows waiting for a FUTEX and transferring the CPU resource to a specific thread if the FUTEX is not available. (The information about which thread is holding a lock can very well be maintained in user space.)
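
As an illustration of the parenthetical remark, the sketch below keeps the identity of the lock holder in the lock word itself, so a thread that fails to take the lock learns which thread it is waiting on without asking the kernel. The names `owned_lock`, `owned_trylock` and `owned_unlock` are hypothetical. The "wait and hand over my CPU time" syscall is only the proposal from the post; it is marked with a comment and not called, since no such Linux interface is assumed to exist.

```c
#define _GNU_SOURCE
#include <stdatomic.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical lock word: 0 = free, otherwise the TID of the owning thread. */
struct owned_lock {
    _Atomic uint32_t owner_tid;
};

static uint32_t self_tid(void)
{
    return (uint32_t)syscall(SYS_gettid);
}

/* Try to take the lock; on failure report which thread holds it. */
static int owned_trylock(struct owned_lock *l, uint32_t *holder)
{
    uint32_t expected = 0;
    if (atomic_compare_exchange_strong(&l->owner_tid, &expected, self_tid()))
        return 1;
    *holder = expected; /* the failed compare-exchange stored the current owner */
    /* The proposed "wait and donate my time slice to *holder" call would go
     * here. It is the idea from the post, not an existing syscall. */
    return 0;
}

static void owned_unlock(struct owned_lock *l)
{
    atomic_store(&l->owner_tid, 0);
}
```

The lock is deliberately non-recursive: a second `owned_trylock` from the owning thread fails and reports the caller's own TID as the holder.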

    Although I don't know much about memory mappings, I'm sure any missing low-level functionality could be added by low-level syscalls of much more general use than just the imitation of Windows functions.

    Originally posted by oiaohm
    The trap here is that the things you are forced to do when you don't have the features you need from the kernel are horrible memory maps that just cause more problems. Thread-locking libraries done purely in userspace always end up using horrible memory mappings that always cause trouble, because the kernel scheduler is not getting the information it needs to make correct choices, and because the memory needs to be readable and writable between the processes, which opens up exploits. Kernel-based thread locking is vastly superior to the user-space stuff because the need for memory read/write between processes goes away and the scheduler has access to information it can use to make more correct choices.
    Again, the idea is that such a library would use the same syscalls as pthreads uses for its user-space implementation of locks and semaphores based on FUTEXes. In the absence of contention, user-space evaluation is ***far*** more performant than a higher-level interface that invokes a syscall even when there is no contention. The latter is simply a bad idea.
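
The uncontended fast path described above can be sketched with a small futex-based mutex. This follows the well-known three-state design (0 = unlocked, 1 = locked, 2 = locked with possible waiters); the names `fmutex`, `fmutex_lock` and `fmutex_unlock` are hypothetical, and the `futex_calls` counter exists only so the sketch can show that no syscall is made when there is no contention. It is an illustration, not the actual pthreads implementation.

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <stdatomic.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* 0 = unlocked, 1 = locked, 2 = locked and there may be waiters. */
struct fmutex {
    _Atomic uint32_t state;
};

static atomic_int futex_calls; /* instrumentation for this sketch only */

static long futex(_Atomic uint32_t *addr, int op, uint32_t val)
{
    atomic_fetch_add(&futex_calls, 1);
    return syscall(SYS_futex, addr, op, val, NULL, NULL, 0);
}

static void fmutex_lock(struct fmutex *m)
{
    uint32_t c = 0;
    if (atomic_compare_exchange_strong(&m->state, &c, 1))
        return; /* fast path: lock was free, one atomic instruction, no syscall */
    /* Slow path: mark the lock as contended and sleep until it is released. */
    if (c != 2)
        c = atomic_exchange(&m->state, 2);
    while (c != 0) {
        futex(&m->state, FUTEX_WAIT, 2);
        c = atomic_exchange(&m->state, 2);
    }
}

static void fmutex_unlock(struct fmutex *m)
{
    /* If the old state was 1 nobody is waiting, so no syscall is needed. */
    if (atomic_fetch_sub(&m->state, 1) != 1) {
        atomic_store(&m->state, 0);
        futex(&m->state, FUTEX_WAKE, 1);
    }
}
```

A lock/unlock pair by a single thread leaves `futex_calls` at zero; the kernel is entered only when a second thread finds the lock taken or when the owner releases a lock that has waiters.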

    Originally posted by oiaohm
    Of course, these new forms of locking Wine wants could be useful to native Linux programs in the future due to their different behaviours. There are other areas in Linux that are a mix of the way BSD does things and the ways other parties have done things.
    I disagree. Even the author of that email mentioned that many Windows APIs are by now deprecated. Most cases where you want to wait for multiple events can be implemented with much simpler functions, and much more efficiently, than what the overkill Windows API called "WaitForMultipleObjects" requires. Linux needs those much more efficient implementations, if any.

    • #22
      Originally posted by Cybmax
      So.. I'm still not really grasping the difference between the fsync kernel patchset and the fsync2 kernel patchset. Are they interchangeable? Can I run Proton on a kernel patched with fsync2 (and NOT the "old" fsync patchset)? (And actually USE fsync ofc.. not just "run it", since I suppose someone would nitpick on the wording here...)

      And.. this is shaping up to be the 3rd round of fsync? (or.. winsync/ntsync/winesync/supersync... whatever).
      There's no fsync kernel patchset, fsync is a Wine thing. There's also no fsync2 anything.

      The Linux kernel has long had a futex syscall, but it's not enough for what the Windows API needs. So the "fsync" patchset on Wine depends on a new syscall, called futex2, which extends it so it can better match Windows.

      tl;dr: futex ≠ fsync

      • #23
        Originally posted by indepe
        As I have written in previous discussions of this subject, waiting for multiple events can be implemented on top of the existing FUTEX syscalls. For performance, you might want to add a WAKE-multiple, which is very simple to implement (just to avoid multiple syscalls, one for each WAKE), but you don't need WAIT-multiple, which would place an unnecessary burden on the kernel in the absence of contention (as one of the kernel engineers has partly pointed out).
        Yes and no. Waiting on multiple objects is not that simple with NT, because there is no absence of contention, due to the horrible way handles/objects work in Windows. Wake-multi exists on Windows so that you do not modify objects that cannot be used. There is a need here for more than one lock to be processed as a group. I am not saying the kernel developers are wrong that doing it all the time is going to cause overhead.


        To be precise, you have not read the first post closely.

        Storing object state in shared memory means that it can be corrupted. It's not usually a problem in practice, of course, but it's not safe enough for upstream.

        This is the fun part: Windows objects change flags in the object's data structures based on the lock status. An object/handle is what NT uses instead of a file handle. Just as with DMA-BUF, where ownership of a file moves to wherever it is passed, the same kind of thing is meant to happen with the objects/handles that Windows uses in locking. FUTEX only gives you a 32-bit value. This is where you are in trouble: there are data structures that need to change, in step with the lock status, in a safe way.


        Windows doesn't have a concept of NOFILE. Buggy programs exist which leak hundreds of thousands of synch primitives. Google Earth VR is an example. esync ships with a notice telling the user to raise the hard limit for NOFILE; later systemd itself raised the default limit (not solely because of esync, but it was the trigger).

        This is why you cannot use existing file handles.

        There are a lot more cases covered where the existing solutions don't fit what Wine needs to do with NT objects.

        This is something more complex than your normal locking. Some things here are just a round peg in a square hole.


        Originally posted by indepe
        I disagree. Even the author of that email mentioned that many Windows APIs are by now deprecated. Most cases where you want to wait for multiple events can be implemented with much simpler functions, and much more efficiently, than what the overkill Windows API called "WaitForMultipleObjects" requires. Linux needs those much more efficient implementations, if any.
        This is not understanding the horrible Windows world. Functions may be officially deprecated by Microsoft, but new Windows applications will be released using them for quite some time. On Windows, a function being deprecated is not a strong argument against it being used.

        • #24
          Originally posted by Weasel
          The Linux kernel has long had a futex syscall, but it's not enough for what the Windows API needs.
          Not directly, however it can be implemented on top of the existing futex syscall. Maybe that's not obvious to everyone.

          • #25
            Originally posted by oiaohm

            Yes and no. Waiting on multiple objects is not that simple with NT, because there is no absence of contention, due to the horrible way handles/objects work in Windows. Wake-multi exists on Windows so that you do not modify objects that cannot be used. There is a need here for more than one lock to be processed as a group. I am not saying the kernel developers are wrong that doing it all the time is going to cause overhead.
            "Absence of contention" refers to the dynamic situation when a thread does not need to be blocked, because the conditions it might be waiting for are already met. In user space implementations such as pthread's implementation of most locks and semaphores, in these cases the thread can continue without making a syscall.

            • #26
              Originally posted by oiaohm
              This is why you cannot use existing file handles.
              I don't propose using file handles, on the contrary.

              • #27
                Originally posted by Weasel
                There's no fsync kernel patchset, fsync is a Wine thing. There's also no fsync2 anything.

                The Linux kernel has long had a futex syscall, but it's not enough for what the Windows API needs. So the "fsync" patchset on Wine depends on a new syscall, called futex2, which extends it so it can better match Windows.

                tl;dr: futex ≠ fsync
                Yeah, I am sorry for my bad semantics. "Futex wait multiple" or whatever.

                Let me ask this then:
                Does (Wine) Proton (with the fsync patchset) work with BOTH of these patchsets (not at the same time, but separately):
                1. https://github.com/sirlucjan/kernel-...ev-patches-sep
                2. https://github.com/sirlucjan/kernel-...unk-patches-v2

                The "futex dev patches" is the one that popped up a year or so ago, and the "futex2-trunk" is the "new" patchset I have yet to try.
                I was kind of under the impression that they were not the same, and the reason I ask is that I wonder if they DO the same thing.

                It is not obvious to me that "futex wait multiple" is exactly the same as "futex2" (because if you CALL a bloody patch futex2, that is bloody well what I am going to refer to it as).

                • #28
                  Originally posted by indepe
                  "Absence of contention" refers to the dynamic situation when a thread does not need to be blocked, because the conditions it might be waiting for are already met. In user space implementations such as pthread's implementation of most locks and semaphores, in these cases the thread can continue without making a syscall.
                  The structures inside NT do not make this that simple. The conditions for waiting may be met, but the contents of structures still need to change across multiple processes/threads... This creates a nice little magic instant contention. NT's implementation of object locking is not like your pthread locks and semaphores. Because of this NT mess, there are lots of cases where you really cannot go forward without a syscall or without leaving memory globally writable with arbitrary contents, which is not good.

                  The requirement to alter structure contents in all threads that are using the object is a very unique NT design annoyance. Please note that something can access the object under NT without taking out a lock and expect failure if something else has already taken out a lock on it.

                  This is why the idea of "absence of contention" is not really valid: many of the NT areas that Microsoft is attempting, and failing on multiple levels, to deprecate have contention 100 percent of the time by design, and programs expect it. Basically, welcome to the Windows equivalent of the Linux kernel's big kernel lock, and Microsoft is having an even worse time removing it.

                  • #29
                    Originally posted by oiaohm

                    The structures inside NT do not make this that simple. The conditions for waiting may be met, but the contents of structures still need to change across multiple processes/threads... This creates a nice little magic instant contention. NT's implementation of object locking is not like your pthread locks and semaphores. Because of this NT mess, there are lots of cases where you really cannot go forward without a syscall or without leaving memory globally writable with arbitrary contents, which is not good.

                    The requirement to alter structure contents in all threads that are using the object is a very unique NT design annoyance. Please note that something can access the object under NT without taking out a lock and expect failure if something else has already taken out a lock on it.

                    This is why the idea of "absence of contention" is not really valid: many of the NT areas that Microsoft is attempting, and failing on multiple levels, to deprecate have contention 100 percent of the time by design, and programs expect it. Basically, welcome to the Windows equivalent of the Linux kernel's big kernel lock, and Microsoft is having an even worse time removing it.
                    I'm not sure what exactly you are referring to. However, the "need for contents of structures to change across multi processes/threads", as such, is a common one, and is usually accomplished with atomic instructions (such as CMPXCHG), so that syscalls are only needed on the so-called "slow path", when a thread needs to be suspended or resumed. Locks and semaphores commonly have small structures (for the simplest ones just one field), and their content is changed with each operation. Atomic instructions allow this to be done in a thread-safe way for multiple threads (without the need for a syscall).
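
A one-field structure of the kind mentioned above can be sketched as a counting semaphore whose count is updated with a compare-exchange loop (which compilers typically lower to LOCK CMPXCHG on x86). The names `fsem`, `fsem_try_acquire`, `fsem_acquire` and `fsem_release` are hypothetical; the sketch assumes a plain in-process semaphore and says nothing about NT object semantics.

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <stdatomic.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* The whole semaphore is one 32-bit field: the number of available permits. */
struct fsem {
    _Atomic uint32_t count;
};

static long futex(_Atomic uint32_t *addr, int op, uint32_t val)
{
    return syscall(SYS_futex, addr, op, val, NULL, NULL, 0);
}

/* Take one permit if available; uses only atomic instructions, never a syscall. */
static int fsem_try_acquire(struct fsem *s)
{
    uint32_t c = atomic_load(&s->count);
    while (c > 0) {
        /* On failure c is reloaded with the current count and we retry. */
        if (atomic_compare_exchange_weak(&s->count, &c, c - 1))
            return 1;
    }
    return 0;
}

/* Slow path: sleep in the kernel only while the count is observed to be zero. */
static void fsem_acquire(struct fsem *s)
{
    while (!fsem_try_acquire(s))
        futex(&s->count, FUTEX_WAIT, 0);
}

/* Simplification: always issues a wake. A tuned version would track waiters
 * and skip the syscall when nobody is sleeping. */
static void fsem_release(struct fsem *s)
{
    atomic_fetch_add(&s->count, 1);
    futex(&s->count, FUTEX_WAKE, 1);
}
```

With two permits, two calls to `fsem_try_acquire` succeed and a third fails without blocking; after one `fsem_release` the next attempt succeeds again.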

                    • #30
                      Originally posted by indepe
                      I'm not sure what exactly you are referring to. However, the "need for contents of structures to change across multi processes/threads", as such, is a common one, and is usually accomplished with atomic instructions (such as CMPXCHG), so that syscalls are only needed on the so-called "slow path", when a thread needs to be suspended or resumed. Locks and semaphores commonly have small structures (for the simplest ones just one field), and their content is changed with each operation. Atomic instructions allow this to be done in a thread-safe way for multiple threads (without the need for a syscall).
                      What you are missing is that in the Windows API/NT API there are sections that are only the slow path. Instructions like CMPXCHG cannot be used, because multiple values in the object/handle structure need to change, and not to the values the threads wanted there but to the values the kernel wanted there. We are dealing with sections of the Windows NT design that predate the 486 and predate CMPXCHG existing, and that are still in active use in new applications for Windows 10. So there are structures in the Windows NT design, still present in modern-day Windows, that make no sense if you are thinking about performance, but applications coded for Windows expect the behaviour.

                      The system uses objects and handles to regulate access to system resources for two main reasons.


                      It gets more wacky when you find that some of the old stuff has you checking ACLs on handles and objects to work out whether a program is in fact allowed to take out a lock. You cannot use CMPXCHG or an equivalent to implement this kind of thing correctly either, because the ACL value is allowed to change while the program is running: a program can be allowed to take out the lock once, and the next time it attempts to take the lock it gets permission denied.

                      It's really easy to think "hey, we have all this modern stuff, it can do all the same things" and then miss that you need to duplicate the old behaviour from before modern times, because that is what applications expect.
