Wine Developers Are Working On A New Linux Kernel Sync API To Succeed ESYNC/FSYNC


  • #71
    Originally posted by Weasel
    ???? Permission checks are simple if conditions (cmp instructions probably, or test); they don't even have to be atomic because they do NOT change or do anything with regard to the object's state. They happen before any lock is taken. They do not change during the lifetime of the object anyway; they're set when you create or open the sync object.
    That is wrong. Windows does allow you to change the owner and the access rights on Event objects during the lifetime of the object. Generally, anyone writing modern programs avoids this. The problem is that some game developers don't, and exploit it for performance tweaks. Events are not your normal well-behaved objects. This is partly why waiting on multiple objects, or anything else based on Windows Events, can be such a problem child.

    Originally posted by Weasel
    xchg existed since 8086, the original x86 CPU.

    It wasn't "special" like now, though, because every instruction was atomic, all memory operations "locked" the memory bus, because there was no caching or multi-threading. In practice, you could have used it for locks just like now, so nothing changed, if multi-threading existed.
    This is also horribly wrong. The 8086 shared memory over the ISA bus and other equivalent buses, and this is where things get horrible. Yes, it is possible to have a dual-8086 system; they did exist in some special mainframes. It is the same fault as the DEC Alpha; in fact there is no protective MMU for this stuff until the 386. Think multi-core 8086, 80186 and 80286 systems. The thing to remember is that the lead developers of Windows NT came out of VMS, as in wacky mainframe hardware, not your desktop PC stuff. xchg on an 8086-80286 in your mainframe does not promise atomic safety at all. Heck, on your industrial control boards xchg does not promise safety either when you put them in an 80286. Yes, the really fun part is that some of the ISA industrial control boards you stuck into a real IBM XT in fact contained another 8088 processor. The fun of having two devices writing to the same RAM. The idea that xchg has offered the same safety going all the way back in x86 history is bogus, Weasel.

    For xchg to always be safe, it needs MMU changes so that there is hardware-level locking around it.

    Originally posted by indepe
    On thinking about it, I want to add something here: The reasons to use locking here, even for functions like SET_EVENT and PULSE_EVENT, even during normal operation, are very related to the reasons why I think this is not an API suited for general use on Linux (or to be added to the kernel where they would still need to address contention, and be an additional larger burden even if not used). Also these APIs would lead programmers to structure their application code in a very inefficient way. Linux should have better and more efficient APIs, if any new ones. Those, however, would (likely) not be directly suited to emulate Windows API.

    Therefore I think that Windows emulation is a use case that should be handled differently than everything else, and as far away from the Linux kernel as possible, so to speak.
    This is really not a good argument. Take 16-bit protected mode on 64-bit x86 systems, which the Linux kernel supports: the only user of that is Wine itself. There are quite a few unique Wine-only parts in the Linux kernel. Wine's prime focus being a compatibility layer, not an emulator, means that over time many different features have been added to the Linux and FreeBSD kernels just for Wine.

    To be correct, the extra burden when not used does not have to be the case either: ntsync, as proposed by the Wine developers, can be done as a loadable module. Remember, with this stuff you call the kernel by syscall or something similar, and then the kernel side processes the locking. Of course, if nothing uses ntsync it is simply never used, and when there are no more users it can be unloaded. If it is used, it records the extra information needed.

    Doing this as a loadable module means you don't have to have read/write memory shared between processes to allow the permission bending that Windows events allow, as the kernel module can track this and perform the needed tasks in the correct flow path. Also, there are existing flags the scheduler can see in processes to deal with real-time priority inversion, which Wine can reuse in the wait-for-multiple case, so that the time slice of something waiting on multiple objects (or equivalent) is reallocated from the waiter to one of the processes that could trigger the needed events. Priority inversion correction is a kernel-mode thing.

    • #72
      Originally posted by oiaohm
      That is wrong. Windows does allow you to change the owner and the access rights on Event objects during the lifetime of the object. Generally, anyone writing modern programs avoids this. The problem is that some game developers don't, and exploit it for performance tweaks. Events are not your normal well-behaved objects. This is partly why waiting on multiple objects, or anything else based on Windows Events, can be such a problem child.
      Like how? Show the API that changes it on the fly.

      Mind you, even if it were the case, it wouldn't be a problem: just add a lock for changing permissions (a simple critical section is enough) inside the kernel (so it's cross process). I mean, on Windows obviously. There's literally no complication here.

      However, changing permissions on the fly is just bad design, because permissions are granted when you create or open the object, and I'm talking generally here. You don't change permissions by doing a write() on a file, that's just dumb.
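
To make the "simple if condition" claim concrete, here is a minimal C sketch. It assumes the granted rights are stored in the handle when it is created or opened, which is the very point under dispute in this thread. The struct and the sketch_* functions are hypothetical, not Windows or Wine code; only the three access-right values are the ones from the Windows headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Access-right bit values as defined in the Windows SDK headers. */
#define EVENT_MODIFY_STATE 0x0002u
#define WRITE_DAC          0x00040000u
#define SYNCHRONIZE        0x00100000u

/* Hypothetical handle type: NOT a real Windows or Wine structure. */
struct sketch_handle {
    uint32_t granted_access;  /* assumed fixed at create/open time */
};

/* The permission check is one mask test; it never touches object state. */
bool sketch_access_ok(const struct sketch_handle *h, uint32_t required)
{
    assert(h != NULL);
    return (h->granted_access & required) == required;
}

/* Hypothetical SetEvent-like entry point: check first, then do the work. */
int sketch_set_event(const struct sketch_handle *h, int *signaled)
{
    if (!sketch_access_ok(h, EVENT_MODIFY_STATE))
        return -1;            /* access denied, object state untouched */
    *signaled = 1;            /* stand-in for the real, locked state change */
    return 0;
}
```

A handle opened with only SYNCHRONIZE would be refused by sketch_set_event(), while one that also carries EVENT_MODIFY_STATE would go through. If rights really can change while the handle is live, the mask read would need the extra lock mentioned above.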

      • #73
        Originally posted by oiaohm
        This is really not a good argument. Take 16-bit protected mode on 64-bit x86 systems, which the Linux kernel supports: the only user of that is Wine itself. There are quite a few unique Wine-only parts in the Linux kernel. Wine's prime focus being a compatibility layer, not an emulator, means that over time many different features have been added to the Linux and FreeBSD kernels just for Wine.
        It is a summary more than an argument.

        As to the rest of your points, insofar as they might make any sense at all:

        Originally posted by oiaohm
        To be correct, the extra burden when not used does not have to be the case either: ntsync, as proposed by the Wine developers, can be done as a loadable module.
        I don't know enough about loadable modules to tell if that might be something to consider in the far future (once they have got the hang of doing that stuff efficiently and without races), but I think that to avoid being a big burden on the kernel, and also otherwise, the best way to implement it in a module would be the way I have suggested, just inside a module. Be prepared to take performance hits with that approach, though. And as far as I could tell, the initial idea was to improve performance, wasn't it?

        EDIT: At this point I'd also like to remind you that Linux itself generally implements locks, semaphores and such things in libraries (pthreads/glibc) in userspace, not in the kernel.
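
For anyone who has not seen what such a userspace lock looks like, here is a minimal sketch of a futex-backed mutex in C, using the well-known three-state design (0 = unlocked, 1 = locked, 2 = locked with possible waiters). It is not glibc's actual code: the sketch_* names are made up, and error handling, timeouts and process-shared locks are left out.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* 0 = unlocked, 1 = locked, 2 = locked and there may be waiters. */
typedef struct { atomic_int state; } sketch_mutex;

/* Thin wrapper: glibc does not export a futex() function. */
static long sketch_futex(atomic_int *addr, int op, int val)
{
    return syscall(SYS_futex, addr, op, val, NULL, NULL, 0);
}

void sketch_lock(sketch_mutex *m)
{
    int expected = 0;
    /* Fast path: one atomic compare-and-swap, no kernel entry. */
    if (atomic_compare_exchange_strong(&m->state, &expected, 1))
        return;
    /* Slow path: mark the lock contended and sleep in the kernel. */
    while (atomic_exchange(&m->state, 2) != 0)
        sketch_futex(&m->state, FUTEX_WAIT_PRIVATE, 2);
}

void sketch_unlock(sketch_mutex *m)
{
    int prev = atomic_exchange(&m->state, 0);
    assert(prev != 0);  /* unlocking an unlocked mutex is a caller bug */
    /* Only enter the kernel if somebody may be sleeping on the lock. */
    if (prev == 2)
        sketch_futex(&m->state, FUTEX_WAKE_PRIVATE, 1);
}
```

An uncontended sketch_lock()/sketch_unlock() pair never makes a syscall; the kernel is only involved in the slow path, to put a waiter to sleep or to wake one up.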

        Originally posted by oiaohm
        Also there are existing flags the scheduler...
        Please don't even think of messing with those things. Please.

        At this point I'd like to concern myself with other things, as I think we covered the subject sufficiently (unless some surprise comes up here).
        Last edited by indepe; 24 January 2021, 05:02 PM.

        • #74
          Originally posted by Weasel
          However, changing permissions on the fly is just bad design, because permissions are granted when you create or open the object, and I'm talking generally here. You don't change permissions by doing a write() on a file, that's just dumb.
          The Unlocker tool and various anti-malware tools do exploit Windows' ability to do just that. Yes, this stuff can be dumb and highly dangerous to data integrity; the interfaces to do it have been selectively deleted from the Microsoft documentation. I did point to one of the functions used to pull it off that is still referenced in the Microsoft documentation, but the page is now a 404. A lot of the other functions were documented and no longer are. Yes, the fun of having old MSDN discs, comparing them to what Microsoft has online now, and seeing the gaps: that shows you a lot of API/ABI that Microsoft has deleted from general usage.

          Originally posted by Weasel
          Like how? Show the API that changes it on the fly.
          This is hard due to the fact that Microsoft has been deleting the documentation, so-called deprecating it. This is not stopping those functions from being used.

          Originally posted by indepe
          I don't know enough about loadable modules to tell if that might be something to consider in the far future (once they have got the hang of doing that stuff efficiently and without races), but I think that to avoid being a big burden on the kernel, and also otherwise, the best way to implement it in a module would be the way I have suggested, just inside a module. Be prepared to take performance hits with that approach, though. And as far as I could tell, the initial idea was to improve performance, wasn't it?
          The Wine developers are not just after improved performance. They are after fixing application failures as well. There are many Windows applications, like games, that are in fact so performance sensitive that they crash under the wrong performance pattern.

          Originally posted by indepe
          EDIT: At this point I'd also like to remind you that Linux itself generally implements locks, semaphores and such things in libraries (pthreads/glibc) in userspace, not in the kernel.
          This is historic, because pthreads was historically 100 percent userspace. With the Native POSIX Thread Library, futex usage appeared in Linux, with pthread lock contention being handled by the kernel. If you are doing real-time stuff you will normally be using the 100% kernel-managed RT-Mutex, due to needing priority inversion correction.

          https://www.kernel.org/doc/html/late...ex-design.html

          There is also the normal kernel-managed Mutex in the Linux kernel, also exposed to userspace.

          So the idea that Linux implements all locking in userspace is bogus: there are major forms of locking in the Linux kernel provided to userspace, namely the futex, the Mutex and the RT-Mutex. The way Windows events work is closer to an RT-Mutex than to a futex or a Mutex, but an RT-Mutex is not quite right either.

          For the type of locking that Windows events use, which the Windows kernel provides, there is really nothing currently provided by the Linux kernel that matches, or that you can properly emulate using the userspace options.

          There is a reason why I have studied pre-atomic-instruction locking solutions. It turns out they are useful to know when designing locking in a cluster.

          The pre-atomic case, where two or more CPU cores have access to the same memory without proper protection, is one you need to be aware of. The most recent case of this nightmare turning up was the early Arm big.LITTLE CPU combinations: the group of big cores and the group of little cores were not sharing atomic memory protection, yet were accessing the same memory, with processes being transferable from little to big. Hello, split brain.

          There is a reason why a lot of real-time code uses kernel-based locking like RT-Mutex and Mutex instead of futex: a lot of the hardware that real-time stuff runs on needs your pre-atomic locking. This results in horrible special code in the Mutex to make sure the cores sharing memory know that you have the lock. This is double-set locking.

          You xchg to set a section of memory saying you seek the lock, then wait for the other cores to answer, in their section of memory, that you have the lock, before you actually have the lock. Horrible. This kind of pre-atomic lock is not fast, which is why processing permissions and the like when taking out the lock is not a big problem: getting the lock is going to be slow anyway. The fun of spinlocking to get a system-wide lock here. The design of Windows events makes sense on hardware where you are missing atomic-safe memory operations.

          Multi-core 8088/8086/80186/80286 systems, yes, two chips for 2 threads at a time (there can in fact be more CPUs and more threads) sharing the same memory storage, do not have atomic-safe memory operations. The 386 is where in the x86 line you see the MMU moved 100 percent onto the CPU chip, and so able to 100 percent promise atomic memory operations. The Alpha chips that VMS was based on, which is where the core Windows NT developers come from, do not have atomic-safe memory operations.

          Yes, hardware still turns up today that does not have atomic-safe memory operations, and the Linux kernel supports some of it. Of course, the userspace on this hardware will be using Mutex and RT-Mutex, as in kernel-space locking, over userspace locking backed by futex.

          There is a big problem with using the word "generally" in a way that allows you to ignore the non-general case. The Linux kernel has a non-general case, for real-time and horrible hardware, of kernel-based locking. The NT event system is designed for horrible hardware that is very cluster-like.

          In some ways it would be good if the Linux kernel could develop a permission-supporting locking system in the kernel for clusters. Yes, this is another case where you want to perform a lock across systems without proper atomic-based memory protection, with taking out the lock being more costly than doing the permission processing.

          • #75
            Originally posted by oiaohm
            The Unlocker tool and various anti-malware tools do exploit Windows' ability to do just that.
            They don't because they'd break the app instantly unless they elevate the permissions (but to what? it won't do a thing if the app already doesn't need more perms than it asked for). Computer code is not magic to me like it is for you.

            Originally posted by oiaohm
            This is hard due to the fact that Microsoft has been deleting the documentation, so-called deprecating it. This is not stopping those functions from being used.
            Like what function? Name it or link it.

            Oh, I don't care about Microsoft documentation. You don't have to link official MS docs. Anything works. They don't document many internal Nt* APIs either, but there are loads of sites that do, you'll have no problem finding it. (If worst comes to worst, just use the Web Archive.)

            Or just tell me the name of the API, and I'll find it if it exists in the real world and not just your imagination.

            • #76
              Originally posted by Weasel
              Oh, I don't care about Microsoft documentation. You don't have to link official MS docs. Anything works. They don't document many internal Nt* APIs either, but there are loads of sites that do, you'll have no problem finding it. (If worst comes to worst, just use the Web Archive.)
              The problem here is that the intentionally deleted parts of MSDN are not something you find in the Web Archive.

              https://help.archive.org/hc/en-us/ar...ayback-Machine
              How can I exclude or remove my site's pages from the Wayback Machine?

              You can send an email request for us to review to i[email protected] with the URL (web address) in the text of your message.
              Yes, there is a process you can use to pull pages from the archive, and Microsoft has done so for some of these things.

              Where I know the information from is the old MSDN, in the PulseEvent page, with the example of how to do it in functions for cases where you need to do it without permission, as in which privileges you need to grab to perform it anyhow. Sorry to say, Microsoft has progressively documented less, and in the process of documenting less they have been quite efficient at the removal, including going to the effort of requesting removal.

              https://docs.microsoft.com/en-us/win...l-access-right
              Altering the permissions on an event object is your normal object permission alteration, so it uses your normal SACL access functions in Windows NT.

              Standard access rights to items under Windows NT are not set in stone when the application starts or opens the object.

              https://docs.microsoft.com/en-us/win...-access-rights
              WRITE_DAC (0x00040000L) Required to modify the DACL in the security descriptor for the object.

              Good question, Weasel: why is there a permission to change the DACL (the permissions) on an event if every process's access to that event was set in stone when the event was created?

              There is a stack of fragments all over the MSDN documentation showing the possibility that you can change the permissions on events on the fly. Every example of that functionality that was in MSDN has been systematically removed, including from the Wayback Machine.

              If you have the Microsoft documentation for NT 3.1, you can find in it a nice example where you use the ACL with events so that a single event object selectively messages different processes based on their permissions. As in, if user X does not have permission to sync with the object, the program running as user X cannot see that the event has been set, but since user Y does have permission, the program running as user Y sees the event. The example then changes that on the fly, with a stack of notes about the different traps, like how changing the ACL could trigger application termination if done in particular ways.

              To be correct, the functions to perform the change on the fly are still documented; it is the examples of how to do it, showing that the functionality absolutely exists, that have been removed. Yes, Microsoft has been deprecating and removing those examples from the documentation so that the functionality is not used as much.

              I have pointed you to the fragments that remain in the Microsoft documentation enough times, yet you are still asking for more, Weasel. As in, you want absolute example code for it, which has been removed. It is also something where, if I give you a coded example that is wrong, it can crash your complete system.

              When Microsoft puts a lot of effort into deprecating documentation, including going to the effort of removing it from the Wayback Machine, this is an area where there be dragons.

              • #77
                Rework the Linux kernel to implement deprecated Windows API using 286 technology... what could go wrong?

                • #78
                  Originally posted by indepe
                  Rework the Linux kernel to implement deprecated Windows API using 286 technology... what could go wrong?
                  The interesting point is how often pre-atomic locking becomes critical. Do note it says event. As in, you send events between the different parts.

                  There are many different embedded systems where multiple CPU cores share the same memory without proper atomics. This is why RT-Mutex is in the kernel: it can use Windows NT event-style locking between cores when required by the hardware. Pre-atomic-instruction locking is in a lot of cases syscall-based locking.

                  Event-based locking between cores is slow. It is not atomic: you request a lock and have to wait until the other cores respond that you have the lock before you let the process go ahead with the lock. This style of locking being slow is why it has a functional permission system on top of it.

                  This is a different style of lock to what the Linux kernel has.

                  Microsoft has deprecated that usage of the ABI by taking away the documentation examples of how to do it. But the ABI to do it is still in Windows, and various applications use it.

                  You were saying atomic instructions could be used to do this. Event-based locking, which is pre-atomic locking, does not map to atomic instructions. Event-based locking exists on various platforms Linux runs on, which use the RT-Mutex syscall.

                  indepe, just because something is old does not mean it does not have particular use cases.

                  Really, you started off with the incorrect presumption that all Linux locking only makes syscalls when there is contention. That is not true for kernel-backed Mutexes; they are what you have to use on some pain-in-the-butt platforms where you cannot do atomic memory operations, so that you can fall back to pre-atomic memory locking methods.

                  Yes, 286-era locking is in the Linux kernel in places, because there are still modern-day platforms that are just as bad as back then in how you have to do locking.

                  • #79
                    Originally posted by oiaohm

                    The interesting point is how often pre-atomic locking becomes critical. Do note it says event. As in, you send events between the different parts.

                    There are many different embedded systems where multiple CPU cores share the same memory without proper atomics. This is why RT-Mutex is in the kernel: it can use Windows NT event-style locking between cores when required by the hardware. Pre-atomic-instruction locking is in a lot of cases syscall-based locking.

                    Event-based locking between cores is slow. It is not atomic: you request a lock and have to wait until the other cores respond that you have the lock before you let the process go ahead with the lock. This style of locking being slow is why it has a functional permission system on top of it.

                    This is a different style of lock to what the Linux kernel has.

                    Microsoft has deprecated that usage of the ABI by taking away the documentation examples of how to do it. But the ABI to do it is still in Windows, and various applications use it.

                    You were saying atomic instructions could be used to do this. Event-based locking, which is pre-atomic locking, does not map to atomic instructions. Event-based locking exists on various platforms Linux runs on, which use the RT-Mutex syscall.

                    indepe, just because something is old does not mean it does not have particular use cases.

                    Really, you started off with the incorrect presumption that all Linux locking only makes syscalls when there is contention. That is not true for kernel-backed Mutexes; they are what you have to use on some pain-in-the-butt platforms where you cannot do atomic memory operations, so that you can fall back to pre-atomic memory locking methods.

                    Yes, 286-era locking is in the Linux kernel in places, because there are still modern-day platforms that are just as bad as back then in how you have to do locking.
                    It appears you are confusing a few things.

                    https://www.kernel.org/doc/Documentation/pi-futex.txt
                    https://www.kernel.org/doc/Documenta...g/rt-mutex.rst
                    http://people.redhat.com/mingo/light...tex-base.patch

                    PI-enabled pthread_mutexes use cmpxchg, an atomic operation, to provide an optimized fastpath (as long as cmpxchg is available on the architecture).

                    RT-mutexes are, as far as I can tell, an internal "kernel-based synchronization object", which is used on the slowpath only (see first link under "Implementation").

                    RT_mutexes are hidden within the FUTEX_LOCK_PI and FUTEX_UNLOCK_PI syscalls, which are called only on the slow path:

                    As mentioned before, the userspace fastpath of PI-enabled pthread mutexes involves no kernel work at all - they behave quite similarly to normal futex-based locks: a 0 value means unlocked, and a value==TID means locked. (This is the same method as used by list-based robust futexes.) Userspace uses atomic ops to lock/unlock these mutexes without entering the kernel.

                    To handle the slowpath, we have added two new futex ops:

                    - FUTEX_LOCK_PI
                    - FUTEX_UNLOCK_PI

                    If the lock-acquire fastpath fails, [i.e. an atomic transition from 0 to TID fails], then FUTEX_LOCK_PI is called.
                    Remember I wrote that the information about which thread is holding a lock can be maintained in user space? That is what is done here.

                    I'm also wondering if you are confusing *lockless* methods (which also use atomic operations) with the optimization method for *locks*, that is, a userspace fastpath with atomic operations, which results in syscalls only being needed when there's contention.
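
The protocol quoted above can be sketched in C roughly as follows. This is an illustration under assumptions, not glibc's implementation: the sketch_* names are hypothetical, the TID is cached per thread without handling fork(), and robust-futex handling and error recovery are omitted.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Lock word protocol from pi-futex.txt: 0 = unlocked, TID = locked by TID. */
typedef struct { _Atomic uint32_t word; } sketch_pi_lock;

/* Cache the kernel thread ID so the fast path needs no syscall. */
static uint32_t sketch_tid(void)
{
    static _Thread_local uint32_t cached;
    if (cached == 0)
        cached = (uint32_t)syscall(SYS_gettid);
    return cached;
}

int sketch_pi_acquire(sketch_pi_lock *l)
{
    uint32_t expected = 0;
    assert(l != NULL);
    /* Fast path: atomic 0 -> TID transition, entirely in userspace. */
    if (atomic_compare_exchange_strong(&l->word, &expected, sketch_tid()))
        return 0;
    /* Slow path: the kernel queues us and applies priority inheritance. */
    return (int)syscall(SYS_futex, &l->word, FUTEX_LOCK_PI_PRIVATE,
                        0, NULL, NULL, 0);
}

int sketch_pi_release(sketch_pi_lock *l)
{
    uint32_t expected = sketch_tid();
    assert(l != NULL);
    /* Fast path: TID -> 0 succeeds only if the kernel set no waiters bit. */
    if (atomic_compare_exchange_strong(&l->word, &expected, 0))
        return 0;
    /* Slow path: there are waiters, so the kernel hands the lock over. */
    return (int)syscall(SYS_futex, &l->word, FUTEX_UNLOCK_PI_PRIVATE,
                        0, NULL, NULL, 0);
}
```

With a single thread, sketch_pi_acquire() followed by sketch_pi_release() only touches the lock word with compare-and-swap; the FUTEX_LOCK_PI and FUTEX_UNLOCK_PI calls are reached only when the fast-path transition fails.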

                    • #80
                      Originally posted by indepe
                      PI-enabled pthread_mutexes use cmpxchg, an atomic operation, to provide an optimized fastpath (as long as cmpxchg is available on the architecture).
                      Stop right there. As long as a functional cmpxchg is available on the architecture. There are quite a few architectures where cmpxchg is there and it's not 100% functional.

                      Next, not all real-time code runtimes in fact use pthread implementations on Linux; this is due to what is broken.

                      Originally posted by indepe
                      RT_mutexes are hidden within the FUTEX_LOCK_PI and FUTEX_UNLOCK_PI syscalls, which are called only on the slow path.
                      << Not the pthread implementation. With some implementations you will only use the slow form when calling a real-time Mutex. It is in fact important, particularly on the intentionally broken platforms.

                      When I say a broken platform I mean platforms where you have something like 4 different Arm core clusters sharing the same memory controller, but without the clusters syncing for cmpxchg. So inside each cluster cmpxchg works, but when you have threads in 2 different clusters it does not work any more. This is the real-time fun where you dedicate core clusters to particular problems.

                      So yes, on these broken messes you need to call a futex that uses cmpxchg when you are doing locking inside a cluster. You also need non-cmpxchg locking for when you are doing locking across the clusters.

                      Texas Instruments makes a lot of Arm chips with multiple clusters without synced MMU locking, for real-time usage. Yes, fun, right: cmpxchg might only hold a lock on 4 cores of a 20-core chip, because it is 5 individual clusters sharing the same MMU and there are no atomics in the MMU; the cmpxchg has only got as far as the L3 cache in the 4-core cluster.

                      So on those horrible platforms you have your kernel-mode Mutex and the RT-Mutex, which is also kernel mode, and in your libraries both of these are going to take the slow path every single time. The kernel is built as if cmpxchg were not available in these cases, even though the instruction is there, because it is not 100 percent functional. So you need to use non-cmpxchg methods to get locking across all cores, since cmpxchg on the oddball chips only covers a subset of cores.

                      That is working with modern-day real-time bastard chips.

                      The old locking is still needed in particular use cases; it's not your general case.

                      Yes, you have also presumed incorrectly that if a platform has cmpxchg you will in fact be using it. Yes, you can intentionally build glibc on an architecture supporting cmpxchg so that it does not use it, and so falls back to the slow path on every single lock. This feature of glibc only makes sense once you are aware of the horrible chips used in real-time that have fragmented cmpxchg implementations (you can detect the existence of cmpxchg in the CPU, but it is not a global atomic across all cores).

                      On the broken platforms where you will be using the slow path every single time, in particular use cases having a permission system on the lock would be useful, so that I can say 2 clusters out of 5 in the SoC can take out the lock.

                      The NT design makes sense on platforms where you are forced to always use the slow path. New platforms (SoCs) where you are forced to always use the slow path for locking are still being made.
