Wine Developers Are Working On A New Linux Kernel Sync API To Succeed ESYNC/FSYNC

oiaohm replied

01 February 2021, 05:43 AM
Originally posted by indepe View Post

Permissions are not mentioned under point 3, or in that list. The problems with NtPulseEvent are specific to the eventfd implementation. As I posted above, there should be no problem implementing NtPulseEvent without races.

Except it's not that simple. Welcome to the fun of pre-atomic style locking.

http://www.codewarrior.cn/ntdoc/winnt/ex/NtPulseEvent.htm

Code:

This function sets an event object to a Signaled state, attempts to satisfy as many waits as possible, and then resets the state of the event object to Not-Signaled

Read that carefully and notice something: where does it say there is no race condition, or that multiple waiters cannot be started at the same time? There are reasons why Windows developers say that NtPulseEvent is pure evil. It is really simple to cause a race condition with NtPulseEvent, and horribly, that is part of the design. Say your CPU has 16 hardware threads available, NtPulseEvent is called, and there are 16 items waiting on the event: all 16 could now be activated at the same time.

Now this is where waiting on multiple objects comes in. Say a waiter waits on one pulse event and one exclusive, one-at-a-time event, so it should only get CPU time if both are met. You could then have a 16-thread system with 16 waiters on the pulse event and only 1 starts, because through the wait on multiple objects they were all sharing the same exclusive one-at-a-time event. Of course, the 16 waiters in the pulse event queue could also have non-conflicting conditions in their waits on multiple objects, in which case they can all start at once.
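One way to picture the "16 waiting, only 1 starts" case is the minimal Python sketch below. It is not Wine's code and not the proposed kernel interface; `Event`, `wait_all`, `demo_wait_all` and the single process-wide lock `_big_lock` are names made up for illustration, and pulsing is left out. A waiter proceeds only when every object it waits on is signaled, and the check and the consumption of auto-reset events happen together under that one lock.

```python
import threading

# Assumption: one condition variable guards every object in this toy model.
_big_lock = threading.Condition()


class Event:
    def __init__(self, auto_reset, signaled=False):
        self.auto_reset = auto_reset
        self.signaled = signaled

    def set(self):
        with _big_lock:
            self.signaled = True
            _big_lock.notify_all()


def wait_all(events, timeout=None):
    """Block until all events are signaled, then consume the auto-reset ones."""
    def try_take():
        # Runs with _big_lock held, so check and consume cannot be split.
        if all(e.signaled for e in events):
            for e in events:
                if e.auto_reset:
                    e.signaled = False
            return True
        return False

    with _big_lock:
        return _big_lock.wait_for(try_take, timeout)


def demo_wait_all(n_waiters=3):
    gate = Event(auto_reset=False)       # stays signaled for everyone
    exclusive = Event(auto_reset=True)   # consumed by exactly one waiter
    results = []

    def waiter():
        results.append(wait_all([gate, exclusive], timeout=0.5))

    threads = [threading.Thread(target=waiter) for _ in range(n_waiters)]
    for t in threads:
        t.start()
    gate.set()
    exclusive.set()
    for t in threads:
        t.join()
    # (number of waiters that proceeded, gate state, exclusive state)
    return results.count(True), gate.signaled, exclusive.signaled
```

In this model `demo_wait_all(3)` lets a single waiter through and the other two time out, because the auto-reset event is taken by whichever waiter sees both objects signaled first.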

This is a completely different beast from what you are thinking of: you need to resolve more than one event state at the same time. You have a completely different type of conflicted lock here.

For extra fun, an item might require two different PulseEvent-triggered events to be active at the same time. The idea of being 100 percent race condition free is a new idea.
indepe replied

01 February 2021, 02:00 AM
Originally posted by oiaohm View Post

3. WHY IT CAN'T BE DONE WITH EXISTING TOOLS

Kernel interface for Wine synchronization primitives

https://lore.kernel.org/lkml/[email protected]/T/#u

This lists what breaks before you even implement permissions. For NtPulseEvent there is nothing like it in an atomic system. Linux userspace implementing the Windows API does run into a hard wall of things you cannot do with locks built on atomic memory operations.

Permissions are not mentioned under point 3, or in that list. The problems with NtPulseEvent are specific to the eventfd implementation. As I posted above, there should be no problem implementing NtPulseEvent without races. My guess would be that these problems are apparently due to side effects outside the locked region caused by the use of eventfd with split set and reset operations.
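To make the claim concrete, here is a minimal Python sketch of a manual-reset event whose set, reset and pulse all run under one lock, so the signal and the reset of a pulse are never split into two separately visible steps. It is only an illustration under stated assumptions: `ManualResetEvent`, `waiter_count` and `demo_pulse` are made-up names, this is not Wine's code or the proposed kernel API, and it covers neither cross-process handles nor waits on multiple objects.

```python
import threading
import time


class ManualResetEvent:
    """Toy manual-reset event: set, reset and pulse under a single lock."""

    def __init__(self):
        self._cond = threading.Condition()
        self._signaled = False
        self._generation = 0   # bumped on every set() and pulse()
        self._waiters = 0      # threads currently blocked in wait()

    def set(self):
        with self._cond:
            self._signaled = True
            self._generation += 1
            self._cond.notify_all()

    def reset(self):
        with self._cond:
            self._signaled = False

    def pulse(self):
        # Release everyone waiting right now; the event ends up non-signaled.
        with self._cond:
            self._generation += 1
            self._signaled = False
            self._cond.notify_all()

    def wait(self, timeout=None):
        with self._cond:
            if self._signaled:
                return True
            seen = self._generation
            self._waiters += 1
            try:
                # Woken by any set() or pulse() that happened after we queued.
                return self._cond.wait_for(
                    lambda: self._generation != seen, timeout)
            finally:
                self._waiters -= 1

    def waiter_count(self):
        with self._cond:
            return self._waiters


def demo_pulse(n_waiters=4):
    ev = ManualResetEvent()
    results = []
    threads = [
        threading.Thread(target=lambda: results.append(ev.wait(timeout=5)))
        for _ in range(n_waiters)
    ]
    for t in threads:
        t.start()
    while ev.waiter_count() < n_waiters:   # make sure everyone is queued
        time.sleep(0.001)
    ev.pulse()
    for t in threads:
        t.join()
    late = ev.wait(timeout=0)              # arrives after the pulse
    return results.count(True), late
```

In this model every thread queued at the time of the pulse is released and a thread that arrives afterwards is not. Whether that matches every corner of the Windows behaviour is a separate question.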

Simple access flags (wait/modify/read) are mentioned under point 2, but just describe which handle can be used for which operations. Difficulties implementing that in userspace are perhaps due to the specifics of their eventfd implementation, and perhaps that futex implementation is not different enough. Also this seems to have been solved already on a higher level, since the access flags are apparently not part of the proposed API. Either way, that seems a very easy thing to do in general. I have no idea why anyone would think a syscall would even help with that. Just a lock should be fine.

It seems you are just referring to that email. As you can tell, I'm not taking your word for it, or anyone else's, regarding the claim that things are hard or impossible.

Originally posted by oiaohm View Post

The reality is there is a complete type of locking primitive missing.

What specifically are you referring to? I can't make any sense of this claim.

Originally posted by oiaohm View Post

The reality is the Wine project has tried all sorts of things to make the Windows event system work using Linux-provided parts, and so far it has never worked right. Please note they dropped the advanced forms because the basic forms cannot be done. Yes, Wine tells all programs that it is running as administrator with high rights to stomp over everything, so it can ignore some of the permission stuff.

The reality here, indepe, is that you have not attempted to implement the Windows event system on Linux-provided parts, yet you are saying it has to be possible. Wine developers have tried for over 20 years. There are repeated points where Wine developers get so far, and then the fact that you don't have the permission system, and don't have it processed kernel-side, comes back every time and forces the developers to compromise. That compromise results in programs not working or behaving strangely.

You can easily have 20 or 40 years of experience without ever seeing the inside of a lock. There is nothing that would tell me that experts in the implementation of synchronization primitives are involved in this proposal.
oiaohm replied

01 February 2021, 12:20 AM
Originally posted by indepe View Post

Again, you have to distinguish Windows userspace and Linux userspace. Any difficulties and impossibilities a Windows application has in Windows userspace do not apply to Linux userspace implementing the Windows API and/or kernel call. And you are not providing any specific technical reasons why it wouldn't be possible to *implement* event permissions in userspace.

3. WHY IT CAN'T BE DONE WITH EXISTING TOOLS

Kernel interface for Wine synchronization primitives

https://lore.kernel.org/lkml/[email protected]/T/#u

This lists what breaks before you even implement permissions. For NtPulseEvent there is nothing like it in an atomic system. Linux userspace implementing the Windows API does run into a hard wall of things you cannot do with locks built on atomic memory operations.

The reality is there is a complete type of locking primitive missing. The reality is the Wine project has tried all sorts of things to make the Windows event system work using Linux-provided parts, and so far it has never worked right. Please note they dropped the advanced forms because the basic forms cannot be done. Yes, Wine tells all programs that it is running as administrator with high rights to stomp over everything, so it can ignore some of the permission stuff.

The reality here, indepe, is that you have not attempted to implement the Windows event system on Linux-provided parts, yet you are saying it has to be possible. Wine developers have tried for over 20 years. There are repeated points where Wine developers get so far, and then the fact that you don't have the permission system, and don't have it processed kernel-side, comes back every time and forces the developers to compromise. That compromise results in programs not working or behaving strangely.
indepe replied

31 January 2021, 09:40 PM
Originally posted by oiaohm View Post

That is close, but it's not quite it.

Windows has locking with permission processing on it. rt-mutex does not have permission processing on it. Doing that with atomic operations in userspace comes next to impossible.

So the permission system is just on top of it and doesn't change the locking method or the lock type.

The question how that can be implemented is a different one, and not part of the proposal that we are discussing. It probably will simply ignore event permissions. It isn't mentioned as part of the proposed API.

Again, you have to distinguish Windows userspace and Linux userspace. Any difficulties and impossibilities a Windows application has in Windows userspace do not apply to Linux userspace implementing the Windows API and/or kernel call. And you are not providing any specific technical reasons why it wouldn't be possible to *implement* event permissions in userspace. I don't think that would have much to do with atomic operations; it would just be normal code running inside a normal lock. But that is beside the topic of this discussion anyway.
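As a rough illustration of "normal code running inside a normal lock", the Python sketch below checks a per-handle access mask before each operation and allows a right to be revoked later. Everything in it is hypothetical: `Access`, `EventObject` and `Handle` are made-up names, the flags are not the Windows constants, waiting is omitted, and nothing here enforces the check against another process that maps the same memory.

```python
import enum
import threading


class Access(enum.Flag):
    # Hypothetical flags for illustration, not Windows access-right values.
    READ = enum.auto()
    MODIFY = enum.auto()
    WAIT = enum.auto()


class EventObject:
    def __init__(self):
        self.lock = threading.Lock()
        self.signaled = False


class Handle:
    """A reference to an object plus the operations this handle may perform."""

    def __init__(self, obj, access):
        self.obj = obj
        self.access = access

    def _require(self, needed):
        # Called with obj.lock held, so a concurrent revoke cannot race it.
        if needed not in self.access:
            raise PermissionError(f"handle lacks {needed.name} access")

    def set(self):
        with self.obj.lock:
            self._require(Access.MODIFY)
            self.obj.signaled = True

    def query(self):
        with self.obj.lock:
            self._require(Access.READ)
            return self.obj.signaled

    def revoke(self, access):
        with self.obj.lock:
            self.access &= ~access
```

A handle created with only `Access.READ` can call `query()` but gets a `PermissionError` from `set()`, and a handle whose `MODIFY` right is revoked fails the same way on its next `set()`.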

Originally posted by oiaohm View Post

WaitForMultipleObjects in Windows is event based. Locking based on the event system in Windows does in fact include permission processing as part of getting the lock and being allowed to keep the lock. As in, if your process is not the right user/permission you will be notified of the event. Your fastpath userspace optimizations are not designed for having secure process permissions on the lock, whether you have the lock or not. The fast path is not designed to deal with the case where an application's permission to hold a lock has been revoked, so the lock has to be taken back by force.

This argument may apply to Windows userspace, however it doesn't apply to Linux userspace (see my explanation of that distinction in previous posts).

[snip]

Originally posted by oiaohm View Post

indepe, there is a real lack in the Linux mainline kernel of locking methods for the problem cases where you cannot use fastpath userspace optimizations. One case of this problem is ARM SoCs with multiple CPU clusters that share memory but have no atomic protections; this is a performance trade-off where you gain some RAM transfer speed by not processing protection in the MMU, instead requiring software coded for the hardware. The other place is general computer clustering on x86: a lot of userspace cluster software works very hard to implement cluster-wide locking with permission processing in userspace and runs into the same overhead problems wineserver is running into.

If you really want WINE to implement these features, and to implement them even on these architectures, then special case them for example in the way described in the Linux documentation that I quoted. However I very much doubt that will ever happen. It sounds like a recipe for disaster.

Originally posted by oiaohm View Post

Basically there is a complete class of locking primitive the Linux kernel is missing. It's a really old class of locking primitive that was generally used on hardware Linux never supported. But that really old class of locking has its use cases.

This doesn't make much sense to me, if it is supposed to relate in any way to our discussion. It sounds like you read something without really understanding the context, and how that context is different to the one in this discussion. And don't believe everything you read on the internet.
oiaohm replied

30 January 2021, 11:20 PM
Originally posted by indepe View Post

What does that have to do with the ability to implement any kind of locking on modern x86 CPUs? Which is what you claimed, isn't it? That Windows has some kind of locking where the fastpath cannot be implemented in userspace with atomic operations?

So I have just shown you that PI style locking, which rt-mutex is used for, does have fastpath userspace optimizations on modern x86 CPUs.
If you think there is some other kind of locking, name it !

That is close, but it's not quite it.

Windows has locking with permission processing on it. rt-mutex does not have permission processing on it. Doing that with atomic operations in userspace comes next to impossible.

WaitForMultipleObjects in Windows is event based. Locking based on the event system in Windows does in fact include permission processing as part of getting the lock and being allowed to keep the lock. As in, if your process is not the right user/permission you will be notified of the event. Your fastpath userspace optimizations are not designed for having secure process permissions on the lock, whether you have the lock or not. The fast path is not designed to deal with the case where an application's permission to hold a lock has been revoked, so the lock has to be taken back by force.

All this event-based stuff in Windows does not have any fastpath code, because that does not work when permissions and locking are tied together. Yes, this is a really old locking system that you find in many systems from before atomic instructions. Under Wine this results in horrible performance overhead, as you basically have to take out a lock with wineserver so that wineserver can process whether you should get the lock or not.

indepe, like it or not, there is a limit to what you can do with atomic memory instructions for fast-path locking. Atomic memory instructions for locking were not designed for the use case where you have to check permissions to decide whether you get the lock, or where a change of permission results in the lock being revoked from you.

There are a few pre-atomic locking designs that cannot be done with atomic memory instructions; the way the Windows event system is designed, it is one of them. Please note Windows has more than one of these cases that cannot be solved with a fastpath.

Also, this kind of locking issue has historically turned up on Linux x86. Not as a single-system problem, but as the cluster-system problem that MOSIX and others ran into. MOSIX extended the kernel, adding extra locking that supported permissions and also had no fastpath.

indepe, there is a real lack in the Linux mainline kernel of locking methods for the problem cases where you cannot use fastpath userspace optimizations. One case of this problem is ARM SoCs with multiple CPU clusters that share memory but have no atomic protections; this is a performance trade-off where you gain some RAM transfer speed by not processing protection in the MMU, instead requiring software coded for the hardware. The other place is general computer clustering on x86: a lot of userspace cluster software works very hard to implement cluster-wide locking with permission processing in userspace and runs into the same overhead problems wineserver is running into.

Basically there is a complete class of locking primitive the Linux kernel is missing. It's a really old class of locking primitive that was generally used on hardware Linux never supported. But that really old class of locking has its use cases.
Weasel replied

30 January 2021, 10:50 AM
Originally posted by oiaohm View Post

Stop right there. As long as a functional cmpxchg is available on the architecture. There are quite a few architectures where cmpxchg is there and it's not 100% functional.

LMFAO.
indepe replied

30 January 2021, 04:51 AM
Originally posted by oiaohm View Post

Stop right there. As long as a functional cmpxchg is available on the architecture. There are quite a few architectures where cmpxchg is there and it's not 100% functional.

Next, not all real-time code in fact uses the pthread implementations on Linux at runtime; this is due to what is broken.

What does that have to do with the ability to implement any kind of locking on modern x86 CPUs? Which is what you claimed, isn't it? That Windows has some kind of locking where the fastpath cannot be implemented in userspace with atomic operations?

So I have just shown you that PI style locking, which rt-mutex is used for, does have fastpath userspace optimizations on modern x86 CPUs.
If you think there is some other kind of locking, name it !

Why would anyone "stop right there"? Surely the largest part, or at least a very significant part of WINE users, want to run modern Windows applications written and compiled for modern x86 CPUs, on these same CPUs. And that's where you can use a userspace fastpath. And not only there.

So on some peculiar other architectures worst case maybe you have to make more syscalls, I don't know those architectures. However that doesn't change the situation on modern x86. Just encapsulate that in the high-level operations like lock/unlock. Also, some architectures will be able to run an x86 emulator with modern x86 instructions, so they will need some functional equivalent to emulate each atomic CPU instruction, given that these are now used all over the place within applications, including Windows applications.

Show us some application code using Windows API, that you think cannot be optimized in userspace (on modern x86). Otherwise your talk doesn't mean anything.

Last edited by indepe; 30 January 2021, 04:53 AM.
oiaohm replied

30 January 2021, 02:18 AM
Originally posted by indepe View Post

PI-enabled pthread_mutexes use cmpxchg, an atomic operation, to provide an optimized fastpath (as long as cmpxchg is available on the architecture).

Stop right there. As long as a functional cmpxchg is available on the architecture. There are quite a few architectures where cmpxchg is there and it's not 100% functional.

Next, not all real-time code in fact uses the pthread implementations on Linux at runtime; this is due to what is broken.

Originally posted by indepe View Post

RT_mutexes are hidden within the FUTEX_LOCK_PI and FUTEX_UNLOCK_PI syscalls, which are called only on the slow path.

That does not hold outside the pthread implementation: in some implementations you will only use the slow form when calling a real-time mutex. This is in fact important, particularly on the intentionally broken platforms.

When I say a broken platform, I mean platforms where you have, say, 4 different ARM core clusters sharing the same memory controller but the clusters are not synced for cmpxchg. So inside each cluster cmpxchg works, but when you have threads in 2 different clusters it does not work any more. This is the real-time fun where you dedicate core clusters to particular problems.

So yes, on this broken mess you need to call a futex that uses cmpxchg when you are locking inside a cluster. You also need locking that does not use cmpxchg for when you are locking across the clusters.

Texas Instruments makes a lot of ARM chips with multiple clusters without synced MMU locking for real-time usage. Yes, fun, right: cmpxchg might only take a lock across 4 cores of a 20-core chip, because it is 5 individual clusters sharing the same MMU, there are no atomics in the MMU, and the cmpxchg has only got as far as the L3 cache in the 4-core cluster.

So on those horrible platforms you have your kernel-mode mutex and your kernel-mode RT mutex, and in your libraries both of these are going to take the slow path every single time. The kernel is built as if cmpxchg were not available in these cases, even though the instruction is there, because it is not 100 percent functional. You need non-cmpxchg methods to get locking across all cores, since cmpxchg on the oddball chips only covers a subset of cores.

That is what working with modern-day real-time bastard chips looks like.

The old locking is still needed in particular use cases; it's not for your general cases.

Yes, you have also presumed incorrectly that if a platform has cmpxchg then you will in fact be using it. You can intentionally build glibc on an architecture supporting cmpxchg so that it does not use it and falls back to the slow path on every single lock. This feature of glibc only makes sense once you are aware of the horrible chips used in real-time work that have a fragmented cmpxchg implementation (you can detect that cmpxchg exists in the CPU, but it is not a global atomic across all cores).

On the broken platforms where you will be using the slow path every single time, in particular use cases having a permission system on the lock would be useful, so that I can say 2 clusters out of 5 in the SoC can take out the lock.

The NT design makes sense on platforms where you are forced to always use the slow path. Platforms (SoCs) where you are forced to always use the slow path for locking are still being newly made.
indepe replied

29 January 2021, 11:37 PM
Originally posted by oiaohm View Post

An interesting point is how often pre-atomic locking becomes critical. Do note it says event. As in, you send events between the different parts.

There are many different embedded systems where multiple CPU cores share the same memory without proper atomics. This is why RT-Mutex is in the kernel: it can use Windows NT event-style locking between cores when required by the hardware. Pre-atomic-instruction locking is in a lot of cases syscall-based locking.

Event-based locking between cores is slow and not atomic: you request a lock and have to wait until the other cores respond that you have the lock before you let the process go on with the lock. This style of locking being slow is why it has a functional permission system on top of it.

This is a different style of lock to what the Linux kernel has.

Microsoft has deprecated using the ABI that way by taking away the documentation examples on how to do it. But the ABI to do it is still in Windows, and different applications use it.

You were saying atomic instructions could be used to do this. Event-based locking, which is pre-atomic locking, does not map to atomic instructions. Event-based locking exists on different platforms Linux runs on that use the RT-Mutex syscall.

indepe, just because something is old does not mean it has no particular use cases.

Really, you started off with the incorrect presumption that all Linux locking only makes syscalls when there is contention. That is not true for kernel-backed mutexes: they are what you have to use on some pain-in-the-butt platforms where you cannot do atomic memory operations, so you can fall back to pre-atomic memory locking methods.

Yes, 286-era locking is in the Linux kernel in places, because there are still modern-day platforms that are just as bad as back then for how you have to do locking.

It appears you are confusing a few things.

https://www.kernel.org/doc/Documentation/pi-futex.txt

https://www.kernel.org/doc/Documenta...g/rt-mutex.rst
http://people.redhat.com/mingo/light...tex-base.patch

PI-enabled pthread_mutexes use cmpxchg, an atomic operation, to provide an optimized fastpath (as long as cmpxchg is available on the architecture).

RT-mutexes are, as far as I can tell, an internal "kernel-based synchronization object", which is used on the slowpath only (see first link under "Implementation").

RT_mutexes are hidden within the FUTEX_LOCK_PI and FUTEX_UNLOCK_PI syscalls, which are called only on the slow path:

As mentioned before, the userspace fastpath of PI-enabled pthread mutexes involves no kernel work at all - they behave quite similarly to normal futex-based locks: a 0 value means unlocked, and a value==TID means locked. (This is the same method as used by list-based robust futexes.) Userspace uses atomic ops to lock/unlock these mutexes without entering the kernel.

To handle the slowpath, we have added two new futex ops:

- FUTEX_LOCK_PI
- FUTEX_UNLOCK_PI

If the lock-acquire fastpath fails, [i.e. an atomic transition from 0 to TID fails], then FUTEX_LOCK_PI is called.

Remember I wrote that the information which thread is holding a lock, can be maintained in user space? That is what is done here.
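The quoted fastpath/slowpath split can be pictured with the toy Python model below. It is only a sketch under loud assumptions: `AtomicWord` fakes the hardware cmpxchg with an ordinary lock, a condition variable stands in for the kernel side of FUTEX_LOCK_PI and FUTEX_UNLOCK_PI, priority inheritance itself is not modeled, and unlike the real thing the unlock path always visits the "kernel". `ModelPiMutex` and `slow_path_calls` are made-up names.

```python
import threading


class AtomicWord:
    """Stand-in for the futex word; real code uses a hardware cmpxchg."""

    def __init__(self, value=0):
        self._value = value
        self._guard = threading.Lock()   # assumption: simulates atomicity

    def compare_exchange(self, expected, new):
        with self._guard:
            if self._value == expected:
                self._value = new
                return True
            return False

    def load(self):
        with self._guard:
            return self._value


class ModelPiMutex:
    """0 means unlocked, a TID means locked by that thread."""

    def __init__(self):
        self.word = AtomicWord(0)
        self.slow_path_calls = 0
        self._kernel = threading.Condition()   # stands in for the syscalls

    def lock(self, tid):
        # Fast path: atomic transition 0 -> TID, no "kernel" involvement.
        if self.word.compare_exchange(0, tid):
            return "fast"
        # Slow path: only reached when the fast path failed (contention).
        with self._kernel:
            self.slow_path_calls += 1
            while not self.word.compare_exchange(0, tid):
                self._kernel.wait()
        return "slow"

    def unlock(self, tid):
        if not self.word.compare_exchange(tid, 0):
            raise RuntimeError("unlock by a thread that is not the owner")
        with self._kernel:
            self._kernel.notify()   # simplification: always wake one waiter
```

In this model an uncontended `lock(101)` returns `"fast"` and leaves `slow_path_calls` at 0, while a second thread that tries to lock during that time goes through the slow path and gets the mutex once the owner unlocks.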

I'm also wondering if you are confusing *lockless* methods (which also use atomic operations) with the optimization method for *locks*, namely a userspace fastpath with atomic operations, which results in syscalls being needed only when there's contention.
oiaohm replied

29 January 2021, 07:02 PM
Originally posted by indepe View Post

Rework the Linux kernel to implement deprecated Windows API using 286 technology... what could go wrong?

An interesting point is how often pre-atomic locking becomes critical. Do note it says event. As in, you send events between the different parts.

There are many different embedded systems where multiple CPU cores share the same memory without proper atomics. This is why RT-Mutex is in the kernel: it can use Windows NT event-style locking between cores when required by the hardware. Pre-atomic-instruction locking is in a lot of cases syscall-based locking.

Event-based locking between cores is slow and not atomic: you request a lock and have to wait until the other cores respond that you have the lock before you let the process go on with the lock. This style of locking being slow is why it has a functional permission system on top of it.

This is a different style of lock to what the Linux kernel has.

Microsoft has deprecated using the ABI that way by taking away the documentation examples on how to do it. But the ABI to do it is still in Windows, and different applications use it.

You were saying atomic instructions could be used to do this. Event-based locking, which is pre-atomic locking, does not map to atomic instructions. Event-based locking exists on different platforms Linux runs on that use the RT-Mutex syscall.

indepe, just because something is old does not mean it has no particular use cases.

Really, you started off with the incorrect presumption that all Linux locking only makes syscalls when there is contention. That is not true for kernel-backed mutexes: they are what you have to use on some pain-in-the-butt platforms where you cannot do atomic memory operations, so you can fall back to pre-atomic memory locking methods.

Yes, 286-era locking is in the Linux kernel in places, because there are still modern-day platforms that are just as bad as back then for how you have to do locking.