Wine Developers Are Working On A New Linux Kernel Sync API To Succeed ESYNC/FSYNC
-
Originally posted by Weasel:
No, futex solves ONLY the wait-multiple problem compared to esync, but nothing else listed.
I am talking about what is possible with the existing futex kernel API and atomic operations, not about some existing specific implementation like fsync.
Most of those problems in that list explicitly referred to "eventfd", which I am not talking about either.
So I don't know what problem, in principle, you think there would be in implementing the necessary functions using the existing kernel API.
Or why you would think that.
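For reference, the "existing futex kernel API" mentioned here boils down to two calls: "sleep if this word still equals X" and "wake up to N sleepers". Below is a minimal sketch in C (Linux only) of thin wrappers around futex(2) plus a hypothetical one-word manual-reset event whose fast paths never enter the kernel. The names futex_wait, futex_wake and event_* are illustrative, not taken from Wine, esync or fsync, and the sketch does not attempt wait-for-multiple.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Sleep only while *word still equals `expected`. The kernel does the
   comparison and the enqueue as one step with respect to futex_wake, so a
   wakeup cannot slip in between. Returns -1 with errno EAGAIN if *word differs. */
static long futex_wait(_Atomic uint32_t *word, uint32_t expected) {
    return syscall(SYS_futex, word, FUTEX_WAIT_PRIVATE, expected, NULL, NULL, 0);
}

/* Wake up to `count` threads sleeping on `word`; returns how many were woken. */
static long futex_wake(_Atomic uint32_t *word, int count) {
    return syscall(SYS_futex, word, FUTEX_WAKE_PRIVATE, count, NULL, NULL, 0);
}

/* Hypothetical manual-reset event in one word: 0 = non-signaled, 1 = signaled. */
static void event_set(_Atomic uint32_t *ev) {
    /* Only pay for the syscall on a 0 -> 1 transition. */
    if (atomic_exchange_explicit(ev, 1, memory_order_release) == 0)
        futex_wake(ev, INT32_MAX);
}

static void event_wait(_Atomic uint32_t *ev) {
    /* Fast path: already signaled, no syscall at all. */
    while (atomic_load_explicit(ev, memory_order_acquire) == 0)
        futex_wait(ev, 0);  /* returns at once if the word is no longer 0 */
}

static void event_reset(_Atomic uint32_t *ev) {
    atomic_store_explicit(ev, 0, memory_order_relaxed);
}
```

The key property lives in futex_wait: because the kernel re-checks the word before sleeping, a concurrent event_set either makes that check fail (EAGAIN) or finds the waiter already queued and wakes it.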
Originally posted by Weasel:
"Usually" doesn't mean "always". If you don't understand that contention is the cause of the performance loss, then feel free to live in your imaginary world.
Some games work well with esync or fsync, but not every game. Let's say 99% of games work well. That's your "usually". But we don't care about those games in this thread; this isn't about them. We care about the other 1%. The games that use too many threads and have too much contention don't work well. And that's the whole point of this thing: going from 100 fps to 80 fps, for example.
We're not talking about the "average" here. We're talking about this specific situation where certain games suffer from it, due to whatever design they have. Call it wrong, call it buggy, I don't care. The games are built that way and there's nothing you or Wine can do about it, other than emulate Windows better (since they run better on Windows), which is one atomic syscall for these operations.
However, I am not talking about fsync at all.
-
Originally posted by Weasel:
Damn, dude, I didn't realize you were still using a 286 with Windows 10. Can you sell me your magic?
-
Originally posted by Weasel:
The games are built that way and there's nothing you or Wine can do about it, other than emulate Windows better (since they run better on Windows), which is one atomic syscall for these operations.
That isn't true in two ways:
a) You don't need a syscall to make multiple operations atomic. For example, a spinlock that doesn't use syscalls at all (not even under contention) will do just fine. That's not a recommendation for spinlocks, though.
b) Just using a syscall doesn't make things atomic. You still need a lock inside the syscall, or other atomic operations.
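To illustrate point a), here is a minimal test-and-set spinlock on C11 atomics (a sketch; all names are hypothetical). Both the lock and the two-field update it protects stay entirely in userspace, and, as point b) says, it is the lock, not any syscall, that makes the update atomic. As the post notes, this is not a recommendation for spinlocks: under contention the waiter simply burns CPU.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Minimal test-and-set spinlock: every operation is a plain atomic
   instruction in userspace; waiting is busy-spinning, never a syscall. */
typedef struct { atomic_flag held; } spinlock;
#define SPINLOCK_INIT { ATOMIC_FLAG_INIT }

static bool spin_trylock(spinlock *l) {
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

static void spin_lock(spinlock *l) {
    while (!spin_trylock(l))
        ;  /* burns CPU under contention instead of sleeping in the kernel */
}

static void spin_unlock(spinlock *l) {
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/* Two separate stores that other threads only ever observe together,
   because every access to `a` and `b` goes through the same lock. */
typedef struct { spinlock lock; int a, b; } pair;

static void pair_move(pair *p, int amount) {
    spin_lock(&p->lock);
    p->a -= amount;
    p->b += amount;
    spin_unlock(&p->lock);
}
```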
-
Originally posted by indepe:
Just noticed there is again an indication that you seem to suggest that syscalls are needed to make things atomic. As was suggested before when you wrote: "Sometimes you need an operation to be atomic to avoid race conditions."
That isn't true in two ways:
a) You don't need a syscall to make multiple operations atomic. For example, a spinlock that doesn't use syscalls at all (not even under contention) will do just fine. That's not a recommendation for spinlocks, though.
b) Just using a syscall doesn't make things atomic. You still need a lock inside the syscall, or other atomic operations.
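Yeah, of course you don't need syscalls to make things atomic. I actually love atomic instructions that can be used for simple locks in userspace. I'm a fan of userspace synchronization, but unfortunately it can work reliably only in apps that I actually develop, not in 3rd-party ones (like the Windows games).
The syscalls need to do everything because, if you only use the syscalls themselves to signal whatever sync operations you need, it must "finish" before userspace gets a chance to get scheduled. The kernel can control this scheduling, but userspace can't. I'll give a practical example below.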
Ok so, I doubt what you mean can actually work, but I'm open to being proven wrong and always like to learn new stuff; maybe you have some genius idea. So let's just look at one example that the current esync/fsync patches can't handle properly: PulseEvent.
Basically, what it does is wake up any threads waiting for the event, without changing the event's state. Sounds simple, right? (Please don't start on it being a "badly designed API". I'm fully aware of that, and by NO means do I advocate using it! Unfortunately, existing applications and games DO use it, which is why it must be implemented.)
One bad way to emulate it is to set the event (which wakes up a waiting thread) and then reset it. Obviously, this is a race condition, because you have TWO syscalls here, and there's no guarantee that your "reset event" step happens before any other thread uses the event, even though it should. In fact, your thread could get completely halted right after the first syscall for ages.
So, you can protect it with a lock, right?
Code:
PulseEvent() {
    acquire_lock(); // <-- spinlock to avoid another syscall
    set_event_and_wake_thread();
    reset_event();
    release_lock();
}
If the whole thing was just ONE syscall (something that fully implements PulseEvent), none of this would be an issue. No thread would ever wait, there would be at most one syscall, etc.
So I fail to understand what you mean. Can you show a simple pseudo-code example for PulseEvent, just so I can see what you're trying to say better?
Again, I understand you can do almost any (or even any) synchronization with just futexes or spinlocks, but that's only when you control the design of the application yourself, this isn't such a case.
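One concrete way the two-step "set, then reset" emulation goes wrong can be modelled deterministically in a single thread. This is a hypothetical model, not Wine or esync code, and it assumes that a woken waiter re-reads the event state when it actually gets the CPU. Under that assumption the outcome depends purely on scheduling, and wrapping both calls in a lock, as in the pseudocode above, does not rescue the second case if the waiter's re-check also goes through that lock.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model. A wake only makes the waiter runnable; when it
   runs, it re-checks the event state and goes back to sleep if the
   event is no longer signaled. */
typedef struct {
    bool signaled;
    bool waiter_runnable;
} model;

static void step_set(model *m)   { m->signaled = true; m->waiter_runnable = true; } /* call #1 */
static void step_reset(model *m) { m->signaled = false; }                           /* call #2 */

/* The waiter is scheduled: returns true if it ends up released. */
static bool waiter_runs(model *m) {
    if (!m->waiter_runnable) return false;
    m->waiter_runnable = false;
    return m->signaled;  /* false: it sees a reset event and sleeps again */
}

/* The waiter happens to run between the two calls: it is released. */
static bool waiter_runs_between_calls(void) {
    model m = { false, false };
    step_set(&m);
    bool released = waiter_runs(&m);
    step_reset(&m);
    return released;
}

/* The pulsing thread finishes both calls first: the wakeup is lost,
   although a real PulseEvent would have released this waiter. */
static bool waiter_runs_after_both_calls(void) {
    model m = { false, false };
    step_set(&m);
    step_reset(&m);
    return waiter_runs(&m);
}
```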
-
Originally posted by Weasel:
Damn, dude, I didn't realize you were still using a 286 with Windows 10. Can you sell me your magic?
Originally posted by indepe:
Obviously it is the magic of "pre atomic methods", which "atomic methods cannot be used to emulate". Which is why it would be no problem to run WINE on DEC Alpha. It has that magic.
Pre-atomic locking methods can be used on a CPU that supports atomic methods, but atomic methods cannot replace them in all cases.
Yes, pre-atomic CPUs have their fair share of fun emulating atomic methods, with lots of oddball corner cases.
This is one of those "if it's not broken, don't fix it" cases. Pre-atomic locking methods work just as well on CPUs that support atomic locking as they did on pre-atomic CPUs, with the same level of performance problems.
The hard case is pre-atomic locks that are "muggable". This is where a process takes out a lock, and another process that wants the lock uses information about the lock to kill the process holding it, so that it can take the lock or free it. How do you safely record which process holds the lock without a syscall?
https://www.ryadel.com/en/unlock-fil...ocess-windows/
The unlocker tool there exploits these muggable locks.
Modern atomic locking is designed around acquire and release, and most modern courses on locking teach only these two. When you go back to pre-atomic locking, you have an extra operation called Take. Take means your process gets the lock no matter what, even if another process currently holds it.
Pre-atomic locking model:
Take: acquire the lock by force, including killing whoever holds it if required. This also means you will want to check ACLs and other security settings to see whether a process is allowed to brute-force its way onto a lock.
Acquire: wait until the lock can be obtained.
Release: let go of the lock.
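To make the three operations concrete, here is a minimal sketch of the interface only. It assumes (an assumption of this sketch, not something the post specifies) that the lock records its owner as an ID, e.g. a thread ID, in one atomic word; all names are hypothetical. Take simply swaps in the new owner and reports the previous one. The parts the post calls hard, such as killing or suspending the previous holder, ACL checks, and handing the lock back unnoticed, are deliberately not modelled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define NO_OWNER 0u

/* Hypothetical lock that records its owner's ID in a single word. */
typedef struct { _Atomic uint32_t owner; } owned_lock;

/* Acquire (non-blocking form): succeeds only if nobody holds the lock.
   A real Acquire would sleep and retry instead of returning false. */
static bool lock_try_acquire(owned_lock *l, uint32_t me) {
    uint32_t expected = NO_OWNER;
    return atomic_compare_exchange_strong(&l->owner, &expected, me);
}

/* Release: only succeeds for the current owner. */
static bool lock_release(owned_lock *l, uint32_t me) {
    uint32_t expected = me;
    return atomic_compare_exchange_strong(&l->owner, &expected, NO_OWNER);
}

/* Take: install `me` as owner no matter who holds the lock and return the
   previous owner so the caller can deal with it. NOT modelled: privilege or
   ACL checks, and killing or suspending the previous owner. */
static uint32_t lock_take(owned_lock *l, uint32_t me) {
    return atomic_exchange(&l->owner, me);
}
```

Note that after a Take the old owner's lock_release fails, so the theft is visible to it; hiding that from the original holder, as described further down in the post, is exactly what this sketch does not attempt.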
This pre-atomic locking model exists in various areas inside Windows. Atomic locking was designed to perform well, but it was not designed to be brute-forced with Take operations.
There are even variants of the Take operation.
For example, you take a lock by force from a process, and that process is not killed, just suspended until the higher-privilege process releases the lock. Yes, there are a few places inside Windows where this is completely valid, such as antivirus software scanning a locked file.
indepe, this is a horrible one to consider, and pre-atomic locking supports it. How are you going to do a Take operation with atomic locking that does not kill the process holding the lock, when a higher-privilege process takes the lock for its own use and is meant to return it to the lower-privilege process later, without the lower-privilege process ever knowing that the lock was pickpocketed from it and put back?
This is the problem: the idea that all locking is atomic locking is wrong. People doing computer courses are taught that, and it is not true once you get into the trickier sections of Windows, because Windows contains pre-atomic locking in places, and pre-atomic locking allows the horrible take-a-lock-by-force operation with different levels of sleight of hand.
-
Originally posted by Weasel:
Yeah, of course you don't need syscalls to make things atomic. I actually love atomic instructions that can be used for simple locks in userspace. I'm a fan of userspace synchronization, but unfortunately it can work reliably only in apps that I actually develop, not in 3rd-party ones (like the Windows games).
The syscalls need to do everything because, if you only use the syscalls themselves to signal whatever sync operations you need, it must "finish" before userspace gets a chance to get scheduled. The kernel can control this scheduling, but userspace can't. I'll give a practical example below.
Ok so, I doubt what you mean can actually work, but I'm open to being proven wrong and always like to learn new stuff; maybe you have some genius idea. So let's just look at one example that the current esync/fsync patches can't handle properly: PulseEvent.
Basically, what it does is wake up any threads waiting for the event, without changing the event's state. Sounds simple, right? (Please don't start on it being a "badly designed API". I'm fully aware of that, and by NO means do I advocate using it! Unfortunately, existing applications and games DO use it, which is why it must be implemented.)
One bad way to emulate it is to set the event (which wakes up a waiting thread) and then reset it. Obviously, this is a race condition, because you have TWO syscalls here, and there's no guarantee that your "reset event" step happens before any other thread uses the event, even though it should. In fact, your thread could get completely halted right after the first syscall for ages.
So, you can protect it with a lock, right?
Code:
PulseEvent() {
    acquire_lock(); // <-- spinlock to avoid another syscall
    set_event_and_wake_thread();
    reset_event();
    release_lock();
}
If the whole thing was just ONE syscall (something that fully implements PulseEvent), none of this would be an issue. No thread would ever wait, there would be at most one syscall, etc.
So I fail to understand what you mean. Can you show a simple pseudo-code example for PulseEvent, just so I can see what you're trying to say better?
Again, I understand you can do almost any (or even any) synchronization with just futexes or spinlocks, but that's only when you control the design of the application yourself, this isn't such a case.
So maybe PulseEvent poses special problems that are hidden from plain view. For example, why do you think there need to be 2 syscalls unless the whole thing is a single syscall?
In the absence of knowing about any Windows specifics, I would think of something like:
Code:
{
    acquire_lock_of_event(e);
    thread t = thread_waiting_for_event(e);
    clear_waitlist_of_event(e);
    reset_event(e);
    release_lock_of_event(e);
    if (t != NULL) {
        wake_thread(t);
    }
}
(EDIT: This is of course a simplification. For example instead of the "thread t", it would be something like "wait_entry_for_thread".)
(EDIT 2: Note that the potential syscall in wake_thread is outside the lock.)
Last edited by indepe; 21 January 2021, 06:43 PM.
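The sketch above can be made concrete. Below is one possible C version for Linux, assuming a manual-reset event guarded by a short userspace spinlock, with one futex word per waiter; all names (event, wait_entry, begin_wait, finish_wait, pulse_event) are hypothetical, and this is not Wine's implementation. Because each waiter is released through its own entry rather than by re-reading the event state, it does not matter that the event is already non-signaled by the time the waiter runs, and the wake syscall happens outside the lock as in EDIT 2.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* One entry per blocked waiter; `woken` doubles as its futex word. */
typedef struct wait_entry {
    struct wait_entry *next;
    _Atomic uint32_t woken;
} wait_entry;

typedef struct {
    atomic_flag lock;     /* short userspace spinlock, never held across a syscall */
    bool signaled;
    wait_entry *waiters;  /* threads currently blocked on this event */
} event;

static void ev_lock(event *e) {
    while (atomic_flag_test_and_set_explicit(&e->lock, memory_order_acquire))
        ;  /* spin */
}
static void ev_unlock(event *e) {
    atomic_flag_clear_explicit(&e->lock, memory_order_release);
}

/* Waiter, step 1: returns false if the event is already signaled (no need
   to block); otherwise registers `w` on the wait list and returns true. */
static bool begin_wait(event *e, wait_entry *w) {
    ev_lock(e);
    if (e->signaled) { ev_unlock(e); return false; }
    atomic_store_explicit(&w->woken, 0, memory_order_relaxed);
    w->next = e->waiters;
    e->waiters = w;
    ev_unlock(e);
    return true;
}

/* Waiter, step 2: sleep until a pulse marks this entry as woken. The loop
   also absorbs spurious futex wakeups. */
static void finish_wait(wait_entry *w) {
    while (atomic_load_explicit(&w->woken, memory_order_acquire) == 0)
        syscall(SYS_futex, &w->woken, FUTEX_WAIT_PRIVATE, 0, NULL, NULL, 0);
}

/* PulseEvent on a manual-reset event: release everyone waiting right now
   and leave the event non-signaled. Returns the number of waiters released. */
static int pulse_event(event *e) {
    ev_lock(e);
    wait_entry *list = e->waiters;  /* detach the current waiters */
    e->waiters = NULL;
    e->signaled = false;
    ev_unlock(e);

    int released = 0;
    while (list != NULL) {          /* wake outside the lock */
        wait_entry *next = list->next;       /* read before waking: the */
        _Atomic uint32_t *word = &list->woken; /* waiter may reuse its entry */
        atomic_store_explicit(word, 1, memory_order_release);
        /* May hit an entry whose waiter already left; that only causes a
           harmless spurious wakeup for whoever waits on that address. */
        syscall(SYS_futex, word, FUTEX_WAKE_PRIVATE, 1, NULL, NULL, 0);
        list = next;
        released++;
    }
    return released;
}
```

For an auto-reset event, Microsoft documents that PulseEvent releases only one waiting thread, which is closer to the single "thread t" in the sketch; that would mean detaching one entry instead of the whole list. SetEvent, timeouts and wait-for-multiple are not modelled here.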
-
Originally posted by oiaohm:
No, that would not work, because modern-day Windows applications contain both pre-atomic methods and other code that uses atomic methods. Anyone porting code to DEC Alpha back in the Windows 4.0 days used to run into that problem as well: code areas using atomic methods that were fine on i386 or better x86 were now totally screwed.
Pre-atomic locking methods can be used on a CPU that supports atomic methods, but atomic methods cannot replace them in all cases.
-
Originally posted by indepe:
So maybe PulseEvent poses special problems that are hidden from plain view. For example, why do you think there need to be 2 syscalls unless the whole thing is a single syscall?
In the absence of knowing about any Windows specifics, I would think of something like:
Code:
{
    acquire_lock_of_event(e);
    thread t = thread_waiting_for_event(e);
    clear_waitlist_of_event(e);
    reset_event(e);
    release_lock_of_event(e);
    if (t != NULL) {
        wake_thread(t);
    }
}
(EDIT: This is of course a simplification. For example instead of the "thread t", it would be something like "wait_entry_for_thread".)
(EDIT 2: Note that the potential syscall in wake_thread is outside the lock.)
Ok, I understand what you're trying to say now. You want to implement the whole thing in userspace, just with locks to protect the code from races.
Tbh, it sounds pretty nice in practice, but I'm guessing this approach suffers from contention or some other thing they measured. I don't know; they seem to be specifically looking for "kernel options". BTW, esync currently emulates some of these things pretty badly, not just in terms of performance, but also by having race conditions. That's why some weird games don't even work with esync on.
Anyway, you gave me some ideas to try for my "lockless" (as in, without syscalls, not without atomic locks) design of an app I have.
-
Originally posted by Weasel:
Ok, I understand what you're trying to say now. You want to implement the whole thing in userspace, just with locks to protect the code from races.
Originally posted by Weasel:
Tbh, it sounds pretty nice in practice, but I'm guessing this approach suffers from contention or some other thing they measured. I don't know; they seem to be specifically looking for "kernel options". BTW, esync currently emulates some of these things pretty badly, not just in terms of performance, but also by having race conditions. That's why some weird games don't even work with esync on.
Originally posted by Weasel:
Anyway, you gave me some ideas to try for my "lockless" (as in, without syscalls, not without atomic locks) design of an app I have.