Wine Developers Are Working On A New Linux Kernel Sync API To Succeed ESYNC/FSYNC
Originally posted by indepe:
Why would it be a problem if things need to be atomic? I don't know all the Windows specifics that you might be thinking of, but waiting for multiple objects in general does not require special non-existent syscalls. In previous proposals, only a "FUTEX_WAIT_MULTIPLE" syscall was mentioned as needed. This functionality, however, can definitely be implemented on top of the existing FUTEX facilities.
This isn't just about simple futexes; those can work with "critical sections" on Windows anyway. Sometimes you need an operation to be atomic to avoid race conditions. NtPulseEvent is an example: it wakes threads waiting for the event without (re)setting the event. esync emulates this and has race conditions; it is known to break some games for this reason. It does it in the name of performance, but it's not perfect.
Here's a list from the mailing list:
* A blocking operation, like poll() that optionally consumes things, or like read() on a vectored set of file descriptors. This doesn't necessarily mean we have to replicate the manual/auto distinction in the kernel; we can handle that in user space. This by itself doesn't actually seem all that unreasonable, but...
* A blocking operation like the above, but corresponding to "wait-all", i.e. one which atomically reads from all descriptors. Just from skimming the code surrounding things like read() and poll(), this seems very ugly to implement.
* A way to atomically write() to an eventfd and retrieve its current count [for semaphore and event operations].
* A way to signal an eventfd, such that waiters are woken, but without changing its current count [i.e. an operation corresponding to NtPulseEvent].
* A way to read the current count of an eventfd without changing it [for NtQuerySemaphore and NtQueryMutant; for NtQueryEvent we can use poll()].
Originally posted by indepe:
I didn't respond to this part yet. Again, unless there are some surprising Windows specifics that need to be emulated on top of the main functionality, things should generally not require many syscalls. Functions such as SET_EVENT should require a syscall only if one or more threads need a WAKE call, or if there is contention during the execution of the function. And functions such as WAIT_ANY/WAIT_ALL should require a syscall only if they actually need to wait (or if there is contention). Otherwise such functions should often execute without the need for any syscall.
Originally posted by Weasel:
Yeah, that's fsync, not esync. esync can't do it, because they are eventfds, not futexes.
And before you say it: no, writing and THEN reading is not atomic, it's inherently prone to race conditions, AND it's two syscalls!
Originally posted by Weasel:
Yeah, contention is the performance problem here, obviously.
EDIT: And this is nothing in comparison to needing a syscall even when there is no contention.
Last edited by indepe; 20 January 2021, 08:13 PM.
Originally posted by indepe:
First of all, all locking on x86 is atomic, be it inside any kernel or in user space. And I have no idea what else it could be on VMS. What kind of CPU instructions would it use?
https://devblogs.microsoft.com/oldne...17-00/?p=96835
That DEC Alpha truly does lack the instructions to safely perform atomic operations, and the MIPS CPUs of that time frame are just as bad. So early Windows NT has a lot of pre-atomic locking methods that use the kernel as the lock master.
Your problem here is that you are thinking in terms of modern locking methods, indepe. The problem is that parts of the Windows NT design that are still in Windows 10 are pre-atomic-instruction methods.
The pre-atomic locking methods work on a CPU supporting atomics, just not at ideal efficiency, but the atomic methods on a CPU that does not support atomics are a problem child.
It is really simple to miss that the Windows NT design at its core is not 100% modern locking, because Windows NT 3.1 supported platforms that did not support modern locking methods. The horrible part here is that atomic methods cannot be used to emulate all these pre-atomic methods without massive overhead. This is just the way it is.
Originally posted by oiaohm:
This is your first major mistake. All locking on x86 is not atomic. The i286 does not have the machine code/assembly in the CPU to in fact do atomic locking. Windows NT 3.1 targeted DEC Alpha and MIPS (R4000 and R4400) CPUs as well as x86.
The 286 always asserts lock during an XCHG with memory operands.
Originally posted by indepe:
I'm not going to spend a lot of time on this. A quick search found this:
This means that XCHG is an atomic instruction.
Originally posted by oiaohm:
No, it's not atomic on the 286. Multiple threads on a 286 using XCHG like that could explode in your face. The out-of-order protection was added in the 386+; read the page you quoted a little closer, there is a 386+ note there for a reason.
Originally posted by indepe:
Those are problems with using eventfd, which doesn't have anything to do with what I am talking about: using futexes.
Originally posted by indepe:
You are using my own words without any indication that you understand them. In this case, even contention will usually just be a few spins and not require a syscall, and it would unlikely be any better if it were done inside the kernel. (However, you need to be careful to limit spinning to an upper bound when outside the kernel, a detail that probably doesn't make much sense to you, and isn't very relevant to this discussion.)
EDIT: And this is nothing in comparison to needing a syscall even when there is no contention.
Some games work well with esync or fsync, but not every game. Let's say 99% of games work well; that's your "usually". But we don't care about those games in this thread; this isn't about them. We care about the other 1%: the games that use too many threads and have too much contention. And that's the whole point of this thing, going from 100 fps to 80 fps, for example.
We're not talking about the "average" here. We're talking about this specific situation where certain games suffer from it, due to whatever design they have. Call it wrong, call it buggy, I don't care. The games are built that way, and there's nothing you or Wine can do about it, other than emulate Windows better (since they run better on Windows), which means one atomic syscall for these operations.