Originally posted by duby229
View Post
Announcement
Collapse
No announcement yet.
Futex2 Proposed In Latest Effort For Linux Kernel Optimization That Can Benefit Gamers
Collapse
X
-
-
Originally posted by indepe View Post
No, user space is more performant when there is no contention, or only short time contention. Which is why glibc implements those parts of lock code in user space.
Originally posted by indepe View Post"Futex" literally is short of "fast userspace mutex". Futexes have the purpose of allowing much of the locking code to be implemented in user space, because that is faster. That's the point. As I explained above and in previous discussions, that proposal is missing a a good public reasoning to be implemented in the kernel.
- Likes 2
Comment
-
Originally posted by Emmanuel Deloget View PostWhile this is true for a simple mutex, this is no longer true when waiting on multiple mutexes,
Originally posted by Emmanuel Deloget View Postand providing a fast and correct implementation in user space (either a WAIT-ALL ou WAIT-ONE-OF) is going to be at least very challenging (for the record, you're rally welcome if you do; I tried, I failed, because user space synchronisation on multiple mutexes is really hard to get right in the first place).
Originally posted by Emmanuel Deloget View PostThe locking takes place in user space ; the waiting (i.e. the synchronisation, i.e. the hard part) takes place in the kernel. Locking the underlying objet can be as simple as an atomic store ; checking the lock before calling futex can be as simple as an atomic read ; but spinning on the lock when there is contention is simply not something you want to do while in userspace, and I speak from experience in that very specific case.
Comment
-
Originally posted by DadSchoorse View PostWell, there totally are some usecases for spinlocks, expasially when you almost never actually try to aquire a locked spinlock. The problem with standard mutex is that it's pretty slow to lock. Both dxvk and proton's vkd3d use spinlock, vkd3d got massive cpu limited performance gains by using them (https://github.com/HansKristian-Work/vkd3d/pull/57). Just because you (or Linus fwiw) don't see use cases does not mean there aren't some.
You might want to read this...
https://docs.microsoft.com/en-us/pre...ectedfrom=MSDN
Locking in userspace is a HUGE problem with modern multicore CPU's. All you are doing is introducing massive amounts of context switching. And that's literally it. Nothing else. In user space on linux you aren't a kernel level process scheduler so all you -can- do is introduce massive amounts of context switching. (The actual process scheduler in the kernel will almost definitely work against your locking, resulting in huge amounts of context switching and that's the problem. If you really need to do locking then use futex)Last edited by duby229; 13 June 2020, 03:44 PM.
Comment
-
Originally posted by DadSchoorse View PostWell, there totally are some usecases for spinlocks, expasially when you almost never actually try to aquire a locked spinlock. The problem with standard mutex is that it's pretty slow to lock. Both dxvk and proton's vkd3d use spinlock, vkd3d got massive cpu limited performance gains by using them (https://github.com/HansKristian-Work/vkd3d/pull/57). Just because you (or Linus fwiw) don't see use cases does not mean there aren't some.
The spinlock as implemented in your link is simplistic and lacks a back off algorithm (and a pause call in general). Even so, I think a performance optimized adaptive mutex (userspace oriented) is preferable for an application even in cases where a spinlock is tempting. However, I have not benchmarked PTHREAD_MUTEX_ADAPTIVE_NP to test if the performance of its specific implementation fulfills that expectation.
- Likes 1
Comment
-
Originally posted by DadSchoorse View PostWell, there totally are some usecases for spinlocks, expasially when you almost never actually try to aquire a locked spinlock. The problem with standard mutex is that it's pretty slow to lock. Both dxvk and proton's vkd3d use spinlock, vkd3d got massive cpu limited performance gains by using them (https://github.com/HansKristian-Work/vkd3d/pull/57). Just because you (or Linus fwiw) don't see use cases does not mean there aren't some.
Comment
-
Originally posted by DadSchoorse View PostWell, there totally are some usecases for spinlocks, expasially when you almost never actually try to aquire a locked spinlock. The problem with standard mutex is that it's pretty slow to lock. Both dxvk and proton's vkd3d use spinlock, vkd3d got massive cpu limited performance gains by using them (https://github.com/HansKristian-Work/vkd3d/pull/57). Just because you (or Linus fwiw) don't see use cases does not mean there aren't some.
That said there are of course still some ns overhead with a custom spinlock and a library function like pthread_mutex uses and if the locks are used many many times per second then it all adds up of course, which is also why e.g Linus never have claimed that a spinlock is never to be used in user-space. He just meant that for 99% of the cases where you think that a spin-lock is the correct answer, it in practice isn't.
edit: Just to put some real numbers in here I just tested pthread_mutex vs pthread_spin_lock where I did lock+unlock in a loop 100M times and after 10 rounds each the number of clock cycles on my CPU was 1481M for mutex and 1844M for splinlock (with PTHREAD_PROCESS_PRIVATE set) so for complete uncontented cases mutex are not only just as fast as a spinlock, it can actually be somewhat faster.Last edited by F.Ultra; 14 June 2020, 12:05 PM.
- Likes 1
Comment
-
Originally posted by indepe View PostNowadays many locks are a combination of spinlock and kernel lock. They first spin in user space for a certain time, before they resign to do a kernel wait.
They combine advantages of each, at the cost of an additional atomic read-modify-write instruction in the unlock path. That's just a few nanoseconds which bugs those who seek an ideal lock, but avoids the problems of pure spinlocks. In practice these few nanoseconds are usually of no consequence.
In a similar way, the futex2 logic for wait-for-one-of-many should instead be implemented in user-space, where it will usually avoid the need for a kernel call in the absence of contention. Once that is done, there is no need to duplicate the required lists and logic in the kernel.
Instead, it may be useful to have an awake-one-of-many kernel API, to allow the kernel scheduler to pick the best candidate for waking one of many. However, that is not necessary, just optional.
So, as in a previous discussion, I don't see a need or advantage for the "futex2" concept inside the kernel. If that exists, it is non-obvious and should be stated explicitly. But the proposal doesn't seem to state such a need or advantage, as far as I know.
Comment
-
Originally posted by Space Heater View Post
Is there really a significant performance difference between a custom spinlock implementation like the above and pthread_spin_lock/unlock?
edit: looked up the actual glibc-code and it's actually quite clever here:
Code:int pthread_spin_lock (pthread_spinlock_t *lock) { int val = 0; if (__glibc_likely (atomic_exchange_acquire (lock, 1) == 0)) return 0; do { do { atomic_spin_nop (); val = atomic_load_relaxed (lock); } while (val != 0); } while (!atomic_compare_exchange_weak_acquire (lock, &val, 1)); return 0; }
Last edited by F.Ultra; 14 June 2020, 12:35 PM.
- Likes 1
Comment
-
Originally posted by F.Ultra View PostMy take on this patch is that they are only going to use it when there is contention just like how you use the regular FUTEX today. And there is no way that you can write an equivalent function in 100% userspace that handles contention that will be able to match what the kernel can do here (like e.g suspending threads) for that the Windows applications use the WaitForMultibleObjects() call.Last edited by indepe; 14 June 2020, 02:18 PM.
- Likes 1
Comment
Comment