The FUTEX2 System Call Continues Working Its Way Towards Mainline In 2021


  • #11
    Originally posted by F.Ultra View Post
    It brings the possibility of waiting for several futexes at the same time using a single syscall. Not sure how many applications will make use of it, but I can see several use cases personally, so others might see the same.
    There are many ways in which threads can wait for multiple things to happen; it doesn't require any syscall at all, unless the thread actually has to wait, in which case the existing futex syscall (called directly or indirectly) does just fine. Why would you need to wait specifically for multiple futexes, as opposed to multiple events or multiple notifications?



    • #12
      Originally posted by indepe View Post

      There are many ways in which threads can wait for multiple things to happen; it doesn't require any syscall at all, unless the thread actually has to wait, in which case the existing futex syscall (called directly or indirectly) does just fine. Why would you need to wait specifically for multiple futexes, as opposed to multiple events or multiple notifications?
      Let's say that in one thread you have to process data supplied by, say, 4 different queues, and these queues happen to use futexes to signal whether the queue is empty or not. Using a waitv() method you have a single context switch in the contended case, and the scheduler in the kernel can put my thread into a wait state until some other thread or process calls futex() with FUTEX_WAKE on one of those 4 futexes that my thread is waiting for.

      The current solution requires you to do busy waiting on all 4 futexes with a deliberately low timeout, wasting huge amounts of context switches and CPU time.
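
      For illustration, a minimal sketch of that wait-for-any pattern, assuming a kernel and headers that provide the waitv-style interface from this patch series (the names and layout below follow what was eventually merged into mainline as futex_waitv(); the 2021 patches differed in detail):

          #include <stdint.h>
          #include <string.h>
          #include <time.h>
          #include <unistd.h>
          #include <sys/syscall.h>
          #include <linux/futex.h>

          /* Sleep until any of 'n' queue futexes is woken with FUTEX_WAKE,
           * using one syscall instead of polling each futex with a short
           * timeout. Returns the index of the woken futex, or -1 on error. */
          static long wait_any_queue(uint32_t **futexes, const uint32_t *empty_val,
                                     unsigned int n)
          {
              struct futex_waitv waiters[FUTEX_WAITV_MAX];

              memset(waiters, 0, sizeof(waiters));
              for (unsigned int i = 0; i < n && i < FUTEX_WAITV_MAX; i++) {
                  waiters[i].uaddr = (uintptr_t)futexes[i];
                  waiters[i].val   = empty_val[i]; /* only sleep while "empty" */
                  waiters[i].flags = FUTEX_32;     /* 32-bit futex words       */
              }
              return syscall(SYS_futex_waitv, waiters, n, 0, NULL, CLOCK_MONOTONIC);
          }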



      • #13
        Originally posted by F.Ultra View Post

        Let's say that in one thread you have to process data supplied by, say, 4 different queues, and these queues happen to use futexes to signal whether the queue is empty or not. Using a waitv() method you have a single context switch in the contended case, and the scheduler in the kernel can put my thread into a wait state until some other thread or process calls futex() with FUTEX_WAKE on one of those 4 futexes that my thread is waiting for.

        The current solution requires you to do busy waiting on all 4 futexes with a deliberately low timeout, wasting huge amounts of context switches and CPU time.
        Are you saying the queues make a FUTEX_WAKE call each time the state of the queue changes to non-empty? This would seem extremely inefficient and unnecessary to me.



        • #14
          Originally posted by indepe View Post
          Are you saying the queues make a FUTEX_WAKE call each time the state of the queue changes to non-empty? This would seem extremely inefficient and unnecessary to me.
          Alas, that's how the majority of Win-based code is written.



          • #15
            Originally posted by Alex/AT View Post
            Alas, that's how the majority of Win-based code is written.
            No.

            Windows applications that use Events use them as high-level, application-level constructs, along with mutexes and semaphores. Linux's futexes, however, are low-level mechanisms used by library writers to implement synchronization primitives that are mostly userspace-based.

            The futex2 proposal seems to give the wrong idea that a new application-level feature is being created (wait-for-many), and that is not the case. Futexes are not application-level features; they are used to implement application-level features, and that can be done with the existing futexes. If there is a real reason to add wait-for-many at the futex level, then we haven't heard it yet, at least not in the material that I have seen here on Phoronix and elsewhere.
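
            To make that concrete, here is a minimal sketch (not any particular library's actual code) of the usual way a userspace lock is built on top of the existing futex(2) call; the syscall is only reached under contention, everything else is plain atomics in userspace:

                #include <stdatomic.h>
                #include <linux/futex.h>
                #include <sys/syscall.h>
                #include <unistd.h>

                /* Lock word: 0 = unlocked, 1 = locked, 2 = locked with waiters.
                 * Assumes atomic_uint is a plain 32-bit word, as on Linux. */
                static void lock(atomic_uint *f)
                {
                    unsigned int c = 0;
                    if (atomic_compare_exchange_strong(f, &c, 1))
                        return;                              /* fast path: no syscall */
                    do {
                        /* Mark the lock contended, then sleep until woken. */
                        if (c == 2 || atomic_compare_exchange_strong(f, &c, 2))
                            syscall(SYS_futex, f, FUTEX_WAIT, 2, NULL, NULL, 0);
                        c = 0;
                    } while (!atomic_compare_exchange_strong(f, &c, 2));
                }

                static void unlock(atomic_uint *f)
                {
                    if (atomic_exchange(f, 0) == 2)          /* someone may be asleep */
                        syscall(SYS_futex, f, FUTEX_WAKE, 1, NULL, NULL, 0);
                }

            In the uncontended case no kernel interaction happens at all; the futex syscall only exists so that a thread which really does have to sleep can be put to sleep and woken again.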



            • #16
              In other words, under the hood the Windows implementation could theoretically look much like a userspace implementation on top of the existing futex API, just implemented as a low-level API within the kernel. Theoretically.



              • #17
                Originally posted by indepe View Post

                Are you saying the queues make a FUTEX_WAKE call each time the state of the queue changes to non-empty? This would seem extremely inefficient and unnecessary to me.
                No, they only make the syscall if the futex is set to 1, indicating that at least one other thread/process is in an idle state.
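
                A minimal sketch of that handshake (the names are illustrative, not taken from any specific queue implementation):

                    #include <stdatomic.h>
                    #include <linux/futex.h>
                    #include <sys/syscall.h>
                    #include <unistd.h>

                    /* Consumer side: announce "I am idle" before sleeping. The caller
                     * is expected to re-check the queue after setting the flag, so an
                     * item pushed in between is not missed. */
                    static void consumer_wait(atomic_uint *fx)
                    {
                        atomic_store(fx, 1);                 /* 1 = someone is idle */
                        syscall(SYS_futex, fx, FUTEX_WAIT, 1, NULL, NULL, 0);
                    }

                    /* Producer side, called after pushing an item: only pay for the
                     * FUTEX_WAKE syscall if a consumer actually went idle. */
                    static void producer_notify(atomic_uint *fx)
                    {
                        if (atomic_exchange(fx, 0) == 1)
                            syscall(SYS_futex, fx, FUTEX_WAKE, 1, NULL, NULL, 0);
                    }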



                • #18
                  Originally posted by F.Ultra View Post
                  No, they only make the syscall if the futex is set to 1, indicating that at least one other thread/process is in an idle state.
                  If so, then they already need an interchange between waiter and sender in userspace. At which point they could also, instead of burdening the kernel with that, add the waiter to a list (which usually would be empty or hold just a small number of items, so it would not require dynamic memory). The list would include a pointer to a futex for each active wait call (or waiter). If the item is consumable (auto-reset), the sender would wake one of them, otherwise all eligible ones, if any.

                  Done. No kernel patch necessary; you could run it on any kernel you want, yesterday.

                  There is no reason to keep the corresponding lists, which possibly require dynamic memory, in the kernel, something kernel engineers have indicated is to be avoided. Why would that be necessary? Because as long as each thread waits for at most one futex, the thread itself can be the list element. Otherwise the kernel needs to allocate and deallocate dynamic memory, or keep larger memory around just in case, even for threads not using this feature. And it is easy to avoid keeping kernel resources locked away from other threads not using this feature.

                  Another advantage is that you can have permanent subscriptions, instead of adding to and removing from the list with each wait call. Then the receiver can remember the event signal even if it arrives before the wait call and the event is reset in the meantime. That is a very important use case in my experience; actually, the most important one. Also, the thread can then do other things while the signal is not there yet.
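
                  For concreteness, a minimal userspace sketch of that list idea (all names here are illustrative; error handling, deregistration and the wake-all variant are left out), using only the existing futex syscall:

                      #include <stdatomic.h>
                      #include <pthread.h>
                      #include <linux/futex.h>
                      #include <sys/syscall.h>
                      #include <unistd.h>

                      #define MAX_WAITERS 16   /* "usually empty or just a few items" */

                      /* The event must be zero-initialised and its mutex set up
                       * with pthread_mutex_init(). */
                      struct event {
                          pthread_mutex_t lock;
                          atomic_uint    *waiters[MAX_WAITERS];  /* one futex per waiter */
                          int             nwaiters;
                      };

                      /* Waiter: register its own futex word on the event, then sleep. */
                      static void event_wait(struct event *ev, atomic_uint *me)
                      {
                          atomic_store(me, 0);
                          pthread_mutex_lock(&ev->lock);
                          ev->waiters[ev->nwaiters++] = me;
                          pthread_mutex_unlock(&ev->lock);
                          while (atomic_load(me) == 0)
                              syscall(SYS_futex, me, FUTEX_WAIT, 0, NULL, NULL, 0);
                      }

                      /* Sender: wake one registered waiter (the auto-reset case). */
                      static void event_signal(struct event *ev)
                      {
                          atomic_uint *w = NULL;
                          pthread_mutex_lock(&ev->lock);
                          if (ev->nwaiters > 0)
                              w = ev->waiters[--ev->nwaiters];
                          pthread_mutex_unlock(&ev->lock);
                          if (w) {
                              atomic_store(w, 1);
                              syscall(SYS_futex, w, FUTEX_WAKE, 1, NULL, NULL, 0);
                          }
                      }

                  A thread that wants to wait for any of several events could, in principle, register the same futex word with each of them, which gives wait-for-any behaviour without a new syscall (removing itself from the other lists afterwards is omitted here).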
                  Last edited by indepe; 10 February 2021, 05:26 PM.



                  • #19
                    Originally posted by indepe View Post

                    If so, then they already need an interchange between waiter and sender in userspace. At which point they could also, instead of burdening the kernel with that, add the waiter to a list (which usually would be empty or hold just a small number of items, so it would not require dynamic memory). The list would include a pointer to a futex for each active wait call (or waiter). If the item is consumable (auto-reset), the sender would wake one of them, otherwise all eligible ones, if any.

                    Done. No kernel patch necessary; you could run it on any kernel you want, yesterday.

                    There is no reason to keep the corresponding lists, which possibly require dynamic memory, in the kernel, something kernel engineers have indicated is to be avoided. Why would that be necessary? Because as long as each thread waits for at most one futex, the thread itself can be the list element. Otherwise the kernel needs to allocate and deallocate dynamic memory, or keep larger memory around just in case, even for threads not using this feature. And it is easy to avoid keeping kernel resources locked away from other threads not using this feature.

                    Another advantage is that you can have permanent subscriptions, instead of adding to and removing from the list with each wait call. Then the receiver can remember the event signal even if it arrives before the wait call and the event is reset in the meantime. That is a very important use case in my experience; actually, the most important one. Also, the thread can then do other things while the signal is not there yet.
                    The only needed interchange is the futex itself, since it's a shared int32_t anyway. Your solution with a userspace futex broker is exactly the road that the WINE devs have tried to go down so far, and futex2 is them saying that it's not a viable solution. Perhaps they are not up to the task of writing this very simple broker, but so far no one else has provided a better implementation either.



                    • #20
                      Originally posted by F.Ultra View Post

                      The only needed interchange is the futex itself, since it's a shared int32_t anyway. Your solution with a userspace futex broker is exactly the road that the WINE devs have tried to go down so far, and futex2 is them saying that it's not a viable solution. Perhaps they are not up to the task of writing this very simple broker, but so far no one else has provided a better implementation either.
                      Right, it's shared memory anyway. And according to your logic, shared in userspace.

                      Tried where, when, and how? Except for "fsync", I only know of the attempt to use eventfd, which is something completely different. I believe I have read everything pointed out in previous Phoronix articles, and some of the sources of "esync", and I don't know of any extensive attempt based on userspace and the existing futex API. As far as I know, "fsync" is trying to use the kernel patches called "futex2". That's also how it appeared in the email proposal we discussed a few days ago: there, managing a wait list appeared to be something that wasn't even seriously considered. Although that's perhaps a separate effort.

                      Why would it not be a viable solution? And how would it get more viable by doing it inside the kernel? I have implemented similar things in userspace, and they work just fine. Except for the wait-for-all-signals-at-the-same-time option, but with more functionality in other regards. And also another function with wait-for-all, but not necessarily at-the-same-time. So that's a very weird claim to me. It doesn't even require locking during normal operation, which in my case is the permanent-subscription use case. To me, that's like denying the obvious. I'm like "what the heck is their problem, and why don't they say what it is?".

                      In another discussion I heard that one reason is that issuing multiple wake calls would be a performance problem. That can't be a big problem if the wake calls are only done for waiters actually waiting (which is an expensive operation anyway), and as I have written previously, it could be addressed with a simple and general-purpose wake-multiple syscall, which would be trivial to implement since it would basically just be a loop over existing functionality. In other words, I have a hard time thinking of anyone as knowing what they are talking about if/when they make such claims.
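
                      As a sketch of what "a loop over existing functionality" means here (the batched syscall itself is hypothetical; this is the userspace equivalent, which works on any current kernel):

                          #include <stdint.h>
                          #include <linux/futex.h>
                          #include <sys/syscall.h>
                          #include <unistd.h>

                          /* Wake one waiter on each of the given futexes. A hypothetical
                           * wake-multiple syscall would run the same loop inside the
                           * kernel, saving the per-futex syscall overhead. */
                          static void wake_many(uint32_t **futexes, unsigned int n)
                          {
                              for (unsigned int i = 0; i < n; i++)
                                  syscall(SYS_futex, futexes[i], FUTEX_WAKE, 1, NULL, NULL, 0);
                          }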
                      Last edited by indepe; 10 February 2021, 07:25 PM.

