**indepe** · 07 June 2021, 09:57 AM

Originally posted by sandy8925:

Shared memory across processes requires polling though, which is why futexes exist.

My understanding is that both atomic operations and futexes can be used on shared memory. So it is still true, across processes, that you need futex syscalls only if the thread needs to block/suspend or wake/resume; otherwise atomic operations in user space are sufficient. Using futexes across processes requires shared memory.

futex(2) - Linux manual page

https://man7.org/linux/man-pages/man2/futex.2.html

When using futexes, the majority of the synchronization operations are performed in user space. A user-space program employs the futex() system call only when it is likely that the program has to block for a longer time until the condition becomes true.

In order to share a futex between processes, the futex is placed in a region of shared memory, created using (for example) mmap(2) or shmat(2). (Thus, the futex word may have different virtual addresses in different processes, but these addresses all refer to the same location in physical memory.) In a multithreaded program, it is sufficient to place the futex word in a global variable shared by all threads.

So there is no polling in either case (single process or multi process). In both cases syscalls are needed only if the thread has to block or wake; otherwise the logic is handled by atomic operations alone. Therefore futexes are not an API that application code would use directly; they are used by user-space library code.
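To make the fast path / slow path split concrete, here is a minimal sketch of a futex-backed mutex (Linux only). The names `FutexMutex`, `futex_op` and `run_counter_demo` are made up for illustration, and the three-state scheme is the well-known textbook one, not necessarily what any particular library does. In this demo the futex word is simply shared between threads; to share it between processes it would have to live in a shared mapping, as the man page says.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

// Assumption: std::atomic<uint32_t> has the same layout as uint32_t,
// which holds on mainstream Linux toolchains.
static_assert(sizeof(std::atomic<std::uint32_t>) == sizeof(std::uint32_t),
              "futex word must be a plain 32-bit integer");

// glibc has no futex() wrapper, so the raw syscall is used.
static long futex_op(std::atomic<std::uint32_t>* word, int op, std::uint32_t val) {
    return syscall(SYS_futex, reinterpret_cast<std::uint32_t*>(word), op, val,
                   nullptr, nullptr, 0);
}

// Hypothetical mutex. States: 0 = unlocked, 1 = locked, 2 = locked, maybe waiters.
class FutexMutex {
    std::atomic<std::uint32_t> state{0};
public:
    void lock() {
        std::uint32_t c = 0;
        // Fast path: an uncontended acquire is a single atomic CAS, no syscall.
        if (state.compare_exchange_strong(c, 1, std::memory_order_acquire))
            return;
        // Slow path: mark the lock as contended, then sleep in the kernel.
        if (c != 2)
            c = state.exchange(2, std::memory_order_acquire);
        while (c != 0) {
            // Blocks only while the word still equals 2; no polling.
            futex_op(&state, FUTEX_WAIT, 2);
            c = state.exchange(2, std::memory_order_acquire);
        }
    }
    void unlock() {
        // Fast path: nobody flagged contention, so no syscall is needed.
        if (state.exchange(0, std::memory_order_release) == 2)
            futex_op(&state, FUTEX_WAKE, 1);  // wake one sleeper
    }
};

// Demo: several threads increment a plain counter under the mutex.
long run_counter_demo(int threads, int iterations) {
    assert(threads > 0 && iterations >= 0);
    FutexMutex mutex;
    long counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                mutex.lock();
                ++counter;
                mutex.unlock();
            }
        });
    for (auto& th : pool) th.join();
    return counter;
}
```

Calling `run_counter_demo(4, 20000)` should return 80000. The syscall is only reached when a thread actually finds the lock held, or when a thread unlocks while another may be sleeping.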

**oiaohm** · 07 June 2021, 04:16 PM

Originally posted by indepe:

That's how I understood the 42,000 number in the first place. I just didn't know where your number came from.

The 42,000 number comes from the developers of the improved futex2 patch. They work with Valve on games, so they know how bad the beast is.

Originally posted by indepe:

I took a moment to look at Table 5.7 and 5.8, not really enough time to be certain, but it seems to me that the use of WaitForMultipleObjects is extremely inefficient in this case:

The mainThread waits for a number of workerThreads to finish. This is done by creating an array of events, one for each workerThread, and the mainThread waits until SetEvent is called on all of them, by calling WaitForMultipleObjects on the whole array.

It seems this could be done far more efficiently by having an atomic integer counter for the remaining workerThreads, and a single semaphore that the main thread waits on.
Let's say there are 8 worker threads. Then the counter is initialized with 8. Each worker thread that finishes the processing for that frame will atomically decrement that counter (there is a single specific atomic CPU instruction for that). The workerThread that decrements it to zero will then signal the semaphore that the mainThread is waiting for.
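For readers following along, the counter idea quoted above can be sketched roughly as follows. This is only an illustration under stated assumptions: `FrameBarrier` and `render_frame` are hypothetical names, and a mutex plus condition variable stands in for the single semaphore so that the sketch compiles with plain C++11.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: one atomic counter plus one wake-up primitive.
class FrameBarrier {
    std::atomic<int> remaining;
    std::mutex m;
    std::condition_variable cv;
    bool done = false;
public:
    explicit FrameBarrier(int workers) : remaining(workers) {
        assert(workers > 0);
    }

    // Called once by each worker when its share of the frame is finished.
    void worker_finished() {
        // fetch_sub returns the previous value; 1 means this was the last worker.
        if (remaining.fetch_sub(1, std::memory_order_acq_rel) == 1) {
            {
                std::lock_guard<std::mutex> lk(m);
                done = true;
            }
            cv.notify_one();  // the only wake-up in the whole frame
        }
    }

    // Called by the main thread; blocks until the last worker has signalled.
    void wait_all() {
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [this] { return done; });
    }
};

// Demo: each worker writes one result slot, the main thread sums them up.
int render_frame(int workers) {
    FrameBarrier barrier(workers);
    std::vector<int> results(workers, 0);
    std::vector<std::thread> pool;
    for (int i = 0; i < workers; ++i)
        pool.emplace_back([&, i] {
            results[i] = i + 1;  // stand-in for real rendering work
            barrier.worker_finished();
        });
    barrier.wait_all();
    int sum = 0;
    for (int r : results) sum += r;
    for (auto& t : pool) t.join();
    return sum;
}
```

With 8 workers, `render_frame(8)` returns 36 (1 + 2 + ... + 8). Only the worker that takes the counter to zero touches the wake-up primitive; the other seven do a single atomic decrement each.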

This whole idea does not really work. First, multiple objects are chosen for a particular reason. Think of the case of a GPU compute unit reset. Note that when a GPU resets, the complete device may not be reset; it can be reset a segment at a time. So objects that have already decremented the counter may need to increment it again.

Next, the number of object locks is not limited to your CPU core count. That wait-for-multiple can be waiting on confirmation from GPU processing units, and atomically decrementing a counter across CPU/GPU setups does not exactly work well. So when you say workerThread in a game engine with DX objects, you are not talking only about items running on the CPU; a worker thread could be a shader on a GPU.

Wait-for-multiple is not a simple problem; you are attempting to think of it as only a CPU-bound problem. Game engines are evil messes.

Also, suppose you have a GPU or CPU glitch in one of the worker threads and now have to debug it. With your simplified counter you have lost the lock tracking on that section of code, have you not?

"Extremely inefficient": game engines are not designed 100 percent for efficiency. GPUs are not always the friendliest item. The on-the-fly resets that GPUs do have the effect that an object passed to a GPU may flag as complete and then change back to incomplete. A game engine at times has more than one WaitForMultipleObjects per frame. For example, once X objects are complete, the output generated so far can be placed on the output queue; then, when Y objects complete, the output queue gets updated if this happens before the frame change. That Intel example is really simple, with only one WaitForMultipleObjects, but a game engine with a dynamic quality level will have more per frame. Yes, these will be overlapping sets, as you would call them in maths.

I will keep this as simple as possible. Let's say we have objects 1 to 10. A dynamic game engine would likely set up one WaitForMultipleObjects for all 10 objects: if it triggers before the frame is due, the frame is rendered; if it triggers late, everything is cleaned up. But when objects 1 to 5 are rendered, this might trigger setting the currently generated output as the next frame. Then you could also have 1-4 and 6 as a different trigger to update the output. WaitForMultipleObjects basically allows set theory to be applied to locks.

Please be aware this can get really horrible really quickly: there can be 1000+ object-complete conditions under which it is valid to update the output frame.

Game engines are horrible. They are not built 100 percent for efficiency; they are built to make sure a frame of some quality makes it to the user, while at times dynamically dropping quality to maintain performance. Coding a game engine to dynamically adjust quality does not end up with the most efficient code path. This is like dynamically adjusting how much anti-cheat runs.

**indepe** · 07 June 2021, 05:25 PM

Originally posted by oiaohm:

This whole idea does not really work. First, multiple objects are chosen for a particular reason. Think of the case of a GPU compute unit reset. Note that when a GPU resets, the complete device may not be reset; it can be reset a segment at a time. So objects that have already decremented the counter may need to increment it again.

Although I don't see a problem with increasing a counter, I don't think that applies here. I can only go by the example given: when you distribute a work load onto multiple workerThreads, at some point they need to decide if they are finished, and then they are finished. They don't hang around afterwards to change their mind about it. They are simply done.

I'm not saying there are no reasons to use WaitForMultipleObjects in a game engine, just that I don't see a reason in the example code given here (table 5.7 and 5.8), and your elaborate handwaving doesn't change that.

Originally posted by oiaohm:

Next, the number of object locks is not limited to your CPU core count.

Of course not. That's obvious.

**oiaohm** · 07 June 2021, 05:51 PM

Originally posted by indepe:

Although I don't see a problem with increasing a counter, I don't think that applies here. I can only go by the example given: when you distribute a work load onto multiple workerThreads, at some point they need to decide if they are finished, and then they are finished. They don't hang around afterwards to change their mind about it. They are simply done.

The example does not say they cannot hang around and change their mind. SetEvent has a mirror, ResetEvent. In 5.8, the "// More rendering subtasks" section is where stuff can be added, such as an option to reverse one of the SetEvents on an object with a ResetEvent. This can happen because the game attempted to render something out of order and the player has pressed an input that means the guess of the input is not correct, so the output of that bit needs to be redone.

Originally posted by indepe:

at some point they need to decide if they are finished, and then they are finished

Inside game engines this is not something you can presume 100 percent. The worker will say: for this lock, at this stage, with the information I currently have, this is finished. A little while later something changes, like input or AI game logic, and it will change the lock back to say that the work is not complete.

Originally posted by indepe:

I'm not saying there are no reasons to use WaitForMultipleObjects in a game engine, just that I don't see a reason in the example code given here (table 5.7 and 5.8), and your elaborate handwaving doesn't change that.

That is the basic hello-world example. As the game engine gets more complete, it ceases to be a straight-line time problem. ResetEvent stuff does get added to 5.8 by game engine developers, and 5.7 gains a way to stop waiting when frame display time comes up.

indepe, think of the super warped things you see in some games where there has been network lag: the game receives an update on player information and you see a stack of rapidly rendered frames, because it has tripped ResetEvent, causing frames that were thought to be complete to be re-rendered and displayed.

Originally posted by indepe:

They don't hang around afterwards to change their mind about it.

Sorry, the reality is we are talking game engines. Different parts of the game rendering logic do hang around and do change their mind about it. At times this triggers cascading rendering effects that are jarring for the player, if the developers have not been careful to make sure that it cannot go back multiple frames into the past.

**indepe** · 07 June 2021, 06:10 PM

Originally posted by oiaohm:

Sorry, the reality is we are talking game engines. Different parts of the game rendering logic do hang around and do change their mind about it. At times this triggers cascading rendering effects that are jarring for the player, if the developers have not been careful to make sure that it cannot go back multiple frames into the past.

When you call WaitForMultipleObjects, you call it with a specific array of specific events, and once they are all set, the call returns.

You keep trying to create the impression that it has some kind of magic that is beyond the understanding of a mortal. It does not.

**oiaohm** · 07 June 2021, 06:55 PM

Originally posted by indepe:

When you call WaitForMultipleObjects, you call it with a specific array of specific events, and once they are all set, the call returns.

You keep trying to create the impression that it has some kind of magic that is beyond the understanding of a mortal. It does not.

You describe one of the basic usage cases.
https://docs.microsoft.com/en-us/win...ultipleobjects

Depending on how WaitForMultipleObjects is used, it may be waiting either for all the specified events to be set or for just one to be set. In the wait-for-one case, if many are set at once, the one closest to index zero in the list of object handles gets claimed. It also has a timeout.
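As a rough illustration of those two wait modes, the sketch below emulates a small set of manual-reset events in portable C++. It is not the Windows implementation and does not call the Windows API; `EventSet`, `kTimeout`, `set`, `reset` and `wait` are names invented for this example.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <vector>

constexpr int kTimeout = -1;  // hypothetical "timed out" return value

// Portable emulation of a fixed set of manual-reset events.
class EventSet {
    std::mutex m;
    std::condition_variable cv;
    std::vector<bool> signaled;
public:
    explicit EventSet(std::size_t n) : signaled(n, false) {}

    // Counterpart of SetEvent: mark event i as signaled and wake waiters.
    void set(std::size_t i) {
        assert(i < signaled.size());
        {
            std::lock_guard<std::mutex> lk(m);
            signaled[i] = true;
        }
        cv.notify_all();
    }

    // Counterpart of ResetEvent: event i goes back to "not done".
    void reset(std::size_t i) {
        assert(i < signaled.size());
        std::lock_guard<std::mutex> lk(m);
        signaled[i] = false;
    }

    // wait_all == true : returns 0 once every event is signaled.
    // wait_all == false: returns the lowest index that is signaled.
    // Returns kTimeout if the condition is not met within `timeout`.
    int wait(bool wait_all, std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lk(m);
        int result = kTimeout;
        cv.wait_for(lk, timeout, [&] {
            int first = -1;
            bool all = true;
            for (std::size_t i = 0; i < signaled.size(); ++i) {
                if (signaled[i]) {
                    if (first < 0) first = static_cast<int>(i);
                } else {
                    all = false;
                }
            }
            if (wait_all ? all : first >= 0) {
                result = wait_all ? 0 : first;
                return true;
            }
            return false;
        });
        return result;
    }
};
```

For example, with three events where 1 and 2 are set, a wait-for-one returns 1 (the lowest signaled index) while a wait-for-all times out until event 0 is set too. Resetting an event afterwards makes the next wait-for-all block again, which is the "uncheck" behaviour under discussion.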

Processing an input can result in an event reset, which unlocks something and restarts the WaitForMultipleObjects for a frame. See why a basic counter does not work? A single one of the tasks that have been counted off may need to be rerun due to some event that has happened.

You could have WaitForMultipleObjects set up a few hundred times for a single frame render, due to events triggering some bit of rendering that needs to be redone.

A game engine is not a straight-line problem. Trying to get latency as low as possible results in prerendering stuff. Prerendering stuff results in needing to reset an event when something has been guessed wrong.

200 FPS does not equal rendering 200 frames per second; this can be rendering 4000+ frames per second with minor corrections. Think about it: for areas that have not changed between frame renders, do you want to waste GPU/CPU time rendering those again?

The simple example does not have all the game engine reset flows in there. Say the player moved their weapon: that should be displayed in the next frame, right? But you have already rendered the frame with the weapon not moved, because that input came in after the frame was rendered. So you want to reset the weapon render event. This unlocks it on the worker thread to render again, and you go back to the same WaitForMultipleObjects code that queued the first version of the frame to queue the second version of the frame, and so on.

The list of events fed into WaitForMultipleObjects is basically the game engine's checklist. The means to uncheck something's done status is useful to a game engine. The individual object events allow you to unlock each one individually, which is good for reworking a frame of output while attempting to keep the CPU/GPU cost of that low.

Of course, the render-weapon event that feeds the frame output render may have another wait-for-multiple underneath it, based on what kind of action occurred. So it was unlocked because the weapon has now been fired, for example.

A lot of Windows game engines basically have a tree with many branches of locking: WaitForMultipleObjects on top of other WaitForMultipleObjects. In a game engine you would expect to see WaitForMultipleObjects in the workers as well.

The objective is to be able to redo an output frame with corrections, instead of having to redo the complete thing, for fast events.

**indepe** · 07 June 2021, 07:15 PM

Originally posted by oiaohm:

Depending on how WaitForMultipleObjects is used, it may be waiting either for all the specified events to be set or for just one to be set. In the wait-for-one case, if many are set at once, the one closest to index zero in the list of object handles gets claimed. It also has a timeout.

Obviously. You can tell that by looking at the parameter list already.

**oiaohm** · 08 June 2021, 06:07 AM

Originally posted by indepe:

Obviously. You can tell that by looking at the parameter list already.

Waiting for one of the set is used for input handling and other things under the Windows design. Basically the input event loop.

Yes, each setup of WaitForMultipleObjects only happens once, but the WaitForMultipleObjects code path may be run many times, and with game engines it is.

You have 10+ frames in flight. I gave the weapon as an example; this could just as well be the render of the HUD. So each frame's object list is not an exclusive set of objects; there are shared objects between frames.

This is why a normal counter is not that simple to set up. A game engine could have 100+ frames in different states of render completion, so 100+ active WaitForMultipleObjects, one on each of those frames in flight. Some of these frames are attempts at out-of-order rendering.

The tricky part is really how to update the status of all the WaitForMultipleObjects calls when you have 100+ of them at once sharing a common object. Yes, 100 lines that atomically update all the individual atomic integers are not going to be tidy code to read.

**indepe** · 08 June 2021, 09:38 AM

Originally posted by oiaohm:

Waiting for one of the set is used for input handling and other things under the Windows design. Basically the input event loop.

You are starting to mix different things and use cases. In an event loop, you use wait-for-any, not wait-for-all. Also you might not want to be limited to 64 event sources.

Originally posted by oiaohm:

Yes, each setup of WaitForMultipleObjects only happens once, but the WaitForMultipleObjects code path may be run many times, and with game engines it is.

You have 10+ frames in flight. I gave the weapon as an example; this could just as well be the render of the HUD. So each frame's object list is not an exclusive set of objects; there are shared objects between frames.

This is why a normal counter is not that simple to set up. A game engine could have 100+ frames in different states of render completion, so 100+ active WaitForMultipleObjects, one on each of those frames in flight. Some of these frames are attempts at out-of-order rendering.

The tricky part is really how to update the status of all the WaitForMultipleObjects calls when you have 100+ of them at once sharing a common object.

Exactly, the situation you try to describe here is much too complex for a single WaitForMultipleObjects call.

Whereas the situation described before is a simpler, very common situation of distributing a workload onto a fixed number of workerThreads that simply finish their task at some point. I'd very much expect these situations to appear in game engines as well. It certainly did in my own rendering, where there is no such try-and-retry logic going on, for which a single WaitForMultipleObjects call isn't suited either. The example code in those links would not be able to handle that correctly.

Originally posted by oiaohm:

Yes, 100 lines that atomically update all the individual atomic integers are not going to be tidy code to read.

You would have a wrapper function, not write that code explicitly in each situation.

Also, in the kind of situation that you describe now, you might want to use bitsets instead of a counter, or even a more complex data structure like a graph.
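To sketch what the bitset variant might look like: one atomic 64-bit mask records which tasks currently count as done, and each "trigger" is just a mask of required bits, so overlapping sets come for free and a task can be un-done again. `CompletionMask` and `bits` are hypothetical names, and this version only answers "is this set complete right now?"; blocking until a set completes would need a wait/wake primitive layered on top.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <initializer_list>

// Hypothetical helper: build a mask from a list of task numbers (0..63).
inline std::uint64_t bits(std::initializer_list<unsigned> tasks) {
    std::uint64_t mask = 0;
    for (unsigned t : tasks) {
        assert(t < 64);
        mask |= std::uint64_t{1} << t;
    }
    return mask;
}

// Hypothetical tracker: bit i is 1 while task i counts as done.
class CompletionMask {
    std::atomic<std::uint64_t> done{0};
public:
    // A worker reports task `t` as finished (one atomic OR).
    void mark_done(unsigned t) {
        assert(t < 64);
        done.fetch_or(std::uint64_t{1} << t, std::memory_order_release);
    }

    // Something changed and task `t` must be redone (one atomic AND).
    void mark_undone(unsigned t) {
        assert(t < 64);
        done.fetch_and(~(std::uint64_t{1} << t), std::memory_order_release);
    }

    // True when every task in `required` is currently done.
    bool satisfied(std::uint64_t required) const {
        return (done.load(std::memory_order_acquire) & required) == required;
    }
};
```

Using the objects 1 to 10 from earlier in the thread, the sets {1,2,3,4,5} and {1,2,3,4,6} become `bits({1, 2, 3, 4, 5})` and `bits({1, 2, 3, 4, 6})`. After tasks 1-4 and 6 are marked done, the second set is satisfied and the first is not; marking task 3 undone makes both unsatisfied again.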

However, I expect that a generic implementation of WaitForMultipleObjects's wait-for-all (which by the way is limited to 64 objects) would use a counter internally, just that it would also increment, not only decrement. But in the most general case it would be more complex than just that, which is why use-case-specific simplifications/optimizations will often be better.

**Weasel** · 09 June 2021, 07:04 AM

Originally posted by oiaohm:

The tricky part is really how to update the status of all the WaitForMultipleObjects calls when you have 100+ of them at once sharing a common object. Yes, 100 lines that atomically update all the individual atomic integers are not going to be tidy code to read.

https://en.wikipedia.org/wiki/Procedural_programming

FUTEX2 Linux Patches Updated To Support Variable-Sized Futexes
