FUTEX2 Linux Patches Updated To Support Variable-Sized Futexes


  • #41
    Originally posted by MadCatX View Post
    Their implementation builds and destroys the list of handles that are being waited on during each call to WaitForMultipleObjects. That is not very efficient since each malloc could possibly be a syscall.
    Well, that is required due to the horrible nature of the WaitForMultipleObjects API, and it has to be done regardless of how you solve it (for usage in WINE, that is). As we have seen from the esync problems, lots of Windows apps/games are written in a way that, if you don't handle it this way, you end up with exhausted resources.
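    For illustration only, a rough sketch of the calling pattern on the Windows side (the helper name wait_any and the scenario are hypothetical, not from any particular game or Wine code): the handle array handed to WaitForMultipleObjects can be different on every call, and a single call is capped at MAXIMUM_WAIT_OBJECTS (64) handles, which is where the 64-handle ceiling mentioned later in the thread comes from.

    #include <windows.h>

    /* Hypothetical per-frame wait: the set of handles (worker events, I/O
     * completions, timers, ...) is assembled fresh for every call, so a
     * Wine-side implementation cannot assume a stable list of waited-on
     * objects.  nCount must not exceed MAXIMUM_WAIT_OBJECTS (64). */
    static void wait_any(const HANDLE *handles, DWORD count)
    {
        /* bWaitAll = FALSE: return as soon as any one object is signaled. */
        DWORD r = WaitForMultipleObjects(count, handles, FALSE, INFINITE);

        if (r != WAIT_FAILED && r < WAIT_OBJECT_0 + count) {
            /* handles[r - WAIT_OBJECT_0] was signaled; dispatch its work here. */
        }
    }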



    • #42
      Originally posted by F.Ultra View Post

      Well, that is required due to the horrible nature of the WaitForMultipleObjects API, and it has to be done regardless of how you solve it (for usage in WINE, that is). As we have seen from the esync problems, lots of Windows apps/games are written in a way that, if you don't handle it this way, you end up with exhausted resources.
      If the solution allows it, you can reduce the cost, for example by recycling the wait list entries (thus avoiding the potential syscall and even the library call), especially since the number of entries used is usually rather small. For permanent/long-term subscriptions, you don't even need to do that. (In my use cases, all are long-term.)

      However, that wasn't done in that implementation, and I'm not sure you can easily recycle condition variables or whatever they use.

      I think with esync there may be a global upper limit on how many you can use at the same time. For wait list entries, there is no upper limit, at least if you implement them outside the kernel in user space. I think at least one version of the patched futexes used (or still uses) an upper limit of 64 or 65 per thread, since it wanted to use a fixed allocation inside the kernel and Windows has that limit. One of my use cases has about 100 events waited for by a single thread, so I wouldn't even be able to use the patched futexes the way they are meant to be used.
      Last edited by indepe; 04 June 2021, 04:55 PM.



      • #43
        Originally posted by indepe View Post

        If the solution allows it, you can reduce the cost by recycling the wait list entries (thus avoiding the potential syscall and even the library call), especially since the number of entries used is usually rather small. For permanent/long-term subscriptions, you don't even need to do that. (In my use cases, all are long-term.)
        Yes, but not for the WFMO API, since the list can and will change between each call, and from what I've heard many games tend to do really crazy stuff here. Assuming long-term subscriptions leads to the esync problem of exhausted resources.

        For anyone interested, Collabora did a presentation of futex2 and the old solution here



        • #44
          Originally posted by indepe View Post
          I also like the idea behind FUTEX_SWAP; however, it doesn't seem to be part of these patches.
          Yeah, which is annoying; it gives a lot of flexibility to userspace locks and even extremely fast interprocess communication.
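          To illustrate what FUTEX_SWAP is for: a minimal sketch of the thread-handoff pattern it targets, written against today's mainline futex interface (the wrapper names futex_wake_one, futex_wait_if and handoff are mine, and the example assumes a strict ping-pong between exactly two threads). With plain futexes the handoff costs two kernel entries; the proposed FUTEX_SWAP would wake the peer and put the caller to sleep in a single syscall, which is exactly what makes it interesting for userspace locks and fast IPC.

          #define _GNU_SOURCE
          #include <stdatomic.h>
          #include <stdint.h>
          #include <linux/futex.h>
          #include <sys/syscall.h>
          #include <unistd.h>

          /* Thin wrappers around the futex syscall; the names are mine. */
          static long futex_wake_one(_Atomic uint32_t *addr)
          {
              return syscall(SYS_futex, addr, FUTEX_WAKE_PRIVATE, 1, NULL, NULL, 0);
          }

          static long futex_wait_if(_Atomic uint32_t *addr, uint32_t expected)
          {
              return syscall(SYS_futex, addr, FUTEX_WAIT_PRIVATE, expected, NULL, NULL, 0);
          }

          /* Strict ping-pong handoff: tell the peer to run, then sleep until it
           * hands control back.  With mainline futexes this is two kernel
           * entries; FUTEX_SWAP, as proposed, would fold them into one. */
          static void handoff(_Atomic uint32_t *their, _Atomic uint32_t *mine)
          {
              atomic_store(mine, 0);            /* re-arm our own futex word          */
              atomic_store(their, 1);           /* publish "your turn" to the peer    */
              futex_wake_one(their);            /* kernel entry #1: wake the peer     */
              while (atomic_load(mine) == 0)    /* kernel entry #2: sleep until woken */
                  futex_wait_if(mine, 0);
          }

          The peer would make the mirror-image call with the two futex words swapped; that back-and-forth is the synchronous wake-and-wait FUTEX_SWAP is meant to collapse into one syscall.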

          But of course the maintainers had to play tough when it was first sent, and now the Google dev doesn't want to bother anymore, since they probably use it internally at Google. zzz



          • #45
            Originally posted by F.Ultra View Post

            Yes, but not for the WFMO API, since the list can and will change between each call, and from what I've heard many games tend to do really crazy stuff here. Assuming long-term subscriptions leads to the esync problem of exhausted resources.
            Unfortunately I edited my post a few times, I'm not sure if you saw the latest version.

            In the implementation that I am proposing, you can recycle the wait list entries even if the number changes between each WFMO call. You can just keep the max number you used. There is no upper limit in that sense, since they are just small amounts of memory each. It could be many millions.
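            Roughly what I mean, as a sketch only (struct wait_entry, get_wait_list and the per-thread scratch array are made-up names for illustration, not taken from any Wine patch): the array only ever grows to the largest count seen so far, so after a short warm-up a WFMO-style call allocates nothing at all, no matter how the set of waited-on objects changes between calls.

            #include <stdlib.h>
            #include <string.h>

            /* Hypothetical wait-list entry; the real contents would depend on the
             * synchronization primitive backing each waited-on object. */
            struct wait_entry {
                void *object;     /* the object being waited on */
                int   signaled;   /* set by the signaling side  */
            };

            /* Per-thread scratch list, grown but never shrunk ("keep the max"). */
            static _Thread_local struct wait_entry *scratch;
            static _Thread_local size_t scratch_cap;

            static struct wait_entry *get_wait_list(size_t count)
            {
                if (count > scratch_cap) {
                    /* The only path that can allocate; once the largest count has
                     * been seen, it is never taken again. */
                    struct wait_entry *p = realloc(scratch, count * sizeof(*p));
                    if (!p)
                        return NULL;
                    scratch = p;
                    scratch_cap = count;
                }
                memset(scratch, 0, count * sizeof(*scratch));
                return scratch;
            }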

            As I said above, I have use cases with about 100 for a single thread, and also use cases where the number is usually smaller but actually unlimited, and there would be a problem if I were to have a fixed limit like Windows (and maybe futex2) with 64/65 per thread.



            • #46
              Originally posted by indepe View Post
              As I said above, I have use cases with about 100 for a single thread, and also use cases where the number is usually smaller but actually unlimited, and there would be a problem if I were to have a fixed limit like Windows (and maybe futex2) with 64/65 per thread.
              Is this in "latency sensitive code"? I would guess not. Remember that WaitForMultipleObjects is used in game cheat engines; this is code that is highly latency sensitive.

              Also, you have games that will create thousands to millions of WaitForMultipleObjects that they may only garbage collect in the distant future; this is what has broken a lot of attempted Wine solutions. The record for a working 64-bit game under Windows is 1 billion created, consuming over 4 GB of RAM in locking data. Yes, that game had well overblown RAM requirements due to crap locking.

              If you do watch the YouTube video, you will find the minimum rate your locking needs to cope with.

              Windows game engines can be performing locking operations at a scary rate of over 42,000 locks per second. Yes, this is why a leak in this department goes insane quickly. Yes, a frame rate like 240 frames per second is nothing in comparison. The reality is that a 4 percent saving will allow more game logic operations to complete per frame, including doing extra optional checking for cheating.

              Yes, the video does note that there are no frame rate gains. Games have a lot of optional things they can run. If some of those things don't run, you either get banned or, even in single player, the game can crash.

              64-65 per thread will not cut it when you look at how current-day Windows game engines work.

              PS: do note the upstreaming of futex2 is happening in the real-time tree first, because the locking required here is a very latency-sensitive form of locking.
              Last edited by oiaohm; 04 June 2021, 09:53 PM.



              • #47
                Originally posted by oiaohm View Post
                Windows game engines can be performing locking operations at a scary rate of over 42,000 locks per second.
                The number of 42,000 refers to futex operations, not locks, and I have already discussed that number.

                As usual when it comes to multithreading, you only have a vague idea of what you are talking about, and it is an endless endeavor to try to get on the same page with you. I currently don't have the time to spend on that Sisyphean work.

                Originally posted by oiaohm View Post
                64-65 per thread will not cut it when you look at how current-day Windows game engines work.
                This only supports what I said already.
                Last edited by indepe; 04 June 2021, 10:28 PM.



                • #48
                  Originally posted by indepe View Post
                  The number of 42,000 refers to futex operations, not locks, and I have already discussed that number.
                  Please note we only have a solid count of the futex syscalls. In fact, that is not all the futex operations. The futex operations that are solved by atomic operations do not cause the syscall; it is contended locks, or lock creation or deletion, that trigger a syscall.

                  The case documented with Wine when they moved away from file-based sync was over 30,000 locks for a single thread and over 1,500 wait-multis.


                  https://software.intel.com/content/w...12-part-5.html
                  indepe, this is an Intel tutorial; go read "Table 5.7: Multi-threaded Main Thread Render Function". So you have the game engine main loop creating more and more WaitForMultipleObjects calls, which means at least as many WaitForMultipleObjects calls as the frame rate in modern multi-threaded game engines. Of course this gets worse as it is used in more places.

                  https://github.com/microsoft/DirectX...ading.cpp#L741
                  Yes, that Intel example lines up with the Microsoft example.

                  WaitForMultipleObjects is massively used. Yes, creating a new set of locks every frame and terminating all of those locks at the end of the frame is what game engines are doing. Of course some of those are then leaked and later garbage collected by the game engine code instead of being fixed properly.

                  Game engines are horrible.



                  • #49
                    Originally posted by oiaohm View Post

                    Please note we only have a solid count of the futex syscalls. In fact, that is not all the futex operations. The futex operations that are solved by atomic operations do not cause the syscall; it is contended locks, or lock creation or deletion, that trigger a syscall.
                    Yes and no. It is correct that only slow paths cause futex syscalls, but all futex operations are syscalls. The fast path is not a futex operation; it is just an atomic variable operation, unless you consider the atomic variable part of the futex. However, I don't think that is how it was counted. If it were, the difference between futex1 and futex2 would be even less significant.
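                    To make the fast-path/slow-path distinction concrete, here is a deliberately simplified sketch of a futex-backed lock along the lines of Drepper's "Futexes Are Tricky" (the names flock_t, flock_lock, flock_unlock and sys_futex are mine; this is not the code Wine or glibc actually uses): the uncontended acquire and release are single atomic operations that never enter the kernel, so they would not show up in a count of futex syscalls; only the contended branches reach FUTEX_WAIT/FUTEX_WAKE.

                    #define _GNU_SOURCE
                    #include <stdatomic.h>
                    #include <stdint.h>
                    #include <linux/futex.h>
                    #include <sys/syscall.h>
                    #include <unistd.h>

                    /* 0 = unlocked, 1 = locked, 2 = locked and (possibly) contended */
                    typedef struct { _Atomic uint32_t state; } flock_t;

                    static long sys_futex(_Atomic uint32_t *addr, int op, uint32_t val)
                    {
                        return syscall(SYS_futex, addr, op, val, NULL, NULL, 0);
                    }

                    static void flock_lock(flock_t *l)
                    {
                        uint32_t c = 0;
                        /* Fast path: uncontended acquire is one atomic CAS, no syscall. */
                        if (atomic_compare_exchange_strong(&l->state, &c, 1))
                            return;
                        /* Slow path: mark the lock contended (2) and sleep in the kernel. */
                        if (c != 2)
                            c = atomic_exchange(&l->state, 2);
                        while (c != 0) {
                            sys_futex(&l->state, FUTEX_WAIT_PRIVATE, 2);  /* sleep while state == 2 */
                            c = atomic_exchange(&l->state, 2);
                        }
                    }

                    static void flock_unlock(flock_t *l)
                    {
                        /* Fast path: uncontended release is one atomic exchange, no syscall. */
                        if (atomic_exchange(&l->state, 0) == 2)
                            sys_futex(&l->state, FUTEX_WAKE_PRIVATE, 1);  /* wake one waiter */
                    }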

                    And futexes are not only used by locks, but also, for example, by semaphores, WaitForSingleObject, and WaitForMultipleObjects in that WINE test using futexes. And, I guess, by condition variables.

                    Originally posted by oiaohm View Post
                    The case documented with Wine when they moved away from file-based sync was over 30,000 locks for a single thread and over 1,500 wait-multis.
                    I don't know anything about that number or where it comes from, but it doesn't surprise me. It doesn't sound like a problem, but that may depend on the time spent inside the critical section (the general term, not the Windows API). Probably not all calls are on the same lock, so it is unlikely to be a problem in that sense either.

                    Originally posted by oiaohm View Post
                    https://software.intel.com/content/w...12-part-5.html
                    indepe, this is an Intel tutorial; go read "Table 5.7: Multi-threaded Main Thread Render Function". So you have the game engine main loop creating more and more WaitForMultipleObjects calls, which means at least as many WaitForMultipleObjects calls as the frame rate in modern multi-threaded game engines. Of course this gets worse as it is used in more places.
                    https://github.com/microsoft/DirectX...ading.cpp#L741
                    Yes, that Intel example lines up with the Microsoft example.
                    Having as many WaitForMultipleObjects calls as the frame rate should not be a problem at all, but I don't know anything about the Windows performance of that call. (My own use cases are more frequent than that.)

                    Sorry, I currently don't have the time to follow the links or to give you more detailed answers.
                    Last edited by indepe; 04 June 2021, 11:47 PM.



                    • #50
                      Originally posted by oiaohm View Post
                      The case documented with Wine when they moved away from file-based sync was over 30,000 locks for a single thread and over 1,500 wait-multis.
                      By the way, without contention, a Zen2 processor can perform more than 100 million lock operations per second (with everything in the cache).

                      So any problem would depend on contention and other additional factors.
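                      As a rough way to sanity-check a number of that order (assumptions: a single thread, pthread_mutex as a stand-in for "a lock operation", and the mutex hot in cache so everything stays on the atomic fast path), something along these lines would do; the exact figure obviously depends on the CPU and on which lock implementation is measured.

                      #include <pthread.h>
                      #include <stdio.h>
                      #include <time.h>

                      /* Time N uncontended lock/unlock pairs on a single thread.
                       * Build with: gcc -O2 -pthread bench.c */
                      int main(void)
                      {
                          enum { N = 100000000 };  /* 100 million iterations */
                          pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
                          struct timespec t0, t1;

                          clock_gettime(CLOCK_MONOTONIC, &t0);
                          for (long i = 0; i < N; i++) {
                              pthread_mutex_lock(&m);
                              pthread_mutex_unlock(&m);
                          }
                          clock_gettime(CLOCK_MONOTONIC, &t1);

                          double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
                          printf("%.0f lock/unlock pairs per second\n", (double)N / secs);
                          return 0;
                      }

                      Under contention the picture changes completely, since every contended acquire can turn into a futex syscall plus a context switch.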
