FUTEX2 Patches Sent Out In Simpler Form For Helping Windows Games On Linux


  • #41
    Originally posted by JoshuaAshton View Post

    Congratulations on explaining absolutely fucking nothing when asked how to do it another way.

    Look, if you're going to big up yourself on how to do it, then maybe actually explain it instead of just explaining the problem space again and giving it all this "It requires learning multi-threaded programming in many steps" shit.
    Apparently, you didn't make it to the fourth sentence. Just to be of service, I dug out one of the "previous discussions". There are more.

    Posts #16 to #40 between MadCatX and me.
    Note: post numbers sometimes change. The exchange starts with "How do you do this efficiently on Linux?" and ends with "Welcome!"

    Phoronix: FUTEX2 Linux Patches Updated To Support Variable-Sized Futexes: https://www.phoronix.com/scan.php?page=news_item&px=Linux-FUTEX2-System-Call-v4


    Unfortunately the forum format doesn't make for easy reading. Maybe that's not the very best way to do it, and I'm not a teacher or a book author or anything in that direction.



    • #42
      Originally posted by indepe View Post

      Apparently, you didn't make it to the fourth sentence. Just to be of service, I dug out one of the "previous discussions". There are more.

      Posts #16 to #40 between MadCatX and me.
      Note: post numbers sometimes change. The exchange starts with "How do you do this efficiently on Linux?" and ends with "Welcome!"

      Phoronix: FUTEX2 Linux Patches Updated To Support Variable-Sized Futexes: https://www.phoronix.com/scan.php?page=news_item&px=Linux-FUTEX2-System-Call-v4


      Unfortunately the forum format doesn't make for easy reading. Maybe that's not the very best way to do it, and I'm not a teacher or a book author or anything in that direction.
      Why couldn't you just copy your wait list explanation here the past times I've asked instead of giving vague answers?

      The problem with that approach (as I understand it) is that you're going to need 1 thread per futex to wait and trigger the condition variables/other futexes from the waitlist and there are potentially tens or hundreds of thousands of them. Admittedly that thread is going to be asleep most of the time but ehhh. That seems problematic at scale.

      Am I missing something here?



      • #43
        Originally posted by JoshuaAshton View Post
        Why couldn't you just copy your wait list explanation here the past times I've asked instead of giving vague answers?
        In my experience this usually requires a discussion (as it seems to do now) and I was hoping to avoid the need to repeat previous discussions.

        Originally posted by JoshuaAshton View Post
        The problem with that approach (as I understand it) is that you're going to need 1 thread per futex to wait and trigger the condition variables/other futexes from the waitlist and there are potentially tens or hundreds of thousands of them. Admittedly that thread is going to be asleep most of the time but ehhh. That seems problematic at scale.

        Am I missing something here?
        There are no additional threads, just the application threads themselves, the ones that call wait-multiple in application code. Each thread that needs to wait (if the events aren't there yet) waits on its own re-usable semaphore (just a small wrapper around a futex). When an event gets signaled, it looks up its wait list, which has pointers to the semaphores of any thread(s) waiting on that event, and signals those semaphores. So if you have 10 events and 3 threads that wait for events, there are 3 semaphores/futexes involved, not 10.



        • #44
          Originally posted by indepe View Post
          As far as I can tell, very few people say explicitly what you just said. Among those who consider that it may theoretically be possible, many assume the performance question has already been settled against it being an interesting alternative on Linux. However, the actual performance numbers that I have seen either compare esync and fsync, or come from micro-benchmarks comparing low-level futex functions pre- and post-patch.
          I've seen some benchmarks on real games (though not a ton).

          I think esync and fsync actually perform very similarly, at least from the limited tests I've seen.

          However they both provide a very large benefit vs the current WINE implementation.

          My understanding (which could be wrong) is that the esync solution was rejected by WINE for having unsolvable limitations, such as the potential to run out of file descriptors, which is why it's not considered a suitable long-term solution.

          A lot of times when someone writes a library for their own use, they won't run into any such limitations in their own use cases, and so it's easy to handwave such things away and say they don't matter. But when you are implementing something that has to run proprietary 3rd party code which is often not coded well in the first place, sometimes you end up hitting more surprising behaviors and WINE still wants to handle all of them.

          Edit: here are a few tests:
          Bioshock 15% faster with esync or fsync vs none: https://flightlessmango.com/games/2596/logs/786
          10 popular games, very similar performance between esync and fsync: https://flightlessmango.com/benchmarks/Esync_vs_Fsync
          various games reported - Thief, Sleeping Dogs, Greedfall show massive gains for esync/fsync over none: https://github.com/ValveSoftware/Proton/issues/2966
          Last edited by smitty3268; 10 August 2021, 02:30 AM.



          • #45
            Originally posted by smitty3268 View Post
            I think esync and fsync actually perform very similarly, at least from the limited tests I've seen.
            Originally posted by smitty3268 View Post
            Edit: here's a few tests:
            Bioshock 15% faster with esync or fsync vs none: https://flightlessmango.com/games/2596/logs/786
            10 popular games, very similar performance between esync and fsync: https://flightlessmango.com/benchmarks/Esync_vs_Fsync
            various games reported - Thief, Sleeping Dogs, Greedfall show massive gains for esync/fsync over none: https://github.com/ValveSoftware/Proton/issues/2966
            Interesting. Sometimes esync is even faster than fsync. And surprising: I didn't follow the game side that closely, but was under the impression that fsync is generally considered distinctly faster than esync, and that in terms of underlying technologies, futex is generally considered distinctly faster than eventfd. However, any differences or non-differences could also be because the differences are at too small a level to show up in games at the higher level, or because they are balanced out by differences in the implementation of fsync. I also now found one benchmark where eventfd was actually faster, used between processes. So far I didn't pay much attention to eventfd. In any case, I think in principle it wouldn't be difficult to replace the semaphores/futexes with eventfds in that implementation of mine, the basics of which I described above. It may be worth making some benchmarks comparing the low-level difference.
            Last edited by indepe; 10 August 2021, 12:44 PM.



            • #46
              Originally posted by indepe View Post
              Interesting. Sometimes esync is even faster than fsync. And surprising: I didn't follow the game side that closely, but was under the impression that fsync is generally considered distinctly faster than esync, and that in terms of underlying technologies, futex is generally considered distinctly faster than eventfd.
              At least some (maybe all) of those benchmarks are fairly old at this point, so it's possible that early versions of fsync weren't particularly well optimized compared to whatever the current state is. But it's enough to tell me that esync probably isn't that terrible overall, and by itself is a big increase vs having the current WINE implementation.



              • #47
                Originally posted by indepe View Post

                Not just in a perfect world. I'm saying this information should have been given already because when someone proposes a new kernel feature, the burden of proof falls on them to provide adequate reasoning of why that kernel feature is needed or useful. I'm not sure if your argument here is that there already are a lot of kernel features that aren't useful, but in my opinion that would only make it worse and a larger problem.

                I read the links that are provided here, which include the provided patch comments, which (as far as I understand) are the designated location to provide reasoning in favor of a patch. As far as I know these comments are supposed to be complete. I wouldn't know where else on LKML to look, but in the past I have also run a lot of internet searches on the topic, in addition to reading the articles and discussions on LWN and watching another video by the author of the patches. And I have looked at a lot of things linked in previous discussions here. I even took a look at some of the source code of esync.

                Nobody should need to inform themselves more thoroughly than I have.
                That is because you are asking for more information than what is usually needed for kernel inclusion. The main question for the kernel devs is whether the feature is useful and whether that usefulness is worth the maintenance burden. The questions that you are asking are different, and thus no answers exist yet, since they have not been asked in the proper venue. You are 100% free to post your questions/objections on LKML; the list is completely open and is the better venue for this discussion.

                Originally posted by indepe View Post
                Huhh??? Who is everyone? So maybe that's where we see things completely differently. Or maybe you missed the "(or not in a good way)" part.

                At first, many don't even know that there is another relevant possibility other than esync or fsync. They simply champion the kernel patches because fsync is better than esync. See the post following yours as an *apparent* example. And that's a large part of what I am concerned about. To a large part that also appears to be how it is presented.

                The question of what the patch submitter knows, and what "everyone" knows, are very different questions. Sometimes there is talk about an implementation using existing futexes, but those are vague enough to be misunderstandings. So I am not sure if such an implementation has been studied (and optimized) in any significant way, or if it even exists.

                As far as I can tell, very few people say explicitly what you just said. Among those who consider that it may theoretically be possible, many assume the performance question has already been settled against it being an interesting alternative on Linux. However, the actual performance numbers that I have seen either compare esync and fsync, or come from micro-benchmarks comparing low-level futex functions pre- and post-patch.

                Then there are those who strongly believe it is not possible to make it work well, if at all, and those who strongly believe(d) that Linux doesn't provide the facilities to do it with reasonable performance.

                At least that is roughly how I see the situation. No straw men here.
                Who here or on LKML has said that this is impossible to do without kernel patches? The mere existence of esync (a userspace solution that works) is evidence that WFMO is fully possible to implement without further kernel patches. I'm not really sure what your real objection here is, besides being contrarian?


                Originally posted by indepe View Post
                So you actually do question the performance and/or feasibility yourself?

                At first I thought when you say "100% in userspace", you probably mean: without kernel patches. But now that you talk about being "efficient outside the kernel", I am not so sure anymore. None of the solutions are 100% in userspace, they all use kernel API. (Although purely theoretically you could do it 100% in userspace if you replaced all sleep-waiting with spinning.).

                The non-patched futex version will manage per-event-object wait lists in userspace instead of in the kernel per futex, if that is what you mean. So?

                I don't know what you mean with "can we inform the scheduler". Is this a functional question or a performance question?

                Will we ever even see a userspace library, in glibc or otherwise? That is a good question either way. Certainly WINE would implement its own, considering the odd features of the Windows API.
                FFS of course I'm talking about without additional kernel patches with a 100% userspace solution. How on earth have you already forgotten the context of your own arguments?

                And here you are again with the "feasibility" even though I spent a long paragraph arguing that no one is arguing that this is not possible to do without kernel patches, sheesh.


                Originally posted by indepe View Post
                futex_waitv() is not something "that every one can use immediately". futex_waitv() still expects that the fastpath is implemented in userspace, and applications would not use it directly without userspace code in between (library or not), especially if you use several of the features as used in WINE. And it still expects you to implement event objects and all that. It is even more complex if you want to use it between processes.

                Furthermore I think that a direct use of futex_waitv() or similar would not be a good API for permanent subscriptions, or when used in a loop. For events that are not consumed by single threads, but are meant to wake all waiting threads, this API may cause a simple loop to miss events when the event occurs while processing one event and before the next wait is called. Solving that problem requires more userspace code and/or a better API. That's why epoll doesn't just have epoll_wait but also epoll_create with epoll_ctl.
                You will miss events only if you reset the events outside of futex_waitv(), but then you have the same problem with a pure futex, or WFMO. The fastpath is handled by futex_waitv() (which of course means that you waste a context switch for the call, but the support is there).

                You can use futex_waitv immediately once you run a kernel that has the patch; all you need is the defines, and then you can code for it just like any other syscall. This is just getting silly at this point.



                • #48
                  Originally posted by F.Ultra View Post
                  That is because you are asking for more information than what is usually needed for kernel inclusion. The main question for the kernel devs is whether the feature is useful and whether that usefulness is worth the maintenance burden. The questions that you are asking are different, and thus no answers exist yet, since they have not been asked in the proper venue. You are 100% free to post your questions/objections on LKML; the list is completely open and is the better venue for this discussion.
                  I am not merely asking questions. I am also establishing that the required information has not been provided by pointing out that nobody has the answer to these questions.

                  It is not enough to say that a patch is useful because there is a use case, when the situation is that the kernel already has a feature that satisfies that use case. Then the question becomes why the existing kernel feature isn't sufficient for that use case. And the burden of proof falls upon the patcher.

                  If only the use case is offered as reasoning for the usefulness of a patch, then the implication is that no other kernel feature exists that could satisfactorily be used.

                  The question falls into two parts:
                  1) Has the information been provided where it should be? The answer is: apparently not, and that is the main point.
                  2) Does this information exist anywhere else? The answer is: I don't know, but I'm not the only one who doesn't know. If it does exist, I would like to know.

                  What exactly do you mean with "on LKML"? Does that refer to the comments on the patches that are linked in this article and in previous articles, where sometimes kernel engineers respond? Or something else? If the former, then it is (currently) not my intent to consume time from the kernel engineers, considering how long these discussions often get. This is the only place I know that supports discussions of this length.

                  Originally posted by F.Ultra View Post
                  Who here or on LKML has said that this is impossible to do without kernel patches? The mere existence of esync (a userspace solution that works) is evidence that WFMO is fully possible to implement without further kernel patches. I'm not really sure what your real objection here is, besides being contrarian?
                  I'm not sure if anyone has said: completely "impossible". At least one came pretty close, though. The main question is if WINE/Proton requires kernel patches to get proper performance for games. And a secondary question is if a solution using the existing futex API is possible. There the question is not so much if it is possible in a merely theoretical sense, but if it is possibly a viable solution.

                  Esync: No, esync is not "fully possible" in the sense of being a fully viable solution. One problem is that it is limited by the OS limit for the number of file handles, and according to the esync engineer, that is an even bigger problem than it would be anyway, because of leaks in some Windows apps. (I think this refers to leaks of event objects and that in esync each event object uses one eventfd.) (See also post #44 by @smitty3268.)

                  What other solution is known to exist (that hasn't already been dismissed) ?


                  Originally posted by F.Ultra View Post
                  FFS of course I'm talking about without additional kernel patches with a 100% userspace solution. How on earth have you already forgotten the context of your own arguments?

                  And here you are again with the "feasibility" even though I spent a long paragraph arguing that no one is arguing that this is not possible to do without kernel patches, sheesh.
                  I don't get your point. This is not a computer science discussion of theoretical possibilities, this is about candidate solutions for WINE to run games (or other apps using WFMO) with good performance.


                  Originally posted by F.Ultra View Post
                  You will miss events only if you reset the events outside of futex_waitv(), but then you have the same problem with a pure futex, or WFMO. The fastpath is handled by futex_waitv() (which of course means that you waste a context switch for the call, but the support is there).
                  Often events just "occur" and don't have two states. Windows distinguishes "auto-reset" and "manual-reset" and also has a function called PulseEvent. Yet still, yes, WFMO has the same problem. Which is one of the reasons why I think WFMO would be a bad API for Linux to have. I think it should remain internal to WINE.

                  Originally posted by F.Ultra View Post
                  You can use futex_waitv immediately once you run a kernel that has the patch; all you need is the defines, and then you can code for it just like any other syscall. This is just getting silly at this point.
                  What for? I don't see any use case where I wouldn't prefer to use the existing futex API. I don't see a serious application getting anything out of it that it can't get with the existing futex API. And I don't see a lot of casual programmers using that API directly as a syscall, especially not without quite a risk of running into problems they don't even understand (and then possibly blaming the kernel for them).
                  Last edited by indepe; 11 August 2021, 04:59 PM.



                  • #49
                    Originally posted by indepe View Post
                    What exactly do you mean with "on LKML"? Does that refer to the comments on the patches that are linked in this article and in previous articles, where sometimes kernel engineers respond? Or something else? If the former, then it is (currently) not my intent to consume time from the kernel engineers, considering how long these discussions often get. This is the only place I know that supports discussions of this length.
                    Not sometimes. LKML, aka the Linux Kernel Mailing List, is the only place where the discussion of patches matters, and you are not consuming the time of kernel engineers unless you spam or troll the list. If you have objections, questions, or ideas for improvement, you not only can but should post on LKML. I do so frequently on matters that concern me; very few kernel devs go to places like this for discussions, so if you want to have the possibility to make a change, that is the proper place to do it.

                    Originally posted by indepe View Post
                    I don't get your point. This is not a computer science discussion of theoretical possibilities, this is about candidate solutions for WINE to run games (or other apps using WFMO) with good performance.
                    The point is that you interpreted what I wrote in a different context than what the discussion was about. But yes I can understand how "100% userspace" might be misinterpreted so I will try to make sure to be extra specific from now on.


                    Originally posted by indepe View Post
                    Often events just "occur" and don't have two states. Windows distinguishes "auto-reset" and "manual-reset" and also has a function called PulseEvent. Yet still, yes, WFMO has the same problem. Which is one of the reasons why I think WFMO would be a bad API for Linux to have. I think it should remain internal to WINE.
                    Which is why this is not WFMO for Linux; it's a building block for implementing WFMO inside WINE, not WFMO in Linux itself. Naturally futex_waitv() is just as sensitive to the exact same use semantics as the basic futex(); my point here is that since the use semantics do not change, your argument against futex_waitv() is moot.

                    Originally posted by indepe View Post
                    What for? I don't see any use case where I wouldn't prefer to use the existing futex API. I don't see a serious application getting anything out of it that it can't get with the existing futex API. And I don't see a lot of casual programmers using that API directly as a syscall, especially not without quite a risk of running into problems they don't even understand (and then possibly blaming the kernel for them).
                    What for? For performing an efficient wait on multiple futexes (up to 128 in the current patch). Something that has been discussed on LKML since futexes were introduced in v2.5.

                    The casual programmer would experience the exact same problems with the existing futex() call, so this argument is also moot.



                    • #50
                      Originally posted by indepe View Post

                      Interesting. Sometimes esync is even faster than fsync. And surprising: I didn't follow the game side that closely, but was under the impression that fsync is generally considered distinctly faster than esync, and that in terms of underlying technologies, futex is generally considered distinctly faster than eventfd. However, any differences or non-differences could also be because the differences are at too small a level to show up in games at the higher level, or because they are balanced out by differences in the implementation of fsync. I also now found one benchmark where eventfd was actually faster, used between processes. So far I didn't pay much attention to eventfd. In any case, I think in principle it wouldn't be difficult to replace the semaphores/futexes with eventfds in that implementation of mine, the basics of which I described above. It may be worth making some benchmarks comparing the low-level difference.
                      That's because it's benchmarking a full game, not esync vs fsync in isolation. It's quite possible that a game will perform worse with a more efficient event notification framework; it would not be the first time an optimization led to worse performance due to bad usage patterns.

                      By the very nature of how they are implemented there should be zero instances of eventfd being faster than futex, any benchmark providing such "evidence" must contain a bug somewhere. In fact futexes should be several orders of magnitude faster than eventfd under any circumstance.

