FUTEX2 Spun Up A Fifth Time For This Linux Interface To Help Windows Games


  • #21
    Originally posted by coder View Post
    Be that as it may, WaitForMultipleObjects() has been around since at least Windows 95, and had nothing to do with games. Wine might be adding it for games, but it wasn't in Windows primarily for games.
    Wine has had it for a long time. It has also been a source of slow Wine performance for a long time because it traditionally required IPC with the wine server. The wine developers reimplemented it to use eventfd (esync) to improve thread synchronization in wine for Proton to a level that is close to native Windows performance, but they want to reimplement it using futex2 to get thread synchronization to match native Windows performance. At least, that is my understanding.



    • #22

      Originally posted by F.Ultra View Post
Could also be argued that in an environment where you do have access to WFMO, people tend to design their code around it, leading IMHO to abundant usage of multiple objects when fewer might have served the same purpose. Looking at the situations that WINE is trying to solve here, with games creating, waiting for, and disposing of millions of such objects, IMHO this screams inefficient design (of the games, not of WFMO).
I have yet to see an example of good code that needs WaitForMultipleObjects. I suspect that WaitForMultipleObjects was used on Windows not because it is good, but because the locking model was focused on interprocess locks rather than intraprocess locks. Microsoft introduced POSIX-like SRW locks that they encourage developers to use for better performance.

      Originally posted by F.Ultra View Post
What major optimizations for games has Windows done (ignoring DirectX, since that is a special case)? I have to admit that I'm a bit rusty coding for Windows nowadays, but I cannot think of anything major that has changed in this regard in the last few decades?!
      I would like to know this too.

      Here is what I know offhand, not excluding DirectX, as it is a large collection of APIs and the list becomes really short without it:

      * SRW Locks
      * Miscellaneous scheduler improvements
* DXIL's elimination of libraries/modules, which allows better shader compiler output via interprocedural optimizations that could not be done between libraries. They seem to have taken a step backward by adding experimental library support to DXIL 1.1. The elimination of libraries is basically copied from Vulkan (or was it Mantle?).
      * Supporting the latest initiatives by GPU designers such as Resizable BAR Support, Mesh Shading, Variable Rate Shading, Sampler Feedback, etcetera.
      * Not forcing NVMe to use a queue depth of 1 like their initial NVMe support did.

      It is not a very impressive list, although it is not comprehensive either. It is just a bunch of miscellaneous things. I am not sure if some of them count as major things.
      Last edited by ryao; 10 July 2021, 10:30 AM.



      • #23
        Originally posted by F.Ultra View Post

What major optimizations for games has Windows done (ignoring DirectX, since that is a special case)? I have to admit that I'm a bit rusty coding for Windows nowadays, but I cannot think of anything major that has changed in this regard in the last few decades?!
DirectStorage. It is some amazing technology, used in the Xbox Series X and S, to load data from the solid-state drive directly into graphics memory.



        • #24
          Originally posted by F.Ultra View Post
What major optimizations for games has Windows done (ignoring DirectX, since that is a special case)?
          I'm no Windows expert, but I'd say pulling GPU drivers into userspace (or, at least out of ring 0) was a pretty big one.

I'm sure there are tons of other things Windows did to optimize for games. The main question I have is: which of them were bad for security or for other types of apps? There was that whole thing people ran into porting games to Stadia, with userspace spinlocks that didn't work so well on Linux -- how much did Windows have to gimp their scheduling code to make crap like that work well?



          • #25
            Originally posted by uid313 View Post
DirectStorage. It is some amazing technology, used in the Xbox Series X and S, to load data from the solid-state drive directly into graphics memory.
            I think it's overhyped for stuff like games, but I have no data to back that up.



            • #26
              Originally posted by coder View Post
              I'm no Windows expert, but I'd say pulling GPU drivers into userspace (or, at least out of ring 0) was a pretty big one.
How did that improve games? AFAIK they put it in ring 0 for performance reasons back in the day, and only decided to move it to userspace in Vista so that the whole system wouldn't go down with the drivers. AFAIK games were never considered when they made this change.

              Originally posted by coder View Post
I'm sure there are tons of other things Windows did to optimize for games. The main question I have is: which of them were bad for security or for other types of apps? There was that whole thing people ran into porting games to Stadia, with userspace spinlocks that didn't work so well on Linux -- how much did Windows have to gimp their scheduling code to make crap like that work well?
Well, to me it is the other way around, i.e. game developers on Windows get used to how the scheduler works there, so they code their games to be extremely dependent on this specific behaviour, and then get angry when Linux does not have it. Again, I don't think that Microsoft had games in mind when they decided how their scheduler should behave with e.g. spinlocks.

Windows games' multi-core code seems to include a lot of Sleep(0) and SwitchToThread() calls, which reminds me of how we used to code back in the 8-bit days, with NOPs at certain places to time things exactly to the correct scan line.

edit: and to answer your "how much did Windows have to gimp their scheduling code to make crap like that work well?": to make sched_yield() work the way that Windows devs expect it to behave, the scheduler needs a single run-queue for the whole system, with sched_yield() putting the current thread at the back of the queue and switching to the thread at the front of it.
              Last edited by F.Ultra; 11 July 2021, 09:39 AM.



              • #27
                Originally posted by F.Ultra View Post
                How did that improve games? AFAIK they put it in ring 0 for performance reasons back in the day and only decided to move it to userspace in Vista as to not have the whole system go down with the drivers,
You're talking about UMDF, while I'm probably talking about WDM. In the late '90s, they moved GPU drivers into ring 0 so that draw calls wouldn't require a context switch. However, that resulted in much worse stability, and I'm guessing the result was a refactoring in the form of UMDF.

                Originally posted by F.Ultra View Post
                I don't think that Microsoft had games in mind when they decided on how their scheduler should behave
I'm sure it did. Maybe not initially, but it certainly guided how the scheduler evolved over the years. Gaming has been very high-profile at Microsoft since Windows 95. Aside from server loads, it's been one of the main stress cases for Windows, and foremost among the applications continually pushing it forward.



                • #28
                  Originally posted by coder View Post
I'm sure it did. Maybe not initially, but it certainly guided how the scheduler evolved over the years. Gaming has been very high-profile at Microsoft since Windows 95. Aside from server loads, it's been one of the main stress cases for Windows, and foremost among the applications continually pushing it forward.
As far as I remember, the scheduler worked this way in Windows NT 3.1, but on the other hand I would not claim to be an expert in anything Windows, so I'm probably wrong.



                  • #29
                    Originally posted by coder View Post
There was that whole thing people ran into porting games to Stadia, with userspace spinlocks that didn't work so well on Linux -- how much did Windows have to gimp their scheduling code to make crap like that work well?
                    If you are referring to the blog post and benchmarks by "ProbablyDance" that was discussed here a while ago, that was more complicated.

The performance of a variety of spinlocks varied between Linux and Windows. Depending on the lock, there were results on Linux that were better, almost equal, and worse.

                    The two locks which had the best average performance ("std::mutex" and "spinlock") actually had much better latencies on Linux.

The one spinlock that did much worse on Linux was the initial, badly designed "terrible_spinlock" (the one that used neither pause nor yield), which also had exceptionally bad results on Windows. In that sense it was an outlier that shouldn't exist in the first place. (BTW, none of the spinlocks had a really good so-called back-off algorithm.)

However, that blog introduced an additional metric for analysis, called "idle times", which was supposed to find the reason for the initial problem. That metric was ill-conceived and did not measure what it was meant to measure, but it made Linux look bad despite not being a meaningful value in the first place. The (worst-case) latencies were already much better measured by the "waits" times.

                    Originally posted by F.Ultra View Post
                    edit: and to answer your "how much did Windows have to gimp their scheduling code to make crap like that work well?" well to make sched_yield() work the way that Windows dev expected it to behave the Windows scheduler needs to have a single run-queue for the whole system and have sched_yield() put the current thread at the back of the queue and switch scheduling to the thread at the front of the queue.
                    If you are indeed referring to the benchmarks above, there were two that used yield(): "spinlock" and "ticket_spinlock".

                    "spinlock" actually had much better latencies on Linux, whereas "ticket_spinlock" had much better latencies on Windows. (I wonder what the results of a really good benchmark would have been.)

All this got very confused by the dominant treatment of the fictitious "idle times", which made Windows look better than it deserved. However, the whole test, and especially the lock implementations, were not really done by experts, so all of this needs to be taken in context.



                    • #30
                      Originally posted by indepe View Post
                      If you are referring to the blog post and benchmarks by "ProbablyDance" that was discussed here a while ago, that was more complicated.
It really wasn't. I read all of Malte Skarupke's & Linus' posts on the RWT forums (as well as the original blog posts, with more details), and it basically comes down to the fact that userspace spinlocks are just a bad idea. Also, sched_yield() wasn't equivalent to Sleep(0) or whatever other idiom Windows games used to force a context switch.

                      These are actually very informative posts. Here's his first reply. I'd encourage you to read it & the others by him (scroll to the bottom to see the list):

                      Originally posted by indepe View Post
The performance of a variety of spinlocks varied between Linux and Windows. Depending on the lock, there were results on Linux that were better, almost equal, and worse.
                      That's completely happenstance! It's highly subject to the specific system under test and what else is going on!

