Linux Adding New Control Since Its Splitlock Detector Is Wrecking Some Steam Play Games


  • #31
    Originally posted by milkylainen View Post

    Ack on both counts. Just needed a clarification.
    It would be interesting to know if a Windows kernel does something differently or tries to work around this using transactions or similar.
    Maybe the binaries are built with a TSX lock library (like the one from Intel?) and disabling TSX under Linux due to the complexities and broken behavior yields "normal" split lock behavior?
    It's not just Linux that disables TSX-NI on some generations of Intel CPUs. It's disabled via microcode in Skylake and Coffee Lake CPUs too, because of recently exposed side-channel attacks. So if it's a matter of no usable TSX falling back to old behavior, then the performance regression is going to affect any OS, including Windows. So the key to fixing this is to try to demonstrate there's a big performance hit on supported systems. Otherwise, they have every right to say "works fine on Windows" and blow you off. How you'd prove there's a performance hit on Windows would depend on whether or not there's any way to actually re-enable TSX-NI on affected processors then do A/B testing: on/off. They may still blow you off. It may be a non-trivial fix even if there's a demonstrable problem (and assuming TSX is the problem).

    Added for a little more clarity: A/B testing on both Linux and Windows. If there's a way to re-enable TSX-NI in Linux, test with it on, off, and with the splitlock detector on and off. Compare results. Then if there are demonstrable issues, try it again with Windows TSX-NI on/off. However, not all CPUs have TSX to begin with, so... dunno. This may actually be an entirely normal way of doing things in Windows, because the way Windows handles locking behavior is entirely different. So this might be a side effect of trying to run Windows software on Linux, where expectations are different (not necessarily better, just different).
    Last edited by stormcrow; 14 December 2022, 12:37 PM.


    • #32
      Originally posted by mdedetrich View Post
      As far as I understand, the spin lock mechanism behaves very differently on Windows
      Not SPIN lock, but SPLIT lock. You're thinking of spin locks -- totally different issue.

      A split lock is when the CPU tries to do an atomic operation that crosses cacheline boundaries. Modern CPUs are optimized for doing atomics within a single cacheline, and some ISAs treat split-atomics as an error (probably generating a SIGBUS, on Linux).

      For x86, since it was previously allowed, they have to support it. But, because it involves locking the memory/cache bus of the entire CPU, it's incredibly costly. The kernel patch was intended to help people optimize their code by finding & fixing these split locks, but it presumes access to the source or a vendor who still supports the software on Linux (if they ever did).

      If you use C++ atomics (maybe also C, I'm not sure), you needn't worry about this. They take care that the object has natural alignment, I believe. It's mostly an issue for legacy code.
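To make the cacheline point concrete, here is a minimal C++ sketch. It assumes 64-byte cache lines, and `crosses_cache_line`, `atomic_is_naturally_aligned` and `LegacyPacked` are hypothetical names made up for illustration. It only checks layouts and addresses; it does not execute a misaligned atomic.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: 64-byte cache lines, as on current x86 parts.
constexpr std::size_t kCacheLine = 64;

// True if the byte range [addr, addr + size) touches two cache lines.
constexpr bool crosses_cache_line(std::uintptr_t addr, std::size_t size) {
    return addr / kCacheLine != (addr + size - 1) / kCacheLine;
}

// True if std::atomic<T> is at least naturally aligned on this toolchain.
// The standard does not promise this; common x86-64 compilers provide it.
template <typename T>
constexpr bool atomic_is_naturally_aligned() {
    return alignof(std::atomic<T>) >= sizeof(T);
}

// Hypothetical legacy layout: byte packing pushes a 4-byte counter to
// offset 62, so it straddles a line whenever the struct itself starts
// on a cache-line boundary.
#pragma pack(push, 1)
struct LegacyPacked {
    char header[62];
    std::uint32_t counter;  // a LOCK-prefixed access here would be a split lock
};
#pragma pack(pop)
```

With a cache-line-aligned base, `crosses_cache_line(offsetof(LegacyPacked, counter), 4)` is true, while a naturally aligned 4- or 8-byte object can never cross a 64-byte boundary.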


      • #33
        Originally posted by milkylainen View Post
        It would be interesting to know if a Windows kernel does something differently or tries to work around this using transactions or similar.
        You can't. It's literally a CPU/hardware thing. The CPU (optionally?) generates a trap, when they happen, but by that point the damage is already done (in terms of the CPU's entire bus having been locked).

        There are only a few options:
        1. Ignore it.
        2. Log it.
        3. Log it with an extra sleep, in hopes of grabbing someone's attention.
        4. Terminate the program.
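For reference, these options line up roughly with the values of the Linux `split_lock_detect=` boot parameter (`off`, `warn`, `fatal`, plus a rate-limited `ratelimit:N` mode), as far as the kernel documentation describes them. The C++ sketch below pulls that parameter out of a command-line string. The enum and function names are hypothetical, and the "extra sleep" behaviour is not modelled because it is toggled separately, by the `kernel.split_lock_mitigate` sysctl this article is about.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical names; only the parameter key and its values come from the kernel.
enum class SplitLockPolicy { Ignore, Log, Terminate, RateLimit, Unspecified };

// Scans a kernel command line for split_lock_detect=<value>.
// The last occurrence wins; Unspecified means the kernel default applies.
SplitLockPolicy policy_from_cmdline(const std::string& cmdline) {
    const std::string key = "split_lock_detect=";
    std::istringstream tokens(cmdline);
    std::string tok;
    SplitLockPolicy result = SplitLockPolicy::Unspecified;
    while (tokens >> tok) {
        if (tok.compare(0, key.size(), key) != 0) continue;
        const std::string value = tok.substr(key.size());
        if (value == "off") {
            result = SplitLockPolicy::Ignore;      // option 1
        } else if (value == "warn") {
            result = SplitLockPolicy::Log;         // options 2 and 3
        } else if (value == "fatal") {
            result = SplitLockPolicy::Terminate;   // option 4 (SIGBUS)
        } else if (value.compare(0, 10, "ratelimit:") == 0) {
            result = SplitLockPolicy::RateLimit;
        } else {
            result = SplitLockPolicy::Unspecified; // unknown value
        }
    }
    return result;
}
```

On a real system the input string would come from `/proc/cmdline`; the sketch takes it as an argument so it can be tried anywhere.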

        Originally posted by stormcrow View Post
        How you'd prove there's a performance hit on Windows would depend on whether or not there's any way to actually re-enable TSX-NI on affected processors then do A/B testing: on/off.
        No, it's nothing to do with TSX.
        Last edited by coder; 14 December 2022, 01:29 PM.


        • #34
          Wow - deliberately screwing up a computer is not something Kernel developers should be doing.

          Let's look at how Microsoft handles this sort of thing, shall we? Wait for it... Massive brain wave time... They flag this stuff when running debug builds of software... You know: for when people are developing the software?


          • #35
            Originally posted by coder View Post
            Not SPIN lock, but SPLIT lock. You're thinking of spin locks -- totally different issue.

            A split lock is when the CPU tries to do an atomic operation that crosses cacheline boundaries. Modern CPUs are optimized for doing atomics within a single cacheline, and some ISAs treat split-atomics as an error (probably generating a SIGBUS, on Linux).

            For x86, since it was previously allowed, they have to support it. But, because it involves locking the memory/cache bus of the entire CPU, it's incredibly costly. The kernel patch was intended to help people optimize their code by finding & fixing these split locks, but it presumes access to the source or a vendor who still supports the software on Linux (if they ever did).

            If you use C++ atomics (maybe also C, I'm not sure), you needn't worry about this. They take care that the object has natural alignment, I believe. It's mostly an issue for legacy code.
            Oooh, thanks for explaining.


            • #36
              Originally posted by OneTimeShot View Post
              Wow - deliberately screwing up a computer is not something Kernel developers should be doing.

              Let's look at how Microsoft handles this sort of thing, shall we? Wait for it... Massive brain wave time... They flag this stuff when running debug builds of software... You know: for when people are developing the software?
              You do realize that the main impetus for this change is software that has been written/compiled for Microsoft Windows (in the patch change notes they refer to Gears of War)? That game is no longer being developed/changed, so we are well past the phase of "looking at warnings in debug builds".


              • #37
                Originally posted by mdedetrich View Post
                You do realize that the main impetus for this change is software that has been written/compiled for Microsoft Windows (in the patch change notes they refer to Gears of War)? That game is no longer being developed/changed, so we are well past the phase of "looking at warnings in debug builds".
                Yeah, and given that this same bug very likely also happens in the game on Windows, the fact that nobody noticed until Linux's sledgehammer smashed their FPS is actually a testament to Linux's approach vs. Windows'!

                The issue is just that Linux's approach has this annoying downside: simply knowing about a problem doesn't mean there's anything you can do about it.


                • #38
                  Originally posted by coder View Post
                  You can't. It's literally a CPU/hardware thing. The CPU (optionally?) generates a trap, when they happen, but by that point the damage is already done (in terms of the CPU's entire bus having been locked).

                  There are only a few options:
                  1. Ignore it.
                  2. Log it.
                  3. Log it with an extra sleep, in hopes of grabbing someone's attention.
                  4. Terminate the program.
                  Could this be somewhat fixed with a binary patch like cracks and some mods do or is it too hard to do?


                  • #39
                    Originally posted by geearf View Post
                    Could this be somewhat fixed with a binary patch like cracks and some mods do or is it too hard to do?
                    Yeah, I was thinking about that. It depends on how many unique cases have these split locks and where they're being allocated.

                    What seems likely is that they're contained in a heap-allocated struct or class. In that case, you could try patching the code used to allocate it, increasing the size and adding an offset to the beginning. Hopefully, that won't break anything else. Then, when it's deallocated, you'd have to remove the offset when returning the memory to the heap. Probably easier said than done, especially if allocation/deallocation is inlined in multiple places.
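At source level, the over-allocate-and-offset idea described above could look something like the following C++ sketch. It is not how a binary patch would actually be written. It assumes 64-byte cache lines and that the offset and size of the atomic field are known, and `alloc_shifted` / `free_shifted` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Assumption: 64-byte cache lines.
constexpr std::size_t kCacheLine = 64;

// True if the byte range [addr, addr + size) touches two cache lines.
bool crosses_cache_line(std::uintptr_t addr, std::size_t size) {
    return addr / kCacheLine != (addr + size - 1) / kCacheLine;
}

// Allocates `size` bytes and returns a pointer shifted so that the field at
// `field_offset` (`field_size` bytes long) stays inside one cache line.
// The shift is recorded in the byte just before the returned pointer.
void* alloc_shifted(std::size_t size, std::size_t field_offset,
                    std::size_t field_size) {
    assert(field_size > 0 && field_size <= kCacheLine);
    auto* raw = static_cast<unsigned char*>(std::malloc(size + kCacheLine));
    if (raw == nullptr) return nullptr;
    // 64 consecutive shifts cover every position within a cache line,
    // so one of them always keeps the field in a single line.
    for (std::size_t shift = 1; shift <= kCacheLine; ++shift) {
        const auto field_addr =
            reinterpret_cast<std::uintptr_t>(raw + shift) + field_offset;
        if (!crosses_cache_line(field_addr, field_size)) {
            raw[shift - 1] = static_cast<unsigned char>(shift);
            return raw + shift;
        }
    }
    std::free(raw);  // not reached given the assertion above
    return nullptr;
}

// Removes the offset again before handing the block back to the heap.
void free_shifted(void* p) {
    if (p == nullptr) return;
    auto* bytes = static_cast<unsigned char*>(p);
    std::free(bytes - bytes[-1]);
}
```

Note that shifting the base pointer changes the alignment of every other member too, which is the "hopefully that won't break anything else" caveat.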

                    If that doesn't work, or proves too difficult, then maybe you could just change the size & adjust the offset of the atomic member. However, that involves finding all of the code which uses it and adjusting the offset.

                    I doubt it's being stack-allocated, since I think most stack variables have natural alignment.

                    With all that being said, I've never tried to crack a game... I think it's likely the bug can be fixed by an experienced cracker, assuming the game's own DRM doesn't get in the way.


                    • #40
                      “It's also possible we'll see this automatically adjusted with the likes of Feral's GameMode.”

                      Unless it has root privileges, that is not happening.
