Linux 6.12 Released With Real-Time Capabilities, Sched_Ext, More AMD RDNA4 & More

  • skeevy420
    Senior Member
    • May 2017
    • 8707

    #41
    Originally posted by jabl View Post

    The one with the highest priority, of course (assuming by "most real time" you mean "gets first dibs at running").
    Yep. When wine.exe, game.exe, and more are all set to use real time, they can potentially fight each other and cause the very problem RT was expected to solve.

    That being said, the real-time scheduling classes are a bit special; they're not some magic vroom-go-faster thing you can slap on a random process and expect things to go faster in general.
    That's why, IMHO, the more interactive and graphical the system, the less it should rely on real time; use niceness and the process scheduler instead. Set it up so the game and Wine run at nice -15 and let CFS, BFQ, SCX, or ??? handle it.
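
    For example, a minimal sketch (the game path is a placeholder, and going below nice 0 needs root or a raised RLIMIT_NICE):
    Code:
    # start the game at nice -15 so it gets a larger share of CPU time
    nice -n -15 wine /path/to/game.exe

    # or lower the nice value of a process that is already running
    sudo renice -n -15 -p "$(pidof game.exe)"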

    Comment

    • dimko
      Senior Member
      • Dec 2009
      • 932

      #42
      Originally posted by mdedetrich View Post

      This kind of behaviour is normal for RT kernels (although I don't know if it's meant to be this extreme). People think RT is "magic" with no drawbacks, but RT actually hurts raw performance. Generally speaking, there is a tradeoff between latency (especially if it's meant to be deterministic, i.e. with a ceiling on the worst-case latency) and raw performance.

      And games are all about raw performance. The last thing you want on a gaming machine is RT, because the actual real-time processes can then interrupt the game process whenever they need a time slice from the CPU to deterministically perform a task within a certain time window. That's what RT is.

      For a gaming machine you want the main game process to hog resources from everything else, which is fundamentally opposed to RT.
      I would argue that I set up my games SPECIFICALLY so THEY get all the CPU time, plus PulseAudio and D-Bus. Everything else can go F itself.

      Comment

      • dimko
        Senior Member
        • Dec 2009
        • 932

        #43
        Originally posted by F.Ultra View Post

        RT does next to nothing for those cases. Priority handling is the same with and without RT. A major change a while back is that nice values are only relative to other processes in the same autogroup, so even if you give some task a higher priority via a lower nice value, that nice/priority is only relative to the other processes in the same autogroup and not to other processes in the system. Normally all processes launched from, say, the DE (GNOME/KDE) end up in the same autogroup, but if you start something from a terminal it may get its own autogroup, so that is something to look out for (i.e. you might believe you changed the priority of a process when in reality you didn't).

        The problem you described in another post, where you got small freezes every x seconds when running with RT enabled, could very well be RT working as intended: at those intervals some other, higher-priority process wants to run, and thanks to RT it can preempt your game inside the game process's quantum.
        Also, I realized something.
        I suspect the CPU cache is being polluted by compilation. Basically I ran FO4 while compiling Firefox. Frames dropped badly. I Ctrl+Z'd the build and FPS was still bad. I foregrounded the process and killed it, and the frame rate immediately recovered. Now if only there were a CPU cache control for Linux.
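
        To see the autogroup behaviour F.Ultra describes, you can poke at procfs directly; a sketch, with the pidof target as a placeholder:
        Code:
        # is autogrouping enabled? (1 = yes)
        cat /proc/sys/kernel/sched_autogroup_enabled

        # which autogroup a process belongs to, and that autogroup's nice value
        cat /proc/"$(pidof game.exe)"/autogroup

        # give the whole autogroup a higher weight (writes the autogroup nice value)
        echo -15 | sudo tee /proc/"$(pidof game.exe)"/autogroup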

        Comment

        • dimko
          Senior Member
          • Dec 2009
          • 932

          #44
          Originally posted by caligula View Post

          Um, have you ever looked at the process listing when using GNOME? Here are two obvious fixes: move from PulseAudio to PipeWire and try some lightweight compositor. Also switch from Network Damager to systemd-networkd. GNOME uses like 1.5 gigabytes of RAM when started and spawns something like 300 processes. I'm looking at my laptop now after a fresh boot: 99 tasks, 368 threads, 292 kernel threads, 1.8 GB of RAM used.
          My main issue of random frame drops was resolved. Thanks for the suggestions.

          Comment

          • skeevy420
            Senior Member
            • May 2017
            • 8707

            #45
            Originally posted by dimko View Post

            Also, I realized something.
            I suspect the CPU cache is being polluted by compilation. Basically I ran FO4 while compiling Firefox. Frames dropped badly. I Ctrl+Z'd the build and FPS was still bad. I foregrounded the process and killed it, and the frame rate immediately recovered. Now if only there were a CPU cache control for Linux.
            taskset and limiting processes; that WINE_CPU_TOPOLOGY stuff I was going on about earlier. That's what it does for mixed X3D systems, and you have to do the same thing manually for similarly grouped cores: set games and compiles to use different sets of cores and/or CCDs, as sketched below.
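
            A minimal sketch of that split for a 3950X, assuming the common Linux enumeration where logical CPUs 0-15 are the physical cores and 16-31 their SMT siblings (so CCD0 is roughly 0-7,16-23 and CCD1 is 8-15,24-31; verify against your own topology first):
            Code:
            # keep the game (and Wine) on CCD0's cores and SMT siblings
            taskset -c 0-7,16-23 wine /path/to/game.exe

            # keep the compile job on CCD1, reniced so it yields readily
            taskset -c 8-15,24-31 nice -n 19 make -j16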

            Comment

            • dimko
              Senior Member
              • Dec 2009
              • 932

              #46
              Originally posted by skeevy420 View Post

              I was exaggerating with the process count, but that point is valid nonetheless. When you have multiple real time processes, which is the most real time?

              Then you have the issue of adjacent systems that aren't set as real time. For example, the game is running faster with real time but the Bluetooth daemon is regular priority. Now you have potential input lag due to the game taking precedence over the system receiving input commands. That's not ideal.

              IMHO, you should try process isolation first and real time second. I'm assuming that 32-core means 2 to 4 CCDs and the gaming lag inherent to that setup. Try limiting your game to 8 cores or fewer on a single CCD. RT or not, multiple chiplets or not, not wrangling your game into an optimal situation can cause gaming lag. Not having to do bullshit like that is why I game on a 7800X3D. The single CCD and only 8 cores means less wrangling of shit. At most I have to limit older Unity games, think KSP, to four cores.

              Even if it's not CCD latency related, I was bored and reading random Wine/Proton posts and discussions the other day, and you'd be surprised how many games just quit working on high-core-count CPUs. Some games are like "You have 28 cores? Something fucky is going on... I don't think I'm gonna run now." The older the game, the more likely that scenario is.

              If you're using Proton or a GE/TKG Wine you can set "WINE_CPU_TOPOLOGY=8:0,1,2,3,4,5,6,7". That'll limit your game to the first 8 cores.

              None of that covers potential game configuration issues. Bethesda games aren't very well optimized by default. IIRC, Fallout 3/4/NV should be set to use 8 or fewer cores; 4-to-8-core CPUs were high end when they were released, and the games are configured for a single core by default. I assume you know this stuff, but just in case, you need to do some INI tweaks to make them use more cores. That video says to use your CPU's number of cores, but the dude has a quad core. Limit it to a single CCD at the most, though I'd cap it at 8c/16t.

              Wait, do you mean you have a 16-core CPU with 32 threads from hyperthreading? A 32-core CPU should have 64 threads.
              A 3950X from AMD: 16 cores = 32 threads, or whatever it's called on AMD.

              I usually use cpulimit for core isolation with Unity games.

              With that said, do you have a simple way of finding which cores are associated with which CCD?

              Comment

              • mrg666
                Senior Member
                • Mar 2023
                • 1100

                #47
                6.12 is running just fine for me on FC41. I did not enable RT; still on regular preemption only. Other than that, there are a few new drivers, and that is all as far as I can see. Will stay on 6.12 ... until 6.13.

                Comment

                • skeevy420
                  Senior Member
                  • May 2017
                  • 8707

                  #48
                  Originally posted by dimko View Post

                  A 3950X from AMD: 16 cores = 32 threads, or whatever it's called on AMD.

                  I usually use cpulimit for core isolation with Unity games.

                  With that said, do you have a simple way of finding which cores are associated with which CCD?
                  "lscpu" or "numactl -H"
                  Code:
                  numactl -H
                  available: 1 nodes (0)
                  node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
                  node 0 size: 63879 MB
                  node 0 free: 35763 MB
                  node distances:
                  node     0  
                    0:   10
                  You can go one step further with this little snippet, which lists each core's hyperthread siblings so you can align threads with each other
                  Code:
                  #!/bin/bash
                  # Print each physical core's hyperthread sibling pair from sysfs.

                  num_cpus=$(nproc)
                  half_cpus=$((num_cpus / 2))

                  for i in $(seq 0 $((half_cpus - 1))); do
                      echo "CPU$i: $(cat /sys/devices/system/cpu/cpu"$i"/topology/thread_siblings_list)"
                  done
                  Code:
                  CPU0: 0,8
                  CPU1: 1,9
                  CPU2: 2,10
                  CPU3: 3,11
                  CPU4: 4,12
                  CPU5: 5,13
                  CPU6: 6,14
                  CPU7: 7,15
                  All I have is the single CCD 7800X3D. I haven't had a multi-CPU or multi-chiplet-CPU system in almost 5 years now.
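
                  On a multi-CCD part like the 3950X, where everything is still one NUMA node, the L3 topology in sysfs is another way to get that grouping; a sketch, assuming the usual layout where cache index3 is the L3:
                  Code:
                  # each line is one L3 domain (a CCX; on Zen 3 and later that equals a CCD)
                  cat /sys/devices/system/cpu/cpu*/cache/index3/shared_cpu_list | sort -u

                  # or have lscpu print per-CPU cache IDs (the last field is L1d:L1i:L2:L3)
                  lscpu -e=CPU,CORE,CACHE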

                  Comment

                  • Winnetou
                    Junior Member
                    • Nov 2024
                    • 1

                    #49
                    Originally posted by billyswong View Post

                    You can't reduce the power consumption for the same amount of work done without replacing the hardware (or inventing new algorithm to compute the same problem more efficiently), but you can reduce the power consumption per unit of time by lengthening the time used in doing the work, or reduce the amount of work (e.g. render lower quality graphics in gameplay)
                    I would like to interject. Your claim that the power consumption per unit of work cannot be reduced without hardware changes is totally, absolutely, BLATANTLY false. Objectively wrong. Factually incorrect. Unless, that is, you already had your computer configured to run in the most efficient way possible for that work.

                    Here's why it's so trivial to prove: pushing the frequency past a CPU's so-called efficiency window requires more and more voltage, which means more and more power.
                    Take a 13900K/14900K needing roughly double the power to go from 5.0 GHz to 6.0 GHz (not exact numbers, just to illustrate the idea). You increase the frequency by 20%, which would, in the perfect (read: never) scenario, reduce the time required to finish the work to 1/1.2 ≈ 83.3%, i.e. 5/6 of the total time.
                    So if at 5.0 GHz it uses 100 W and takes 1 hour to finish the task, it consumes 100 Wh.
                    At 6.0 GHz, using the simplified numbers above, it uses 200 W and finishes in 60 minutes × 5/6 = 50 minutes, so 200 W × 5/6 h ≈ 166.7 Wh.
                    Same hardware, same task, wildly different energy consumed.

                    In practice, reducing the frequency and voltage will give you better efficiency, but
                    a) you cannot reduce them too far, as CPUs have minimum frequencies (usually 800 MHz on modern Intel CPUs) and minimum voltages needed for those frequencies,
                    and
                    b) the energy you consume isn't from the CPU cores alone. DRAM, for example, uses energy simply by being there, available. At some point the power needed for components like DRAM, which scales little or not at all with frequency, can matter more than the power used by the CPU cores, and then going too slow means using more energy by, say, keeping the DRAM powered for longer. So simply dropping to the minimum frequency isn't a bulletproof way of getting the best efficiency for a given task on given hardware.
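
                    A quick back-of-the-envelope check of both points, with purely hypothetical wattages and runtimes (awk is only used for the floating-point arithmetic):
                    Code:
                    # energy = (CPU power + constant platform power such as DRAM) x runtime
                    awk 'BEGIN {
                        plat = 15                                        # hypothetical always-on platform draw, W
                        printf "6.0 GHz: %5.1f Wh\n", (200 + plat) * 5/6 # fast but far past the efficiency window
                        printf "5.0 GHz: %5.1f Wh\n", (100 + plat) * 1.0 # efficiency sweet spot
                        printf "0.8 GHz: %5.1f Wh\n", ( 15 + plat) * 5.0 # very slow: platform draw dominates
                    }'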

                    Comment

                    • jabl
                      Senior Member
                      • Nov 2011
                      • 650

                      #50
                      Originally posted by skeevy420 View Post
                      Yep. When wine.exe, game.exe, and more are all set to use real time, they can potentially fight each other and cause the very problem RT was expected to solve.

                      That's why, IMHO, the more interactive and graphical the system, the less it should rely on real time; use niceness and the process scheduler instead. Set it up so the game and Wine run at nice -15 and let CFS, BFQ, SCX, or ??? handle it.
                      Sort of, but not really. RT is really meant for systems that are carefully designed up front for deterministic (and fairly small) latency, and not for compute-bound work; setting RT priority on a compute-bound thread will likely make the system as a whole unusable, since it means the RT thread gets to run instead of a lot of other important things like various kernel threads. RT scheduling is also very simplistic compared to general-purpose schedulers (CFS, EEVDF, etc.): there is no concept of fair share. The highest-priority thread gets to run, end of story (SCHED_RR vs. SCHED_FIFO only matters insofar as you have several threads with the same RT priority, and even then there's no fairness as such).

                      And what an RT thread can do is fairly limited if it wants to avoid unpredictable latencies, regardless of RT scheduling priority. Call malloc()? Not RT. Disk I/O? Not RT. Etc. Typically an RT thread is something that pokes at some HW and then communicates with non-RT threads via some non-blocking queue.

                      As a result of this design, trying to make something like a game go faster, or achieve less variability in frame rate, by placing it in an RT scheduling class will almost certainly backfire.
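
                      For reference, the scheduling class and priority of a process can be inspected (and, with care, changed) from the shell with util-linux's chrt; a sketch, with the pidof target as a placeholder:
                      Code:
                      # show the current scheduling policy and RT priority of a running process
                      chrt -p "$(pidof game.exe)"

                      # what the above warns against: forcing a compute-bound game into SCHED_FIFO
                      # at priority 50 (needs root and can easily starve the rest of the system)
                      sudo chrt -f 50 wine /path/to/game.exe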

                      Comment
