Windows 11 Better Than Linux Right Now For Intel Alder Lake Performance


  • #61
    Originally posted by agd5f

    I'm not sure what you are asking. Linux and Windows take different approaches to CPU clock management, in my understanding. Both OSes still give hints to the hardware. CPPC (the underlying interface to control the CPU clocks) was designed for Windows and seems to work well there. Linux seems to be more hands-on, Windows less so. I suspect Windows gives target hints at a pretty coarse level and then lets the hardware go (here's my target performance; with that in mind, get it done as fast as possible), while Linux seems to constantly be setting new targets for performance (more work coming online, let's try a slightly faster target; now there's less work, let's try to slow things down). With the old ACPI P-state interface, there were only 3 states, so even if the OS was constantly setting new targets, you'd just end up snapping to the nearest state. CPPC gives you a continuum of performance states, so every time you set a new hint, you could potentially end up walking through a long continuum of frequencies. I'm certainly not an expert in this area; it just seems that way from what I've seen.

    As stated by others on the thread, the desktop is not necessarily the main use case for Linux at this point. I suspect the current schedulers work well for embedded and server platforms. In those cases you might favor something more deterministic which it seems like the Linux schedulers strive for.
    Cpufreq conservative works like that, but I think its hysteresis band is pretty wide by default. If you move the thresholds close together, you can make it act as an integral-only controller, which actually works decently well in some workloads. The schedutil and ondemand governors, on the other hand, are proportional with some heuristics bolted on. "We have X% load on this core, so set frequency such that X would be 80% assuming constant IPC."
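
The proportional rule quoted above can be written out as arithmetic. This is only a sketch of the idea as described in this post, not the kernel's code: the function name `target_khz` is made up, the 80% target comes from the sentence above, and the clamp to a maximum frequency is an assumption.

```shell
# target_khz CUR_KHZ LOAD_PCT MAX_KHZ
# Pick the frequency at which the current load would read as 80%,
# assuming the work done per cycle (IPC) stays constant.
target_khz() {
    cur_khz=$1
    load_pct=$2
    max_khz=$3
    next_khz=$(( cur_khz * load_pct / 80 ))
    # Clamp to the highest frequency the policy allows (assumed limit).
    if [ "$next_khz" -gt "$max_khz" ]; then
        next_khz=$max_khz
    fi
    echo "$next_khz"
}
```

For example, `target_khz 2000000 40 5000000` asks for half the current clock, because a 40% load at 2 GHz becomes an 80% load at 1 GHz.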

    The advantage of schedutil, in theory, is that it can preemptively change the frequency when a thread migrates between cores, without waiting for the hardware to figure it out.

    Intel pstate in HWP mode (only available on Skylake and later, I think) lets hardware decide.

    In general, my belief is that hardware can do a better job when you're trying to maximize performance constrained by power/temperature/current/voltage drop, but that software is better at minimizing the total task energy. It takes software to know whether the executing thread is a real-time game that wants 5 ms latency, a video decoder that has plenty of buffer and can get away with 200 ms, or a backup job that gets there when it gets there. Uclamp can do that, but it needs per-application tuning, and unfortunately only Android seems to have the resources to do that.



    • #62
      Originally posted by phoronix
      Phoronix: Windows 11 Better Than Linux Right Now For Intel Alder Lake Performance

      While we are used to running AMD and Intel benchmarks between Microsoft Windows and Linux and most often finding that our favorite open-source operating systems lead the race from desktops through HEDT and server platforms, when it comes to the Core i9 12900K "Alder Lake" that is currently not the case. We went into this round of Windows vs. Linux testing quite curious, given some Intel hybrid architecture oddities we have been seeing under Linux, and indeed, after hitting Windows 11 and an assortment of Linux distributions with benchmarks, we were left disappointed. Not only did Windows 11 come out faster overall, but Linux also had much higher run-to-run variance due to the mix of P and E cores, with Thread Director not making the wisest choices under Linux.

      https://www.phoronix.com/vr.php?view=30684
      Code:
      taskset --cpu-list 0-15 command arguments...



      • #63
        Originally posted by atomsymbol

        Code:
        taskset --cpu-list 0-15 command arguments...
        Doesn't do anything about kernel threads, though. If I had one of these chips, I'd rather:
        Code:
        for i in {16..23}; do echo 0 | sudo tee /sys/devices/system/cpu/cpu${i}/online; done
        At least until Intel gets their butts in gear.
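
The hard-coded range 16..23 is specific to this chip's numbering. A minimal sketch that avoids hard-coding it, assuming the running kernel exposes the E-core list at `/sys/devices/cpu_atom/cpus` (an assumption about the kernel in use, not something confirmed in this thread); the helper name `expand_cpulist` is made up for illustration:

```shell
# expand_cpulist LIST
# Expand a kernel-style CPU list such as "0-3,8" into space-separated numbers.
expand_cpulist() {
    echo "$1" | tr ',' '\n' | awk -F- '
        NF == 1 { printf "%s%d", sep, $1; sep = " " }
        NF == 2 { for (i = $1 + 0; i <= $2 + 0; i++) { printf "%s%d", sep, i; sep = " " } }
        END     { print "" }'
}

# Usage (needs root; the sysfs path below is assumed to exist):
# for i in $(expand_cpulist "$(cat /sys/devices/cpu_atom/cpus)"); do
#     echo 0 | sudo tee /sys/devices/system/cpu/cpu${i}/online
# done
```

Writing 1 to the same `online` files brings the cores back without a reboot.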



        • #64
          Originally posted by yump

          Doesn't do anything about kernel threads, though. If I had one of these chips, I'd rather:
          Code:
          for i in {16..23}; do echo 0 | sudo tee /sys/devices/system/cpu/cpu${i}/online; done
          At least until Intel gets their butts in gear.
          I was trying to hint that a lot of sites are (explicitly or implicitly) claiming that "there are no existing solutions to AlderLake scheduling in Linux". The fact is, there are.

          Second example (better than /usr/bin/taskset):

          Code:
          A=/sys/fs/cgroup/cpuset/foo
          mkdir $A
          chown USER:USER -R $A
          echo 3-5 > $A/cpuset.cpus
          echo 0 > $A/cpuset.mems
          
          cgexec -g cpuset:foo stress-ng --cpu $(nproc)
          
          cgexec -g cpuset:foo <steam_game>
          redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/resource_management_guide/sec-cpuset

          A problem that I encountered while using cgroups is that there appears to be no way to limit the number of IOPS for buffered writes to a hard disk.
          Last edited by atomsymbol; 15 November 2021, 09:09 PM. Reason: Add chown -R



          • #65
            Originally posted by yump

            Doesn't do anything about kernel threads, though. If I had one of these chips, I'd rather:
            Code:
            for i in {16..23}; do echo 0 | sudo tee /sys/devices/system/cpu/cpu${i}/online; done
            At least until Intel gets their butts in gear.
            Normally the kernel does the least amount of work, so that shouldn't be a big deal.

            Oh, another pain of Linux: in Windows there's just a single process called "System" which is easy to assess in terms of CPU time it uses while in Linux you've got a ton of kernel threads and estimating their load is quite difficult if not impossible for the naked eye.



            • #66
              Of course in top you can choose the root user and sort by time but summing up all those things is near impossible.

              Code:
              top - 04:53:18 up  2:10,  0 users,  load average: 0.04, 0.22, 0.40
              Tasks: 316 total,   1 running, 315 sleeping,   0 stopped,   0 zombie
              %Cpu(s):  0.5 us,  0.2 sy,  0.0 ni, 99.2 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
              MiB Mem :  64242.0 total,  58883.4 free,   3145.6 used,   2213.0 buff/cache
              MiB Swap:      0.0 total,      0.0 free,      0.0 used.  59810.8 avail Mem
              
                  PID USER      PR  NI    VIRT    RES  %CPU  %MEM     TIME+ nTH  P COMMAND                            
                 2359 root      20       24.2g  78.7m   7.9   0.1  14:06.53   2  3 /usr/libexec/Xorg -background none+
                 2377 root     -51                      0.7         2:59.42   1  9 [irq/118-nvidia]                    
                 1798 root       0 -20                              1:08.02   1 15 [kworker/u33:0-hci0]                
                 1802 root       0 -20                              1:07.89   1  7 [kworker/u33:1-hci0]                
                 1933 root      20      309.9m   6.3m         0.0   0:19.45   5  9 /usr/sbin/rngd -f -x pkcs11 -x nist
                 1757 root      20                                  0:06.14   1  1 [nvidia-modeset/]                  
                   11 root      20                                  0:02.34   1  1 [rcu_preempt]                      
                13224 root      20                                  0:01.83   1  3 [kworker/3:1-events]                
                11174 root      20                                  0:01.34   1  0 [kworker/0:1-events]                
                35883 root      20                                  0:01.28   1 12 [kworker/12:0-events]              
                14163 root      20                                  0:01.21   1  8 [kworker/8:0-events]                
                45805 root      20                                  0:01.12   1  9 [kworker/9:2-events]                
                31295 root      20                                  0:01.09   1 14 [kworker/14:1-events]              
                 2379 root      20                                  0:01.06   1  1 [nv_queue]                          
                26279 root      20                                  0:00.98   1  5 [kworker/5:1-events]                
                40491 root      20                                  0:00.95   1 10 [kworker/10:0-events]              
                 1945 root      20      386.2m  16.0m         0.0   0:00.82   6  6 /usr/libexec/udisks2/udisksd        
                24286 root      20                                  0:00.79   1  6 [kworker/6:0-events]                
                    1 root      20      167.5m  15.8m         0.0   0:00.69   1  5 /sbin/init
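
Summing those by hand is tedious, but a short filter can do it. A minimal sketch, assuming kernel threads are kthreadd (PID 2) and its children, and that the installed `ps` supports the `cputimes` column (cumulative CPU time in seconds, as in procps-ng); the function name is hypothetical:

```shell
# sum_kthread_seconds
# Read "PID PPID SECONDS" lines on stdin and print the total CPU seconds
# used by kthreadd (PID 2) and its children, i.e. the kernel threads.
sum_kthread_seconds() {
    awk '$1 == 2 || $2 == 2 { total += $3 } END { print total + 0 }'
}

# Usage:
# ps -e -o pid=,ppid=,cputimes= | sum_kthread_seconds
```

Root-owned userspace processes such as Xorg have a different parent, so they are left out of the total.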



              • #67
                Originally posted by avem
                Oh, another pain of Linux: in Windows there's just a single process called "System" which is easy to assess in terms of CPU time it uses while in Linux you've got a ton of kernel threads and estimating their load is quite difficult if not impossible for the naked eye.
                It's just the "sy" field in top?



                • #68
                  Originally posted by yump

                  It's just the "sy" field in top?
                  Might be, but I'm not so sure.
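
For what it's worth, top's "sy" corresponds to the system counter on the aggregate `cpu` line of `/proc/stat`. That counter covers all time spent in kernel mode, including system calls made on behalf of ordinary processes, so it is an upper bound on kernel-thread time, not an exact match. A minimal sketch of the since-boot share (the function name is hypothetical, and top itself shows the change between two samples instead of a since-boot figure):

```shell
# sys_share
# Read /proc/stat-style text on stdin and print the percentage of total
# CPU time since boot that was spent in kernel mode ("system" column).
sys_share() {
    awk '$1 == "cpu" {
        total = 0
        # Fields 2..9: user nice system idle iowait irq softirq steal
        # (guest time is already counted inside user/nice).
        for (i = 2; i <= 9; i++) total += $i
        printf "%.1f\n", 100 * $4 / total
    }'
}

# Usage:
# sys_share < /proc/stat
```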

