Windows 11 Better Than Linux Right Now For Intel Alder Lake Performance


  • avem
    replied
    Originally posted by yump View Post

    It's just the "sy" field in top?
    Might be, but I'm not so sure.



  • yump
    replied
    Originally posted by avem View Post
Oh, another pain point of Linux: in Windows there's just a single process called "System", whose CPU time is easy to assess, while in Linux you've got a ton of kernel threads, and estimating their load is quite difficult if not impossible for the naked eye.
    It's just the "sy" field in top?
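For what it's worth, top derives "sy" from the aggregate cpu line in /proc/stat, and kernel-thread CPU time is accounted there as system time. A quick snapshot, assuming the usual USER_HZ of 100:

    Code:
    # The 3rd value after "cpu" is cumulative system time since boot, in USER_HZ
    # ticks (top shows the delta between refreshes):
    awk '/^cpu / { printf "system: %d ticks (~%d s since boot)\n", $4, $4/100 }' /proc/stat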



  • avem
    replied
Of course, in top you can filter by the root user and sort by time, but summing all of those up by eye is near impossible (a one-liner that does the summing is sketched below the output).

    Code:
    top - 04:53:18 up  2:10,  0 users,  load average: 0.04, 0.22, 0.40
    Tasks: 316 total,   1 running, 315 sleeping,   0 stopped,   0 zombie
    %Cpu(s):  0.5 us,  0.2 sy,  0.0 ni, 99.2 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
    MiB Mem :  64242.0 total,  58883.4 free,   3145.6 used,   2213.0 buff/cache
    MiB Swap:      0.0 total,      0.0 free,      0.0 used.  59810.8 avail Mem
    
        PID USER      PR  NI    VIRT    RES  %CPU  %MEM     TIME+ nTH  P COMMAND                            
       2359 root      20       24.2g  78.7m   7.9   0.1  14:06.53   2  3 /usr/libexec/Xorg -background none+
       2377 root     -51                      0.7         2:59.42   1  9 [irq/118-nvidia]                    
       1798 root       0 -20                              1:08.02   1 15 [kworker/u33:0-hci0]                
       1802 root       0 -20                              1:07.89   1  7 [kworker/u33:1-hci0]                
       1933 root      20      309.9m   6.3m         0.0   0:19.45   5  9 /usr/sbin/rngd -f -x pkcs11 -x nist
       1757 root      20                                  0:06.14   1  1 [nvidia-modeset/]                  
         11 root      20                                  0:02.34   1  1 [rcu_preempt]                      
      13224 root      20                                  0:01.83   1  3 [kworker/3:1-events]                
      11174 root      20                                  0:01.34   1  0 [kworker/0:1-events]                
      35883 root      20                                  0:01.28   1 12 [kworker/12:0-events]              
      14163 root      20                                  0:01.21   1  8 [kworker/8:0-events]                
      45805 root      20                                  0:01.12   1  9 [kworker/9:2-events]                
      31295 root      20                                  0:01.09   1 14 [kworker/14:1-events]              
       2379 root      20                                  0:01.06   1  1 [nv_queue]                          
      26279 root      20                                  0:00.98   1  5 [kworker/5:1-events]                
      40491 root      20                                  0:00.95   1 10 [kworker/10:0-events]              
       1945 root      20      386.2m  16.0m         0.0   0:00.82   6  6 /usr/libexec/udisks2/udisksd        
      24286 root      20                                  0:00.79   1  6 [kworker/6:0-events]                
          1 root      20      167.5m  15.8m         0.0   0:00.69   1  5 /sbin/init
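If you do want a total, here is a minimal sketch, assuming kernel threads are the children of kthreadd (PID 2) and a procps-ng ps:

    Code:
    # Sum the cumulative CPU time (HH:MM:SS or MM:SS) of every kernel thread;
    # this ignores the rare DD- day prefix on very long-lived threads:
    ps --ppid 2 -o time= | awk -F: '{ t = 0; for (i = 1; i <= NF; i++) t = t*60 + $i; total += t }
                                    END { printf "kernel threads: ~%d s of CPU time\n", total }'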



  • avem
    replied
    Originally posted by yump View Post

    Doesn't do anything about kernel threads, though. If I had one of these chips, I'd rather:
    Code:
    for i in {16..23}; do echo 0 | sudo tee /sys/devices/system/cpu/cpu${i}/online; done
    At least until Intel gets their butts in gear.
Normally the kernel itself does very little work, so that shouldn't be a big deal.

Oh, another pain point of Linux: in Windows there's just a single process called "System", whose CPU time is easy to assess, while in Linux you've got a ton of kernel threads, and estimating their load is quite difficult if not impossible for the naked eye.



  • atomsymbol
    replied
    Originally posted by yump View Post

    Doesn't do anything about kernel threads, though. If I had one of these chips, I'd rather:
    Code:
    for i in {16..23}; do echo 0 | sudo tee /sys/devices/system/cpu/cpu${i}/online; done
    At least until Intel gets their butts in gear.
I was trying to hint that a lot of sites are (explicitly or implicitly) claiming that "there are no existing solutions to Alder Lake scheduling in Linux". The fact is, there are.

    Second example (better than /usr/bin/taskset):

    Code:
A=/sys/fs/cgroup/cpuset/foo
mkdir $A                     # create the cpuset (needs root)
chown USER:USER -R $A        # hand the cpuset over to your user
echo 3-5 > $A/cpuset.cpus    # confine the group to CPUs 3-5
echo 0 > $A/cpuset.mems      # allocate memory from NUMA node 0

# everything launched through cgexec inherits the confinement:
cgexec -g cpuset:foo stress-ng --cpu $(nproc)

cgexec -g cpuset:foo <steam_game>

(Reference: redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/resource_management_guide/sec-cpuset)

A problem that I encountered while using cgroups is that there appears to be no way to limit the number of IOPS for buffered writes to a hard disk.
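For direct I/O the io controller can throttle; a minimal cgroup v2 sketch, assuming the unified hierarchy is mounted at /sys/fs/cgroup and a "foo" group already exists (throttling buffered writeback under v2 additionally requires the memory controller enabled alongside io):

    Code:
    # limit the group to 100 write IOPS on block device 8:0 (sda):
    echo "8:0 wiops=100" > /sys/fs/cgroup/foo/io.max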
    Last edited by atomsymbol; 15 November 2021, 09:09 PM. Reason: Add chown -R



  • yump
    replied
    Originally posted by atomsymbol View Post

    Code:
    taskset --cpu-list 0-15 command arguments...
    Doesn't do anything about kernel threads, though. If I had one of these chips, I'd rather:
    Code:
    for i in {16..23}; do echo 0 | sudo tee /sys/devices/system/cpu/cpu${i}/online; done
    At least until Intel gets their butts in gear.



  • atomsymbol
    replied
    Originally posted by phoronix View Post
    Phoronix: Windows 11 Better Than Linux Right Now For Intel Alder Lake Performance

While we are used to running AMD and Intel benchmarks between Microsoft Windows and Linux and most often finding that our favorite open-source operating systems lead the race from desktops through HEDT and server platforms, with the Core i9 12900K "Alder Lake" that is currently not the case. We went into this round of Windows vs. Linux testing quite curious, given some Intel hybrid architecture oddities we have been seeing under Linux, and when hitting Windows 11 and an assortment of Linux distributions with benchmarks we were indeed left disappointed. Not only did Windows 11 come out faster overall, but Linux also had much higher run-to-run variance due to the mix of P and E cores, with Thread Director not making the wisest choices under Linux.

    https://www.phoronix.com/vr.php?view=30684
    Code:
    taskset --cpu-list 0-15 command arguments...



  • yump
    replied
    Originally posted by agd5f View Post

I'm not sure what you are asking. Linux and Windows take different approaches to CPU clock management, in my understanding. Both OSes still give hints to the hardware. CPPC (the underlying interface for controlling CPU clocks) was designed for Windows and seems to work well there. Linux seems to be more hands-on, while Windows is less so: I suspect Windows gives target hints at a pretty coarse level and then lets the hardware go ("here's my target performance; with that in mind, get it done as fast as possible"), while Linux seems to constantly be setting new targets (more work coming online, let's try a slightly faster target; now there's less work, let's try to slow things down). With the old ACPI P-state interface there were only 3 states, so even if the OS was constantly setting new targets, you'd just snap to the nearest state. CPPC gives you a continuum of performance states, so every time you set a new hint you could potentially end up walking through a long continuum of frequencies. I'm certainly not an expert in this area; it just seems that way from what I've seen.

As stated by others on the thread, the desktop is not necessarily the main use case for Linux at this point. I suspect the current schedulers work well for embedded and server platforms. In those cases you might favor something more deterministic, which the Linux schedulers seem to strive for.
The cpufreq conservative governor works like that, but I think its hysteresis band is pretty wide by default. If you move the thresholds close together, you can make it act as an integral-only controller, which actually works decently well in some workloads. The schedutil and ondemand governors, on the other hand, are proportional with some heuristics bolted on: "we have X% load on this core, so set the frequency such that X would be 80%, assuming constant IPC."
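A minimal sketch of that threshold tweak, assuming the conservative governor is built into your kernel and its global tunables sit in the usual sysfs location:

    Code:
    echo conservative | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
    # pull the up/down thresholds together so the governor behaves almost integral-only
    # (raise up_threshold first, since the kernel requires down < up):
    echo 85 | sudo tee /sys/devices/system/cpu/cpufreq/conservative/up_threshold
    echo 80 | sudo tee /sys/devices/system/cpu/cpufreq/conservative/down_threshold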

    The advantage of schedutil, in theory, is that it can preemptively change the frequency when a thread migrates between cores, without waiting for the hardware to figure it out.

    Intel pstate in HWP mode (only available on Skylake and later, I think) lets hardware decide.
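For reference, a quick way to check which driver is in charge and whether the CPU advertises HWP (sysfs paths as on recent kernels):

    Code:
    cat /sys/devices/system/cpu/intel_pstate/status   # "active", "passive", or "off"
    grep -m1 -ow hwp /proc/cpuinfo                    # prints "hwp" if the CPU supports it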

In general, my belief is that hardware can do a better job when you're trying to maximize performance constrained by power/temperature/current/voltage drop, but that software is better at minimizing the total task energy. It takes software to know whether the executing thread is a real-time game that wants 5 ms latency, a video decoder that has plenty of buffer and can get away with 200 ms, or a backup job that gets there when it gets there. Uclamp can do that, but it needs per-application tuning, and unfortunately only Android seems to have the resources to do that.
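A minimal uclamp sketch via cgroup v2, assuming the cpu controller is enabled and a "games" group has been created (the group name is a placeholder):

    Code:
    # ask the scheduler to treat tasks in this group as needing at least ~80% of
    # max capacity, which also biases frequency selection upward:
    echo "80.00" > /sys/fs/cgroup/games/cpu.uclamp.min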



  • agd5f
    replied
    Originally posted by Linuxxx View Post

Honestly, comments like this make me wonder why there apparently isn't a proper communication channel between AMD's different divisions.

You say that the decision-making logic should be left up to the hardware, while schedutil proponents argue that the hardware can't possibly have a clue about the OS's run-time queues of all the different threads interacting with each other.
I'm not sure what you are asking. Linux and Windows take different approaches to CPU clock management, in my understanding. Both OSes still give hints to the hardware. CPPC (the underlying interface for controlling CPU clocks) was designed for Windows and seems to work well there. Linux seems to be more hands-on, while Windows is less so: I suspect Windows gives target hints at a pretty coarse level and then lets the hardware go ("here's my target performance; with that in mind, get it done as fast as possible"), while Linux seems to constantly be setting new targets (more work coming online, let's try a slightly faster target; now there's less work, let's try to slow things down). With the old ACPI P-state interface there were only 3 states, so even if the OS was constantly setting new targets, you'd just snap to the nearest state. CPPC gives you a continuum of performance states, so every time you set a new hint you could potentially end up walking through a long continuum of frequencies. I'm certainly not an expert in this area; it just seems that way from what I've seen.

As stated by others on the thread, the desktop is not necessarily the main use case for Linux at this point. I suspect the current schedulers work well for embedded and server platforms. In those cases you might favor something more deterministic, which the Linux schedulers seem to strive for.
    Last edited by agd5f; 15 November 2021, 12:17 PM.



  • torsionbar28
    replied
    Originally posted by HEL88 View Post
    Intel has shown that the linux desktop is completely irrelevant.
Nailed it. Linux is optimized for commercially viable platforms, i.e. servers and embedded devices. We will not see Alder Lake optimization until early 2022 at the soonest, when Sapphire Rapids Xeon launches. Desktop users only reap the benefit if/when the desktop chip shares a uarch with the server chip. If you're that committed to Linux on the desktop, use Ryzen instead (it shares a uarch with EPYC) or use the Xeon E series. Personally, I run a Xeon E-2276G on Fedora for serious work and a Ryzen 3600 for Linux gaming. Alder Lake is stuck in an odd spot at the moment.

