AMD + Valve Working On New Linux CPU Performance Scaling Design


  • intelfx
    replied
    Originally posted by kiffmet View Post
    intelfx except that on AMD CPUs, the CPPC2-reported "most performant cores" do not even correspond to the cores that can hit the highest peak and sustained frequencies.
    Can you back this up?

    I owned two AMD CPUs of two different generations and it was true on both.

    PDS without CPPC2 on Linux works consistently better than CPPC2 with the Windows scheduler, and it can migrate background tasks to other CCXs as well in case a task needs to fully utilize a CCX.
    It would be interesting to know how you managed to compare two schedulers in two different operating systems.
    Last edited by intelfx; 03 September 2021, 02:09 AM.



  • kiffmet
    replied
    intelfx except that on AMD CPUs, the CPPC2-reported "most performant cores" do not even correspond to the cores that can hit the highest peak and sustained frequencies. PDS without CPPC2 on Linux works consistently better than CPPC2 with the Windows scheduler, and it can migrate background tasks to other CCXs as well in case a task needs to fully utilize a CCX.



  • intelfx
    replied
    Originally posted by kiffmet View Post
    Schedutil + ProjectC/PDS w/ Kernel 5.13 works pretty damn well on my 3900X. PDS does a better job at part of what AMD's "suggested cores" CPPC feature tries to accomplish: having threads from the same task share the same CCX/"L3 cache slice" if possible. This has already resulted in a big performance improvement, esp. in gaming.
    That's a totally different thing from what CPPC2/"preferred cores" is. It's about putting the most demanding task(s) onto the most performant core(s), and putting all the background tasks onto the least performant CCX (doesn't matter where, they're background).
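For anyone who wants to check the "preferred cores" ranking on their own machine, here is a minimal sketch. It assumes the kernel exposes ACPI CPPC data under /sys/devices/system/cpu/cpuN/acpi_cppc/highest_perf, which may not be the case on every platform or firmware setting. The function name and its optional root argument are made up for illustration.

```shell
#!/bin/sh
# Print "highest_perf cpuN" pairs, best-ranked core first.
# The optional first argument (an alternative sysfs root) is a
# hypothetical hook for testing; by default the real sysfs is read.
rank_cores_by_cppc() {
    root="${1:-/sys/devices/system/cpu}"
    for f in "$root"/cpu[0-9]*/acpi_cppc/highest_perf; do
        # Skip the unexpanded glob and unreadable entries.
        [ -r "$f" ] || continue
        cpu="${f#"$root"/}"   # drop the root prefix
        cpu="${cpu%%/*}"      # keep only "cpuN"
        printf '%s %s\n' "$(cat "$f")" "$cpu"
    done | sort -k1,1nr
}
```

Running `rank_cores_by_cppc` and comparing the top entries against the per-core frequencies you actually observe under load is one way to test the claim discussed above. Empty output just means the acpi_cppc files are not available.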



  • kiffmet
    replied
    Schedutil + ProjectC/PDS w/ Kernel 5.13 works pretty damn well on my 3900X. PDS does a better job at part of what AMD's "suggested cores" CPPC feature tries to accomplish: having threads from the same task share the same CCX/"L3 cache slice" if possible. This has already resulted in a big performance improvement, esp. in gaming.
    Last edited by kiffmet; 03 August 2021, 07:26 AM.



  • Linuxxx
    replied
    Originally posted by aufkrawall View Post
    Avg fps and even the 1% low percentile don't show the bad influence on frame time variance well enough. You will see MangoHUD's frame time graph look bad with schedutil/powersave under certain load conditions, with stutter, missed vblanks etc. While schedutil is way better than intel_pstate powersave (why is this crap the default setting...), it is still not good enough.
    While the performance CPU governor will undoubtedly always come out on top of schedutil under ideal conditions, with the Steam Deck being a handheld device this obviously can't apply here, given both its power and heat constraints.

    Nevertheless, I think it would be interesting to see if simply disabling the boost clocks on the CPU and then comparing performance vs. schedutil would reveal any interesting insights...

    However, at the end of the day, simply sticking with schedutil on any type of APU or SoC setup will probably be the ideal route going forward, as Android has already been showing for a few years now.
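For the boost-off comparison suggested above, a minimal sketch for checking the current boost state before and after a test run. It assumes the cpufreq driver exposes either /sys/devices/system/cpu/cpufreq/boost or, with intel_pstate, /sys/devices/system/cpu/intel_pstate/no_turbo. Not every driver provides one of these, and the function name and root argument are made up for illustration.

```shell
#!/bin/sh
# Report whether CPU boost/turbo is currently allowed.
# The optional first argument (an alternative sysfs root) is a
# hypothetical hook for testing; by default the real sysfs is read.
boost_state() {
    root="${1:-/sys/devices/system/cpu}"
    if [ -r "$root/cpufreq/boost" ]; then
        # Generic cpufreq knob: 1 = boost allowed, 0 = boost off.
        case "$(cat "$root/cpufreq/boost")" in
            1) echo enabled ;;
            0) echo disabled ;;
            *) echo unknown ;;
        esac
    elif [ -r "$root/intel_pstate/no_turbo" ]; then
        # intel_pstate uses inverted logic: 1 = turbo off.
        case "$(cat "$root/intel_pstate/no_turbo")" in
            0) echo enabled ;;
            1) echo disabled ;;
            *) echo unknown ;;
        esac
    else
        echo unknown
    fi
}
```

On drivers that expose the boost file, writing 0 to it as root is the usual way to switch boost off for such a test. Whether that is possible on the Steam Deck's APU is not something this thread confirms.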



  • illwieckz
    replied
    Originally posted by perpetually high View Post
    […] we won't see the real performance gains we should be seeing.
    Also, not everything is about performance. On my laptop with an Intel CPU I have to disable Intel pstate on the kernel boot command line to get proper CPU scaling. Otherwise the CPU runs at full performance like a rocket engine at takeoff, but keeps running that way forever, as if it had infinite fuel and there was no stage separation. It tries to squeeze the very highest performance out of the hardware as if I were in a drag race, and the laptop becomes super hot with the CPU never slowing down.

    And that happens even on battery.

    Sometimes I just want to not have a brick of hot lava on my desk for hours, sometimes I just don't want to heat the room in summer, sometimes I just want to have more than 15 minutes of autonomy on battery, and sometimes I just want my laptop to not die prematurely. Performance benchmarks do not draw such pictures.
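To see which scaling driver and governor are actually in effect, for example after booting with the intel_pstate=disable kernel parameter, something like the following may help. It is a minimal sketch that assumes the usual cpufreq policy directories under /sys/devices/system/cpu/cpufreq/. The function name and root argument are made up for illustration.

```shell
#!/bin/sh
# List the active scaling driver and governor for each cpufreq policy.
# The optional first argument (an alternative cpufreq root) is a
# hypothetical hook for testing; by default the real sysfs is read.
show_cpufreq_policies() {
    root="${1:-/sys/devices/system/cpu/cpufreq}"
    for p in "$root"/policy[0-9]*; do
        # Skip the unexpanded glob when no policy directory exists.
        [ -d "$p" ] || continue
        drv="$(cat "$p/scaling_driver" 2>/dev/null || echo unknown)"
        gov="$(cat "$p/scaling_governor" 2>/dev/null || echo unknown)"
        printf '%s driver=%s governor=%s\n' "${p##*/}" "$drv" "$gov"
    done
}
```

If the driver column still shows intel_pstate after the reboot, the parameter did not take effect. Empty output means no cpufreq policy directories were found.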



  • intelfx
    replied
    Originally posted by perpetually high View Post

    You say that like it's a woke statement. More of a captain obvious

    Hey, Linux is free, and anyone can modify it. You want to gain traction and movement, yeah, you're going to have to put money, people, and resources behind it. Who has those? Commercial interest and corporations. So what exactly is your point?
    The un-obvious part is that the traction and movement attainable by a single enthusiast (or a small group thereof) is basically not enough for anything in Linux anymore.
    Last edited by intelfx; 03 August 2021, 03:29 AM.



  • chris200x9
    replied
    Since this is CPU scaling work, couldn't it have spillover effects such as making EPYC a little faster/more power efficient? If so, this could be huge not just for gaming but for everything!



  • unic0rn
    replied
    Under Manjaro running on an A8-7600 I was using something like this:

    Code:
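    # needs root; switches every CPU to the conservative governor
    /bin/cpupower frequency-set -g conservative
    # slow down frequency decreases by a factor of 2
    /bin/echo 2 >/sys/devices/system/cpu/cpufreq/conservative/sampling_down_factor
    # raise the clock above 75% load, lower it below 10% load
    /bin/echo 75 >/sys/devices/system/cpu/cpufreq/conservative/up_threshold
    /bin/echo 10 >/sys/devices/system/cpu/cpufreq/conservative/down_threshold
    # change frequency in steps of 10% of the maximum
    /bin/echo 10 >/sys/devices/system/cpu/cpufreq/conservative/freq_step
    # ignore load from niced processes
    /bin/echo 1 >/sys/devices/system/cpu/cpufreq/conservative/ignore_nice_load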
    imho the trick is to keep the clocks high enough but not too high. i didn't want the cpu to go full turbo when there's no need, and huge jumps in cpu clock back and forth are also kinda pointless. the conservative governor is perfect for that since it can be configured as above to ramp up the clocks fast enough when there's any load, but at the same time it won't go too high and won't be downclocking like crazy unless the load really goes down. the difference is perfectly visible while watching movies for example, when decoding them on the cpu, but it works just as well with games.
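A quick way to confirm that settings like the ones above actually stuck is to read the tunables back. This is a minimal sketch that assumes the tunables sit in the global /sys/devices/system/cpu/cpufreq/conservative/ directory used in the post. With some drivers they may live in a per-policy directory instead, in which case the path has to be passed in. The function name is made up for illustration.

```shell
#!/bin/sh
# Print the current value of each conservative-governor tunable.
# The optional first argument is the tunables directory; it defaults
# to the global location used in the post above.
show_conservative_tunables() {
    dir="${1:-/sys/devices/system/cpu/cpufreq/conservative}"
    for t in sampling_down_factor up_threshold down_threshold \
             freq_step ignore_nice_load; do
        if [ -r "$dir/$t" ]; then
            printf '%s=%s\n' "$t" "$(cat "$dir/$t")"
        else
            # Governor not active, or tunables located elsewhere.
            printf '%s=unavailable\n' "$t"
        fi
    done
}
```

If every line reads "unavailable", the conservative governor is probably not the active one, or the tunables are exposed per policy.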



  • RedEyed
    replied
    Firefox went to updating the screen with less than 1 fps (disabling GPU acceleration helps though) and the rest of the desktop also feels sluggish.
    I've never had Firefox run at less than 20 FPS (maybe it depends on content). I have a lot of machines with different Nvidia GPUs.

    The memory allocator also has trouble allocating larger chunks of VRAM when there still should be enough memory left
    There is a well-known problem called "memory fragmentation".
    Your experience is really interesting, but I don't believe that AMD has a better memory allocator than Nvidia.
    I think that you just used cudnn_benchmark=True, which drastically increases memory usage in order to find the best algorithm. Try disabling it, and more likely than not you can use a batch size of 8 as on your AMD card.

    As a result, there are some networks that I can only train with a batch size of 3 when I trained them with a batch size of 8 on my previous Polaris GPU (both cards have 8 GB VRAM).
    Using an odd batch size drastically decreases performance, so my advice: never use odd batch sizes, use even ones.

