AMD Enabling "Fast CPPC" For Even Greater Linux Performance & Power Efficiency On Some CPUs

  • AMD Enabling "Fast CPPC" For Even Greater Linux Performance & Power Efficiency On Some CPUs

    Phoronix: AMD Enabling "Fast CPPC" For Even Greater Linux Performance & Power Efficiency On Some CPUs

    While AMD Zen 4 processors, whether the Ryzen 7000/8000 desktop/mobile series or the EPYC 8004/9004 series server processors, already perform very well on Linux and with great power efficiency against the competition, as shown in dozens of Phoronix articles at this point, it turns out a minor power/performance optimization has so far gone untapped under Linux for select Zen 4 processors. A new patch series posted this Sunday allows this "fast CPPC" feature to be utilized on supported processors...

  • #2
    Enabling CPPC + PState-Passive was enough to let 5950X clock a lot higher (without any OC/PBO):
    * none/acpi - 4.8 GHz
    * amd_pstate=active - 4.8 GHz
    * amd_pstate=passive - 5.1 GHz
    * amd_pstate=guided - 5.1 GHz

    I wonder why CPPC is missing in the ASUS Prime X670-P BIOS, though. That wasn’t the case with the ASUS ROG X570-E. Tired of ASUS' nonsense. My next motherboard will be ASRock for sure.
    Last edited by Kjell; 28 April 2024, 12:19 PM.
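
For anyone wanting to reproduce a comparison like the one above, here is a minimal sketch that reads the current per-policy frequency from the standard cpufreq sysfs files and reports the highest one in GHz. The helper names are made up for illustration, and the clocks quoted in the post may well have been observed with a different tool.

```python
import glob
from typing import Optional

# Standard cpufreq sysfs location; values are reported in kHz.
CUR_FREQ_GLOB = "/sys/devices/system/cpu/cpufreq/policy*/scaling_cur_freq"


def khz_to_ghz(text: str) -> float:
    """Convert a cpufreq sysfs value (kHz, as text) to GHz."""
    return int(text.strip()) / 1_000_000


def highest_current_ghz(pattern: str = CUR_FREQ_GLOB) -> Optional[float]:
    """Return the highest current frequency across all policies, or None."""
    readings = []
    for path in glob.glob(pattern):
        with open(path) as f:
            readings.append(khz_to_ghz(f.read()))
    return max(readings) if readings else None
```

Calling `highest_current_ghz()` repeatedly while a single-threaded load is running, once per `amd_pstate=` boot option, should give numbers comparable to the list above.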



    • #3
      Originally posted by Kjell
      Enabling CPPC + PState-Passive was enough to let the CPU clock a lot higher.

      I wonder why CPPC is missing in ASUS Prime X670-P BIOS though.. That wasn’t the case with ASUS ROG X570-E.

      Tired of ASUS' nonsense. My next motherboard will be ASRock for sure

      Did you update your BIOS and/or check in BIOS for a CPPC option? I haven't seen any Ryzen 7000 series motherboard without CPPC support...
      Michael Larabel
      https://www.michaellarabel.com/



      • #4
        Originally posted by Kjell
        Enabling CPPC + PState-Passive was enough to let the CPU clock a lot higher (without any OC/PBO):

        I wonder why CPPC is missing in ASUS Prime X670-P BIOS though.. That wasn’t the case with ASUS ROG X570-E. Tired of ASUS' nonsense. My next motherboard will be ASRock for sure

        You can check whether CPPC support is present with "lscpu | grep cppc" on Zen 4.
        It should be exposed.

        The additional BIOS options that ASRock has (Prefer Cache, Clock, ...) might not be present on ASUS.

        Besides that, a very interesting addition to the kernel. I'm wondering why this came so late; maybe it is in preparation for the Zen 5 release.
        My 7950X3D does not seem to support it, judging by lscpu.
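
A side note on the "lscpu | grep cppc" check: grep prints the entire Flags line, and a substring match would also hit any other flag whose name merely contains "cppc". A small sketch of an exact-name check follows; the function names are illustrative, not part of any tool.

```python
def flags_from_line(flags_line: str) -> set:
    """Split an lscpu 'Flags: ...' or /proc/cpuinfo 'flags : ...' line."""
    _, sep, values = flags_line.partition(":")
    # Fall back to the whole line if there is no "Flags:" style prefix.
    return set((values if sep else flags_line).split())


def has_cpu_flag(flags_line: str, flag: str) -> bool:
    """True only if `flag` appears as a complete flag name."""
    return flag in flags_from_line(flags_line)


def flags_from_cpuinfo(cpuinfo_text: str) -> set:
    """Return the flag set of the first CPU listed in /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            return flags_from_line(line)
    return set()
```

For example, `"cppc" in flags_from_cpuinfo(open("/proc/cpuinfo").read())` answers the question with a plain True or False instead of a wall of flags.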



        • #5
          Well, I've just read that it only works with amd_pstate=passive anyway. This can be found in the second patchset:
          > This change will only be effective in the *passive mode* of AMD pstate driver.
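
To see which mode the driver is currently in, the amd-pstate driver exposes a global status file in sysfs. A minimal sketch, assuming that file is available on the running kernel (the helper names are illustrative):

```python
from typing import Optional

# Global amd-pstate status file; it reports the current driver mode,
# e.g. "active", "passive" or "guided".
STATUS_PATH = "/sys/devices/system/cpu/amd_pstate/status"


def read_amd_pstate_status(path: str = STATUS_PATH) -> Optional[str]:
    """Return the driver mode as text, or None if the file is not there."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return None


def is_passive_mode(status: Optional[str]) -> bool:
    """Per the quoted patch note, the change only takes effect in passive mode."""
    return status == "passive"
```

`is_passive_mode(read_amd_pstate_status())` returning False would mean the change quoted above has no effect on that boot, whatever the CPU supports.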



          • #6
            Originally posted by Michael
            Did you update your BIOS and/or check in BIOS for a CPPC option?

            Just triple-checked, and the CPPC option is still missing in the BIOS after updating from '2413 (ComboAM5PI_1102b 2024/02/07)' -> '2613 (ComboAM5PI_1170a 2024/04/17)' with a 7950X.

            Even the Search function can't find it, and there's nothing about CPPC in either the SMU or NBIO sections under AMD CBS.

            Originally posted by ptr1337
            You can check if cppc support is present

            The flag is detected despite the option missing from the BIOS!
            However, I'm not convinced it works.
            Thread utilization is completely random while compiling with -j8 (bear with me -->)
            [Attached image: no_cppc.png]

            I may be wrong, but there are posts suggesting that CPPC is a key part of the cluster-aware/topology scheduling logic.
            For instance, some users complain that random cores are utilized instead of the fastest ones without CPPC (e.g. a PBO2 frequency override might break CPPC and thus all multi-CCX and V-Cache scheduling).

            --> This seems to be the case on my system: core utilization is all over the place instead of being concentrated on the same CCX, even when I'm compiling with 8 cores.

            > lscpu | grep cppc
            Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good amd_lbr_v2 nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba perfmon_v2 ibrs ibpb stibp ibrs_enhanced vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local user_shstk avx512_bf16 clzero irperf xsaveerptr rdpru wbnoinvd cppc arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif x2avic v_spec_ctrl vnmi avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid overflow_recov succor smca fsrm flush_l1d​
            Last edited by Kjell; 28 April 2024, 04:45 PM.
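
One way to put numbers on "random" core usage is to sample which CPU a process last ran on. The sketch below reads field 39 ("processor") of /proc/<pid>/stat, as documented in proc(5); the function names and the sampling interval are arbitrary choices for illustration.

```python
import time


def last_cpu_from_stat(stat_line: str) -> int:
    """Return field 39 ("processor") of a /proc/<pid>/stat line."""
    # Field 2 (comm) is parenthesised and may itself contain spaces or ')',
    # so only split what follows the LAST closing parenthesis.
    rest = stat_line[stat_line.rfind(")") + 1:].split()
    # rest[0] is field 3 (state), so field 39 sits at index 36.
    return int(rest[36])


def sample_cpus(pid: int, samples: int = 20, interval: float = 0.1) -> list:
    """Poll /proc/<pid>/stat and collect the CPU numbers the task ran on."""
    seen = []
    for _ in range(samples):
        with open(f"/proc/{pid}/stat") as f:
            seen.append(last_cpu_from_stat(f.read()))
        time.sleep(interval)
    return seen
```

Pointing `sample_cpus()` at the PID of one busy compiler or worker process gives a list of CPU numbers; a list that keeps changing would confirm the migration described above.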



            • #7
              Same. I see it listed in my 7950X3D:
              ❯ lscpu | grep cppc
              Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good amd_lbr_v2 nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cat_l3 cdp_l3 hw_pstate ssbd mba perfmon_v2 ibrs ibpb stibp ibrs_enhanced vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local user_shstk avx512_bf16 clzero irperf xsaveerptr rdpru wbnoinvd cppc arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif x2avic v_spec_ctrl vnmi avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid overflow_recov succor smca fsrm flush_l1d​



              • #8
                Originally posted by dc740
                Same. I see it listed in my 7950X3D

                If I run $ stress --cpu 1, I see the process jump between random cores instead of staying pinned, which is why I suspect CPPC is broken on my ASUS X670-P motherboard.

                The same can be observed while compiling with 8 cores (-j8): there doesn't seem to be any structure as to which cores are utilized. Ideally, I'd like to see applications share L3 cache inside the same core complex (CCX0: 0-7,16-23 or CCX1: 8-15,24-31) for the lowest latency (which should be beneficial for gaming).

                [Attached image: lstopo.png]

                https://i.imgur.com/V2t0oqj.png
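
The CCX layout quoted above (CCX0: 0-7,16-23 and CCX1: 8-15,24-31) uses the same list format that the kernel prints in /sys/devices/system/cpu/cpu0/cache/index3/shared_cpu_list. As a rough sketch, the lists can be parsed to tell which CCX a given CPU belongs to, or to pin a process to one CCX by hand. The mapping is hard-coded from the post and will differ on other CPUs; the function names are illustrative.

```python
import os
from typing import Optional


def parse_cpu_list(text: str) -> set:
    """Parse a kernel-style CPU list such as "0-7,16-23" into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        first, _, last = part.partition("-")
        cpus.update(range(int(first), int(last or first) + 1))
    return cpus


# Layout taken from the post above; an assumption for any other system.
CCX_CPUS = {
    "CCX0": parse_cpu_list("0-7,16-23"),
    "CCX1": parse_cpu_list("8-15,24-31"),
}


def ccx_of(cpu: int) -> Optional[str]:
    """Return the CCX name a logical CPU belongs to, or None if unknown."""
    for name, cpus in CCX_CPUS.items():
        if cpu in cpus:
            return name
    return None


def pin_to_ccx(name: str, pid: int = 0) -> None:
    """Restrict a process (0 = the caller) to the CPUs of one CCX. Linux only."""
    os.sched_setaffinity(pid, CCX_CPUS[name])
```

Pinning by hand only works around the scheduling behaviour; it says nothing about whether CPPC itself is functioning.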



                • #9
                  Originally posted by Kjell

                  If I run $ stress --cpu 1, I see the process jump between random cores instead of staying pinned, which is why I suspect CPPC is broken on my ASUS X670-P motherboard.

                  The same can be observed while compiling with 8 cores (-j8): there doesn't seem to be any structure as to which cores are utilized. Ideally, I'd like to see applications share L3 cache inside the same core complex (CCX0: 0-7,16-23 or CCX1: 8-15,24-31) for the lowest latency (which should be beneficial for gaming).

                  [Attached image: lstopo.png]

                  https://i.imgur.com/V2t0oqj.png

                  It sounds like you are trying to test "Preferred Cores", the specific feature that steers single-threaded processes to the fastest-clocking core. That was not merged until Linux 6.9. CPPC encompasses a bunch of stuff besides Preferred Cores.
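
A quick sanity check before testing Preferred Cores is to compare the running kernel's version with the 6.9 release mentioned above. This is only a sketch: it looks at the version string and cannot tell whether a distribution has backported the feature to an older kernel or disabled it in a newer one. The function names are made up for illustration.

```python
import platform
import re
from typing import Optional


def kernel_version(release: str) -> tuple:
    """Extract (major, minor) from a release string like "6.9.0-arch1-1"."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def kernel_has_preferred_cores(release: Optional[str] = None) -> bool:
    """True if the kernel version is at least 6.9 (mainline merge per the post)."""
    return kernel_version(release or platform.release()) >= (6, 9)
```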



                  • #10
                    Originally posted by parahaps
                    It sounds like you are trying to test "Preferred Cores"

                    That's definitely the case with the single-threaded stress test.

                    However, when testing multiple cores I'd expect the load to scale from CCX0 to CCX1, if we take into account that there's a latency penalty with Infinity Fabric, as demonstrated here:


                    I thought avoiding latency penalties was tackled with cluster-aware scheduling.


                    In reality it's probably not that simple, considering that not every core is equal and that spreading the workload would utilize the CPU's entire cache more efficiently.
                    Last edited by Kjell; 29 April 2024, 06:40 AM.
