Linux 6.6 WQ Change May Help Out AMD CPUs & Other Systems With Multiple L3 Caches

  • Linux 6.6 WQ Change May Help Out AMD CPUs & Other Systems With Multiple L3 Caches

    Phoronix: Linux 6.6 WQ Change May Help Out AMD CPUs & Other Systems With Multiple L3 Caches

    In addition to the EEVDF scheduler replacing the CFS code in Linux 6.6, another fundamental and interesting change with Linux 6.6 is on the workqueue (WQ) side with a rework that can benefit systems with multiple L3 caches like modern AMD chiplet-based systems...

  • #2
    A little extra context: AMD "chiplet" CPUs include Zen 2 and later designs. Ryzen 1xxx and 2xxx don't use the same physical design. So anyone with a 3xxx, 4xxx, 5xxx, etc. will probably see better cache performance, along with any other architectures using chiplet-like designs.

    • #3
      Originally posted by stormcrow View Post
      A little extra context: AMD "chiplet" CPUs include Zen 2 and later designs. Ryzen 1xxx and 2xxx don't use the same physical design. So anyone with a 3xxx, 4xxx, 5xxx, etc. will probably see better cache performance, along with any other architectures using chiplet-like designs.
      Especially the 7000 series x3d models where there's a 3x cache size difference between CCDs.

      Except the 7800x3d, because it's a single CCD model. The only downside is that it has a much lower boost clock than the rest...**puts on tin foil hat** to make the multi-chiplet x3d models appear better in benchmarks due to unforeseen cache thrashing because, apparently, big.LITTLE with the cache isn't the best idea and your high end products can't lose to their lower tier models. **takes off tin foil hat**...and it still performs pretty competitively with the rest.

      • #4
        Originally posted by skeevy420 View Post
        Except the 7800x3d, because it's a single CCD model. The only downside is that it has a much lower boost clock than the rest...**puts on tin foil hat** to make the multi-chiplet x3d models appear better in benchmarks due to unforeseen cache thrashing because, apparently, big.LITTLE with the cache isn't the best idea and your high end products can't lose to their lower tier models. **takes off tin foil hat**...and it still performs pretty competitively with the rest.
        Wasn't the issue that the extra cache lowers heat dissipation (since it's stacked on top of the core), so they needed to limit frequency? IIRC on the 2 CCD parts the cores on the CCD with the 3D V-cache boost lower as well, while the other CCD boosts normally. Those last few MHz take a ton of voltage to reach in a stable manner.

        • #5
          Originally posted by petronio View Post

          Wasn't the issue that the extra cache lowers heat dissipation (since it's stacked on top of the core), so they needed to limit frequency? IIRC on the 2 CCD parts the cores on the CCD with the 3D V-cache boost lower as well, while the other CCD boosts normally. Those last few MHz take a ton of voltage to reach in a stable manner.
          Apparently so. The 3d cache side boosts lower and the regular side boosts faster. It's still fun to joke about, though.

          Ton of voltage aside, the 7800x3d could probably be clocked a tiny bit higher. Some people have tweaked the ECLK to get it up to 5.4 GHz on stock voltage (the voltage can't be raised), and the stock frequency can be Curve Optimized to -30, -40, or maybe even more.

          • #6
            Originally posted by stormcrow View Post
            A little extra context: AMD "chiplet" CPUs include Zen 2 and later designs. Ryzen 1xxx and 2xxx don't use the same physical design. So anyone with a 3xxx, 4xxx, 5xxx, etc. will probably see better cache performance, along with any other architectures using chiplet-like designs.
            Zen and Zen+ also employ a split L3 cache, even on single-die processors. There are up to two CCXs on a CCD, each with up to 8 MB of L3 cache. This is exposed to the operating system, for example on a Ryzen 2700X:
            Code:
            # lscpu -e
            CPU NODE SOCKET CORE L1d:L1i:L2:L3 ONLINE    MAXMHZ    MINMHZ
              0    0      0    0 0:0:0:0          yes 3700.0000 2200.0000
              1    0      0    1 1:1:1:0          yes 3700.0000 2200.0000
              2    0      0    2 2:2:2:0          yes 3700.0000 2200.0000
              3    0      0    3 3:3:3:0          yes 3700.0000 2200.0000
              4    0      0    4 4:4:4:1          yes 3700.0000 2200.0000
              5    0      0    5 5:5:5:1          yes 3700.0000 2200.0000
              6    0      0    6 6:6:6:1          yes 3700.0000 2200.0000
              7    0      0    7 7:7:7:1          yes 3700.0000 2200.0000
              8    0      0    0 0:0:0:0          yes 3700.0000 2200.0000
              9    0      0    1 1:1:1:0          yes 3700.0000 2200.0000
             10    0      0    2 2:2:2:0          yes 3700.0000 2200.0000
             11    0      0    3 3:3:3:0          yes 3700.0000 2200.0000
             12    0      0    4 4:4:4:1          yes 3700.0000 2200.0000
             13    0      0    5 5:5:5:1          yes 3700.0000 2200.0000
             14    0      0    6 6:6:6:1          yes 3700.0000 2200.0000
             15    0      0    7 7:7:7:1          yes 3700.0000 2200.0000
            So in theory this change would affect every Zen CPU.
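For anyone who wants to check their own machine, here is a rough Python sketch that groups logical CPUs by the L3 ID in output like the above. It assumes `lscpu -e` prints the combined `L1d:L1i:L2:L3` column shown here; the function name `group_cpus_by_l3` and the trimmed `SAMPLE` text are made up for illustration.

```python
from collections import defaultdict

# Trimmed copy of the 2700X output above (hypothetical sample, not a full listing).
SAMPLE = """\
CPU NODE SOCKET CORE L1d:L1i:L2:L3 ONLINE    MAXMHZ    MINMHZ
  0    0      0    0 0:0:0:0          yes 3700.0000 2200.0000
  3    0      0    3 3:3:3:0          yes 3700.0000 2200.0000
  4    0      0    4 4:4:4:1          yes 3700.0000 2200.0000
  7    0      0    7 7:7:7:1          yes 3700.0000 2200.0000
  8    0      0    0 0:0:0:0          yes 3700.0000 2200.0000
 12    0      0    4 4:4:4:1          yes 3700.0000 2200.0000
"""


def group_cpus_by_l3(lscpu_text):
    """Return {l3_id: [cpu, ...]} parsed from `lscpu -e` style text."""
    lines = [ln for ln in lscpu_text.splitlines() if ln.strip()]
    header = lines[0].split()
    cpu_col = header.index("CPU")
    # Find the combined cache column and the position of L3 inside it.
    cache_col = next(i for i, name in enumerate(header) if "L3" in name.split(":"))
    l3_pos = header[cache_col].split(":").index("L3")

    groups = defaultdict(list)
    for ln in lines[1:]:
        fields = ln.split()
        l3_id = int(fields[cache_col].split(":")[l3_pos])
        groups[l3_id].append(int(fields[cpu_col]))
    return dict(groups)


if __name__ == "__main__":
    print(group_cpus_by_l3(SAMPLE))  # {0: [0, 3, 8], 1: [4, 7, 12]}
```

On a live system you could feed it the text of `lscpu -e` instead of `SAMPLE`. More than one key in the result means more than one L3 domain is visible to the OS.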

            • #7
              numacross
              Why is your MINMHZ so high? Is that intentional?

              Code:
              ❯ lscpu -e
              CPU NODE SOCKET CORE L1d:L1i:L2:L3 ONLINE    MAXMHZ   MINMHZ       MHZ
                0    0      0    0 0:0:0:0          yes 5050.0000 400.0000 5050.0000
                1    0      0    1 1:1:1:0          yes 5050.0000 400.0000 4840.6221
                2    0      0    2 2:2:2:0          yes 5050.0000 400.0000 4599.1758
                3    0      0    3 3:3:3:0          yes 5050.0000 400.0000 5050.0000
                4    0      0    4 4:4:4:0          yes 5050.0000 400.0000 4840.7319
                5    0      0    5 5:5:5:0          yes 5050.0000 400.0000 4105.0889
                6    0      0    6 6:6:6:0          yes 5050.0000 400.0000 5050.0000
                7    0      0    7 7:7:7:0          yes 5050.0000 400.0000 5050.0000
                8    0      0    0 0:0:0:0          yes 5050.0000 400.0000 5050.0000
                9    0      0    1 1:1:1:0          yes 5050.0000 400.0000 5050.0000
               10    0      0    2 2:2:2:0          yes 5050.0000 400.0000 4740.3989
               11    0      0    3 3:3:3:0          yes 5050.0000 400.0000 5050.0000
               12    0      0    4 4:4:4:0          yes 5050.0000 400.0000 5050.0000
               13    0      0    5 5:5:5:0          yes 5050.0000 400.0000 5050.0000
               14    0      0    6 6:6:6:0          yes 5050.0000 400.0000 5050.0000
               15    0      0    7 7:7:7:0          yes 5050.0000 400.0000 5050.0000





              • #8
                Originally posted by skeevy420 View Post
                numacross
                Why is your MINMHZ so high? Is that intentional?
                It's an older kernel so it's falling back to acpi-cpufreq:
                Code:
                # cpupower frequency-info
                analyzing CPU 0:
                  driver: acpi-cpufreq
                  CPUs which run at the same hardware frequency: 0
                  CPUs which need to have their frequency coordinated by software: 0
                  maximum transition latency:  Cannot determine or is not supported.
                  hardware limits: 2.20 GHz - 3.70 GHz
                  available frequency steps:  3.70 GHz, 3.20 GHz, 2.20 GHz
                  available cpufreq governors: conservative ondemand userspace powersave performance schedutil
                  current policy: frequency should be within 2.20 GHz and 3.70 GHz.
                                  The governor "ondemand" may decide which speed to use
                                  within this range.
                  current CPU frequency: 3.70 GHz (asserted by call to hardware)
                  boost state support:
                    Supported: yes
                    Active: yes
                    Boost States: 0
                    Total States: 3
                    Pstate-P0:  3700MHz
                    Pstate-P1:  3200MHz
                    Pstate-P2:  2200MHz
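If cpupower isn't installed, the driver and hardware limits can also be read straight from sysfs. A minimal Python sketch, assuming the usual cpufreq layout under `/sys/devices/system/cpu/cpuN/cpufreq/` (values there are in kHz); the helper name `read_cpufreq` and its `base` parameter are hypothetical and only there to make it easy to test.

```python
from pathlib import Path


def read_cpufreq(cpu=0, base="/sys/devices/system/cpu"):
    """Return the scaling driver and hardware min/max frequency (MHz) for one CPU."""
    cpufreq_dir = Path(base) / f"cpu{cpu}" / "cpufreq"

    def read(name):
        # Missing files (no cpufreq support, or not Linux) yield None.
        path = cpufreq_dir / name
        return path.read_text().strip() if path.exists() else None

    def mhz(name):
        value = read(name)
        return int(value) / 1000 if value is not None else None

    return {
        "driver": read("scaling_driver"),
        "min_mhz": mhz("cpuinfo_min_freq"),
        "max_mhz": mhz("cpuinfo_max_freq"),
    }


if __name__ == "__main__":
    print(read_cpufreq(0))
```

On the box above this should report `acpi-cpufreq` with a 2200 MHz floor, which would match the high MINMHZ in the lscpu output.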
