Linux 6.6 Lands "Pretty Juicy" IOmap Improvements, Lower Latency With IO_uring

  • Linux 6.6 Lands "Pretty Juicy" IOmap Improvements, Lower Latency With IO_uring

    Phoronix: Linux 6.6 Lands "Pretty Juicy" IOmap Improvements, Lower Latency With IO_uring

    Among the exciting early pull requests to land in the new Linux 6.6 kernel cycle are some nice improvements to the IOmap code that should yield some substantive I/O benefits with this new kernel...

  • #2
    Nice, but what about the AMD preferred core patch? This is the single most important feature for AMD CPUs, even more important than the new scheduler.

    Linux is not aware that the second CCD is far inferior, so it either uses it for single-threaded tasks or runs the first CCD at second-CCD performance levels (if any CPU from the second CCD is in use, since they have to share the same voltage under load).


    • #3
      Originally posted by sobrus
      Nice, but what about the AMD preferred core patch? This is the single most important feature for AMD CPUs, even more important than the new scheduler.

      Linux is not aware that the second CCD is far inferior, so it either uses it for single-threaded tasks or runs the first CCD at second-CCD performance levels (if any CPU from the second CCD is in use, since they have to share the same voltage under load).
      Can you explain more about this?
      How is the second CCD 'far inferior', on which chips, why, and what's the mitigation?

      (on my CPU the difference is about 5% between CCDs)


      • #4
        I think the preferred core thing didn't make it for 6.6.


        • #5
          Originally posted by pkese

          Can you explain more about this?
          How is the second CCD 'far inferior', on which chips, why, and what's the mitigation?

          (on my CPU the difference is about 5% between CCDs)
          I guess he is talking about the new X3D CPUs, where only one CCD has 3D V-Cache.
          I wouldn't call one of the CCDs superior, though. It just depends on the workload which CCD will perform better.


          • #6
            Originally posted by sobrus
            Nice, but what about the AMD preferred core patch? This is the single most important feature for AMD CPUs, even more important than the new scheduler.

            Linux is not aware that the second CCD is far inferior, so it either uses it for single-threaded tasks or runs the first CCD at second-CCD performance levels (if any CPU from the second CCD is in use, since they have to share the same voltage under load).
            Is there an issue report that tracks the progress of this?


            • #7
              Originally posted by flower

              I guess he is talking about the new X3D CPUs, where only one CCD has 3D V-Cache.
              I wouldn't call one of the CCDs superior, though. It just depends on the workload which CCD will perform better.
              No, in my 5950X (and I guess this is true for all dual-CCD Zen 2 and later chips), the first CCD is superior, made from higher-binned silicon.
              It can reach higher frequencies using less power. For example, my CCD1 cores can usually hit 4.7 GHz using 6-7 watts, whereas CCD2 cores need 9-10 watts to reach the same frequency.

              For example, I'm running "stress -c 2" with manually selected CPU affinity (CPU is locked to 4.8 GHz max):
              Using cores 1 and 7 (only CCD1 used, best cores), I get 4800 MHz @ 8 W each. Cool and quiet. It would probably go above 5 GHz if unlocked.
              Using cores 1 and 14 (both CCDs used), I get just 4690 MHz at 12 W and 10 W respectively, so it is using almost 40% more power at a lower speed, with temperatures soaring.
              Moreover, the best core suddenly starts to use 50% more power, because it has to share voltage and frequency (or even something else) with the inferior core.

              For such a load, Linux will usually pick one core from each CCD at random, giving us the second scenario.
              I'm surprised that this is not yet implemented 4 years after Zen 2 was released, but I bet Windows is smarter.
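
For anyone who wants to repeat the comparison above, here is a minimal Python sketch. It assumes the affinity was set with `taskset` (the post does not say which tool was used) and that logical CPUs 1 and 7 sit on CCD1 while 14 sits on CCD2, as described for this 5950X; the numbering may differ on other systems. The helper names are hypothetical. It only builds the command lines and redoes the power arithmetic from the quoted per-core wattages; it does not launch `stress` itself.

```python
def pinned_stress_cmd(cpus, workers=2):
    """Build a command line that pins `stress` to the given logical CPUs.

    Assumption: `taskset -c <list>` is used for pinning; the post only says
    the affinity was "manually selected".
    """
    cpu_list = ",".join(str(c) for c in cpus)
    return ["taskset", "-c", cpu_list, "stress", "-c", str(workers)]


def percent_increase(before, after):
    """Relative increase of `after` over `before`, in percent."""
    return (after - before) / before * 100.0


# Per-core package power reported in the post (watts).
same_ccd_watts = [8.0, 8.0]     # cores 1 and 7, both on CCD1
cross_ccd_watts = [12.0, 10.0]  # core 1 (CCD1) and core 14 (CCD2)

total_increase = percent_increase(sum(same_ccd_watts), sum(cross_ccd_watts))
best_core_increase = percent_increase(same_ccd_watts[0], cross_ccd_watts[0])

if __name__ == "__main__":
    print(" ".join(pinned_stress_cmd([1, 7])))
    print(" ".join(pinned_stress_cmd([1, 14])))
    print(f"total power increase: {total_increase:.1f}%")
    print(f"best core power increase: {best_core_increase:.1f}%")
```

With the figures from the post, the total comes out at 37.5% more power for the cross-CCD run (the "almost 40%" mentioned above), and the best core alone draws 50% more.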

              Originally posted by MastaG
              I think the preferred core thing didn't make it for 6.6.
              I'm afraid this is the case.
              Last edited by sobrus; 29 August 2023, 11:49 AM.


              • #8
                Regarding the preferred core patch:

                With the current version applied, all cores report a value of 166 (AMD_PSTATE_PREFCORE_THRESHOLD) on my Ryzen 7950x (/sys/devices/system/cpu/cpufreq/*/amd_pstate_highest_perf). There seems to be a small logic error with this patch as detailed here: .

                With the enablement logic reversed, most cores report a unique value, and it seems to work, though I have no way of actually verifying it I suppose.
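
A quick way to run the check described above is a small Python sketch like the one below. It assumes the sysfs glob quoted in the post (`/sys/devices/system/cpu/cpufreq/*/amd_pstate_highest_perf`) and the value 166 that the post attributes to `AMD_PSTATE_PREFCORE_THRESHOLD`; the function names are hypothetical, and the attribute is only present when the amd-pstate driver is active.

```python
import glob
import os

# Value the post reports for AMD_PSTATE_PREFCORE_THRESHOLD (taken from the post,
# not verified against any particular patch revision).
PREFCORE_THRESHOLD = 166


def read_highest_perf(root="/sys/devices/system/cpu/cpufreq"):
    """Return {policy_dir_name: highest_perf} for every policy exposing the attribute."""
    values = {}
    pattern = os.path.join(root, "*", "amd_pstate_highest_perf")
    for path in sorted(glob.glob(pattern)):
        policy = os.path.basename(os.path.dirname(path))
        try:
            with open(path) as f:
                values[policy] = int(f.read().strip())
        except (OSError, ValueError):
            # Skip entries that are unreadable or do not contain an integer.
            continue
    return values


def looks_unranked(values, threshold=PREFCORE_THRESHOLD):
    """True when every policy reports exactly the threshold, i.e. no per-core ranking shows."""
    return bool(values) and all(v == threshold for v in values.values())


if __name__ == "__main__":
    perf = read_highest_perf()
    if not perf:
        print("amd_pstate_highest_perf not found (is amd-pstate active?)")
    else:
        for policy, value in perf.items():
            print(policy, value)
        print("all cores at threshold:", looks_unranked(perf))
```

If every policy prints 166, that matches the behaviour described with the current patch version; a spread of distinct values would match the reversed-logic case. Neither outcome says whether the scheduler actually acts on the ranking.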


                • #9
                  Since kernel 6.4, there have been leaps and bounds in terms of I/O performance! Awesome, 6.6 will be excellent!


                  • #10
                    Originally posted by SearingHeat
                    Regarding the preferred core patch:

                    With the current version applied, all cores report a value of 166 (AMD_PSTATE_PREFCORE_THRESHOLD) on my Ryzen 7950x (/sys/devices/system/cpu/cpufreq/*/amd_pstate_highest_perf). There seems to be a small logic error with this patch as detailed here: .

                    With the enablement logic reversed, most cores report a unique value, and it seems to work, though I have no way of actually verifying it I suppose.
                    Seems like they will need some more time to iron out the bugs. Hopefully it will take less time than delivering the pstate driver did...
