
AMD Ryzen 9 7900X3D Linux Performance


  • AMD Ryzen 9 7900X3D Linux Performance

    Phoronix: AMD Ryzen 9 7900X3D Linux Performance

    Following last week's review of the brand-new AMD Ryzen 9 7950X3D and the subsequent look at Ryzen 9 7900X3D gaming performance, today's Linux hardware coverage on Phoronix examines the Ryzen 9 7900X3D's Linux performance in system/CPU workloads other than gaming.


  • #2
    Is there any news regarding a Zen 4 X3D-aware scheduler yet? I really like the power efficiency and the concept behind the specialized cores. But I'm wary of buying a product that is not, and may never be, properly supported on Linux.

    • #3
      Originally posted by HerrLange View Post
      Is there any news regarding a Zen 4 X3D-aware scheduler yet? I really like the power efficiency and the concept behind the specialized cores. But I'm wary of buying a product that is not, and may never be, properly supported on Linux.
      Haven't heard anything... Obviously once I do (assuming no NDA/embargo, which usually isn't the case for Linux-specific items), I'll certainly write about it.
      Michael Larabel
      https://www.michaellarabel.com/

      • #4
        IMHO it would be interesting to see how the performance (and especially performance-per-watt) of the non-X3D parts compares to the X3D parts when running in Eco Mode (i.e. with a power target comparable to the X3D parts).
        Then we'd see how much of the performance-per-watt difference is due to the extra cache and how much is due to the lower TDP (and whether the X3D versions are worth the money if you want to save energy) - AFAIK the non-X3D versions are still pretty fast at lower TDPs.

        • #5
          Originally posted by HerrLange View Post
          Is there any news regarding a Zen 4 X3D-aware scheduler yet? I really like the power efficiency and the concept behind the specialized cores. But I'm wary of buying a product that is not, and may never be, properly supported on Linux.
          It's not an easy problem to solve regardless of the OS. If it performs well today in your use cases, I wouldn't worry about how much further it could be optimized. The scheduler doesn't really know which apps will benefit from more cache vs. more speed, and there is no magic way to tell. Windows does not have native support for these sorts of asymmetries in its scheduler either. There are a number of ideas floating around (perf counters to look at historic trends in the app, adding hints to the binary in the compiler, etc.). I expect this will be a big area of research in the near future.

          • #6
            Originally posted by agd5f View Post
            It's not an easy problem to solve regardless of the OS.
            I disagree. I think it's a lot easier to address than Intel's situation with P-cores and E-cores. For 3D cache, the main consideration should be the sensitivity of a thread to L3 cache. And, to that end, AMD exposed some new performance counters that the thread scheduler could use to make such decisions.

            Better yet, you could potentially even use eBPF to implement such scheduler tweaks, without even having to patch & recompile the kernel! The main thing that's probably missing is the ability for eBPF to read those performance counters.
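
            To make the idea a bit more concrete, here's a rough user-space sketch (purely illustrative; these are the generic perf cache events, not AMD's new Zen 4 counters, and a real scheduler or eBPF policy would work very differently) that uses perf_event_open() to count a thread's last-level-cache accesses and misses and derive a crude hit-rate signal from them. Which raw events best reflect L3 behaviour on Zen 4, and how to sample them cheaply from scheduler context, is exactly the open part:

            /* Illustrative only: sample LLC accesses/misses for one thread and
             * compute a hit rate. The generic PERF_COUNT_HW_CACHE_LL events may
             * not map exactly to L3 on Zen 4; a real policy would use whatever
             * L3 counters AMD documents. */
            #define _GNU_SOURCE
            #include <linux/perf_event.h>
            #include <sys/syscall.h>
            #include <stdint.h>
            #include <stdio.h>
            #include <stdlib.h>
            #include <string.h>
            #include <unistd.h>

            static int open_llc_counter(pid_t tid, unsigned result)
            {
                struct perf_event_attr attr;
                memset(&attr, 0, sizeof(attr));
                attr.size = sizeof(attr);
                attr.type = PERF_TYPE_HW_CACHE;
                attr.config = PERF_COUNT_HW_CACHE_LL |
                              (PERF_COUNT_HW_CACHE_OP_READ << 8) |
                              ((uint64_t)result << 16);
                attr.exclude_kernel = 1;
                /* pid = tid, cpu = -1: follow this thread on whatever CPU it runs on.
                 * Attaching to other processes needs suitable perf_event_paranoid/CAP_PERFMON. */
                return (int)syscall(SYS_perf_event_open, &attr, tid, -1, -1, 0);
            }

            int main(int argc, char **argv)
            {
                pid_t tid = (argc > 1) ? (pid_t)atoi(argv[1]) : getpid();
                int fd_acc  = open_llc_counter(tid, PERF_COUNT_HW_CACHE_RESULT_ACCESS);
                int fd_miss = open_llc_counter(tid, PERF_COUNT_HW_CACHE_RESULT_MISS);
                if (fd_acc < 0 || fd_miss < 0) {
                    perror("perf_event_open");
                    return 1;
                }

                sleep(1);  /* sampling window; a scheduler would use per-timeslice deltas */

                uint64_t acc = 0, miss = 0;
                read(fd_acc, &acc, sizeof(acc));
                read(fd_miss, &miss, sizeof(miss));
                double hit_rate = acc ? 1.0 - (double)miss / (double)acc : 0.0;
                printf("LLC accesses=%llu misses=%llu hit rate=%.2f\n",
                       (unsigned long long)acc, (unsigned long long)miss, hit_rate);
                return 0;
            }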


            Originally posted by agd5f View Post
            The scheduler doesn't really know which apps will benefit from more cache vs. more speed, and there is no magic way to tell.
            You can't see that directly, but you could compare a thread's L3 hit rate when running on the 3D V-Cache die vs. the non-3D-cache die. Threads with low L3 utilization in either case can easily be assigned to the non-3D-cache die. Threads with higher L3 utilization can be prioritized for the 3D V-Cache die, based on the relative improvement in hit rate.
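
            And the placement decision on top of such a signal could be as simple as something like this (toy thresholds, just to show the shape of the heuristic, not a real kernel policy):

            enum ccd_pref { PREFER_FREQ_CCD, PREFER_VCACHE_CCD };

            /* hit_vcache / hit_plain: the thread's measured L3 hit rates on the
             * V-Cache CCD and on the plain CCD; the thresholds are made up. */
            enum ccd_pref place_thread(double hit_vcache, double hit_plain)
            {
                const double low_util = 0.10;
                const double min_gain = 0.05;

                if (hit_vcache < low_util && hit_plain < low_util)
                    return PREFER_FREQ_CCD;   /* barely uses L3: take the higher clocks */
                return (hit_vcache - hit_plain > min_gain) ? PREFER_VCACHE_CCD
                                                           : PREFER_FREQ_CCD;
            }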

            Originally posted by agd5f View Post
            I expect this will be a big area of research in the near future.
            That's just off the top of my head. Anyone with a background in thread-scheduling would probably have better ideas about how to tackle it.

            Originally posted by agd5f View Post
            Windows does not have native support for these sorts of asymmetries in its scheduler either.
            Maybe not for 3D cache - I haven't read one way or the other. But Windows 11 certainly has support for Intel's Thread Director.

            • #7
              Originally posted by DanielG View Post
              IMHO it would be interesting to see how the performance (and esp. performance-per-watt) of the non-X3D versions compares to the X3D versions when running in Eco-Mode (with a comparable power-target as the X3D versions).
              Indeed, a comparison with the 7900 and 7700 (non-X) would be interesting. I suspect the power efficiency comes mainly from the lower clocks; if anything, the extra cache should be an additional power draw. And if that's the case, then the 7900 should offer even better power efficiency without sacrificing much performance.

              • #8
                Originally posted by coder View Post
                I disagree. I think it's a lot easier to address than Intel's situation with P-cores and E-cores. For 3D cache, the main consideration should be the sensitivity of a thread to L3 cache. And, to that end, AMD exposed some new performance counters that the thread scheduler could use to make such decisions.
                Better yet, you could potentially even use eBPF to implement such scheduler tweaks, without even having to patch & recompile the kernel! The main thing that's probably missing is the ability for eBPF to read those performance counters.
                You can't see that directly, but you could compare a thread's L3 hit rate when running on the 3D V-Cache die vs. the non-3D-cache die. Threads with low L3 utilization in either case can easily be assigned to the non-3D-cache die. Threads with higher L3 utilization can be prioritized for the 3D V-Cache die, based on the relative improvement in hit rate.
                Sure, I suggested perf counters as a possible option as well. That said, I'm not sure how much of an issue using perf counters in a hot path like this would be. Moreover, these sorts of things don't always work out as well as you would hope in practice. Look at CPU frequency scaling; full software control of the frequency governor tends not to do a particularly good job in a lot of cases. Hardware-assisted modes like EPP or guided CPPC tend to do better, at least for general use cases.

                • #9
                  Originally posted by otoomet View Post
                  the extra cache should be an additional power draw.
                  That's not what we've seen in mobile contexts, for instance. Apple likes big caches because cache lookups are more energy-efficient than DRAM fetches, enough so to offset the cost of powering the bigger caches.

                  Also, we really ought to be looking at energy efficiency in terms of the total Joules needed to perform a defined workload. That's the gold standard for measuring energy efficiency, unless you're specifically concerned about gaming, video playback, or some other realtime task.
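
                  For anyone who wants to measure that, one cheap approach (assuming your kernel exposes a RAPL package-energy counter through the powercap sysfs interface on this CPU; the zone name/path below is an assumption and can differ) is to read the energy counter before and after the workload, roughly like this:

                  /* Rough "joules per job" sketch: read the package energy counter
                   * from the powercap RAPL sysfs interface around a workload.
                   * The path is an assumption; check what exists on your system.
                   * The counter wraps eventually, which is ignored for short runs. */
                  #include <stdio.h>
                  #include <stdlib.h>

                  static long long read_energy_uj(void)
                  {
                      FILE *f = fopen("/sys/class/powercap/intel-rapl:0/energy_uj", "r");
                      long long uj = -1;
                      if (f) { fscanf(f, "%lld", &uj); fclose(f); }
                      return uj;
                  }

                  int main(int argc, char **argv)
                  {
                      if (argc < 2) {
                          fprintf(stderr, "usage: %s <command>\n", argv[0]);
                          return 1;
                      }
                      long long before = read_energy_uj();
                      int rc = system(argv[1]);          /* run the defined workload */
                      long long after = read_energy_uj();
                      if (before < 0 || after < 0 || rc == -1) {
                          fprintf(stderr, "no energy counter found or workload failed\n");
                          return 1;
                      }
                      printf("package energy used: %.2f J\n", (after - before) / 1e6);
                      return 0;
                  }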

                  • #10
                    Originally posted by agd5f View Post
                    I'm not sure how much of an issue using perf counters in a hot path like this would be.
                    I can't imagine performance counters would take more than a couple nanoseconds to read, whereas the duration of a timeslice is usually multiple milliseconds. So, you're only off by a mere 6 orders of magnitude or so. And you wouldn't even have to sample them every timeslice. Nice try, though.

                    Originally posted by agd5f View Post
                    these sorts of things don't always work out as well as you would hope in practice.
                    I've done enough performance tuning to have seen my share of surprises. It will definitely take some experimentation and tuning of different approaches. But just throwing up your hands strikes me as very lame. This is a sufficiently straightforward problem that I'm sure there are scheduling strategies that can deliver a net win or break even on the substantial majority of workloads.

                    Originally posted by agd5f View Post
                    Look at CPU frequency scaling;
                    That has a strong temporal aspect to it, which tends to make it more challenging.
