AMD Ryzen 9 7900X3D Linux Performance


  • #21
    Originally posted by coder View Post
    Not if the P-core is being shared by 2 threads. And if we're talking about a low-ILP task that's memory-bound, then it doesn't really matter where it's running.

    If it were such an easy problem, Intel wouldn't have created a hardware block (i.e. the Thread Director) for accumulating metrics about threads to help the OS' scheduler decide where to run them.


    Intel even claimed to have developed a deep learning model to translate the raw metrics into a classification the OS scheduler can use more easily.


    The difference in clock speed is small enough that if you have a thread where the additional L3 cache makes a significant difference in hit rate, then it's a pretty obvious win to put it on the die with the additional cache.
    Now, I haven't seen definitive numbers on what the 3D-cache CCD runs at, but most rumours say the non-3D-cache CCD boosts about 14% higher than the 3D-cache one, and I don't know if that is "small enough" to be sure a lower cache miss rate beats a potential 14% higher boost clock. A further complication is that the boost is just that, potential, and not guaranteed. I'm not sure what the base clock difference between the two CCDs is (if any).

    And for an application as a whole, a different thread might benefit from being able to run while another is waiting on that cache miss, yet if you schedule them on different CCDs you incur the heavy inter-CCD latency. In the end I do think this is a bit more complex and error-prone than you make it out to be, and one telling sign is that the W11 scheduler seems to simply identify games by whitelisting and then park the other CCD.
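    To put rough numbers on the clock-vs-cache question, here's a back-of-the-envelope model (every figure in it is my own assumption, not a measurement) of when a lower L3 miss rate beats a ~14% boost clock advantage:

```python
# Toy break-even model: does a bigger L3 beat a ~14% clock advantage?
# All numbers below are illustrative assumptions, not measurements.

def time_per_op(clock_ghz, miss_rate, dram_penalty_ns=70.0, cycles_per_op=1.0):
    """Average time per operation in ns: core work plus DRAM stall time."""
    return cycles_per_op / clock_ghz + miss_rate * dram_penalty_ns

# Assumed clocks: 5.6 GHz on the frequency CCD vs 4.9 GHz (~14% lower)
# on the V-Cache CCD, with the extra L3 halving the miss rate.
fast_ccd = time_per_op(clock_ghz=5.6, miss_rate=0.02)
cache_ccd = time_per_op(clock_ghz=4.9, miss_rate=0.01)

print(f"frequency CCD: {fast_ccd:.3f} ns/op")
print(f"V-Cache CCD:   {cache_ccd:.3f} ns/op")
# With these assumptions the cache CCD wins comfortably; push miss_rate
# toward zero and the frequency CCD pulls ahead instead.
```

    For a compute-bound thread (miss rate near zero) the 14% clock advantage dominates; for a memory-bound one the cache does, which is exactly why this has to be decided per thread.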

    Comment


    • #22
      Originally posted by drakonas777 View Post
      Does not change the fact P core is faster core.
      It's entirely relevant, if what you're trying to do is schedule threads!

      Originally posted by drakonas777 View Post
      Does not matter for this particular workload.
      Which "particular workload"? Don't tell me there are no memory-bound or I/O-bound threads, in any of these benchmarks.

      Originally posted by drakonas777 View Post
      Yet again - does not change the fact P core is faster core.
      Yes! The cores have different performance and energy profiles. That's why scheduling them is complicated!

      Originally posted by drakonas777 View Post
      What Intel did is not a proof that something is easier or harder.
      It's not proof, but when they commit resources to design, test, debug, document, and fab a hardware block that then must be supported by OS vendors, it should tell you they perceive a complexity that justifies such a solution, not least because such a block is a ripe target for side-channel attacks, which they clearly seemed to appreciate.

      Comment


      • #23
        Originally posted by F.Ultra View Post
        And for a application as a whole a different thread might benefit from being able to run when the other is waiting for that cache miss while if you now schedule them on different ccd:s you induce the heavy inter-ccd latency. In the end I do think that this is a bit more complex and error prone that what you make it up to be
        There's an undeniable bit of knapsack problem in this. However, there is clearly also some low-hanging fruit. Time will hopefully tell, but I do think we'll see worthwhile scheduling strategies emerge.
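        One low-hanging fruit is reachable from user space today: pinning a process you already know is cache-sensitive to the V-Cache CCD via CPU affinity. A minimal Linux sketch; the CPU numbering is a hypothetical 7900X3D layout (logical CPUs 0-11 = 6 cores x 2 SMT threads on the cache CCD), so check `lscpu -e` on your own machine first:

```python
import os

# Hypothetical 7900X3D layout: logical CPUs 0-11 are the V-Cache CCD
# (6 cores x 2 SMT threads). Verify with `lscpu -e` before relying on it.
CACHE_CCD_CPUS = set(range(12))

def pin_to_cache_ccd(pid=0):
    """Restrict a process (pid 0 = the caller) to the V-Cache CCD.

    Intersects with the CPUs we are actually allowed to use, so it
    degrades gracefully on machines with a different topology.
    """
    allowed = os.sched_getaffinity(pid) & CACHE_CCD_CPUS
    if allowed:
        os.sched_setaffinity(pid, allowed)
    return os.sched_getaffinity(pid)

print("now running on CPUs:", sorted(pin_to_cache_ccd()))
```

        This is roughly the user-space equivalent of the whitelist-and-park approach, minus the parking; the hard scheduler work is doing it automatically and per thread.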

        Comment


        • #24
          Originally posted by coder View Post
          That's not what we've seen in mobile contexts, for instance. Apple likes big caches because cache lookups are more energy efficient than DRAM fetches, even enough to offset the cost of powering bigger caches.

          Also, we really ought to be looking at energy efficiency in terms of the total Joules needed to perform a defined workload. That's the gold standard for measuring energy efficiency, unless you're specifically concerned about gaming, video playback, or some other realtime task.
          Good point. Extra cache sips power but may also help to conserve power elsewhere.

          We need more than one type of efficiency: CPU power (watts) relates to cooling, VRM temperature, CPU power cables and such.
          Total system power relates to PSU sizing, and total task energy relates to the overall energy bill and room temperature.
          On top of that, I'd love to hear about idle power and how power-hungry the chipsets/memory are.
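          Total task energy is actually measurable without a wall meter on most recent chips: both Intel and AMD Zen expose RAPL energy counters through the powercap sysfs interface. A rough sketch, assuming `/sys/class/powercap/intel-rapl:0/energy_uj` is present and readable (the path keeps the `intel-rapl` name even on AMD; many distros restrict it to root):

```python
import time

RAPL = "/sys/class/powercap/intel-rapl:0/energy_uj"  # package-level counter

def read_uj(path=RAPL):
    """Read the cumulative package energy counter, in microjoules."""
    with open(path) as f:
        return int(f.read())

def joules(before_uj, after_uj):
    """Energy in joules between two readings (counter wraparound ignored)."""
    return (after_uj - before_uj) / 1e6

def measure(workload, path=RAPL):
    """Run workload() and return (seconds, package joules) it consumed."""
    t0, e0 = time.monotonic(), read_uj(path)
    workload()
    return time.monotonic() - t0, joules(e0, read_uj(path))

# Example (needs read access to the counter):
# secs, j = measure(lambda: sum(i * i for i in range(10_000_000)))
# print(f"{j:.1f} J in {secs:.2f} s -> {j / secs:.1f} W average")
```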

          Comment


          • #25
            Originally posted by niner View Post

            Err....if you mean the 7900X with "24 full", you are off by a factor of 2. The 7900X has just 12 cores. So yes, the 13900k can keep up with the 7900X3D, but only with twice the number of CPU cores and almost twice the power consumption.
            I completely blanked on that; you are correct. Hey, at least those E-cores are no worse than SMT overall! Haha.
            Originally posted by niner View Post

            Err....if you mean the 7900X with "24 full", you are off by a factor of 2. The 7900X has just 12 cores. So yes, the 13900k can keep up with the 7900X3D, but only with twice the number of CPU cores and almost twice the power consumption.
            Correct me if I'm wrong, but doesn't Zen have diminishing returns with too much cache due to an Infinity Fabric bottleneck?

            I remember this podcast where an AMD guy talks about it: https://youtu.be/ha_U8rrxvyE

            Comment


            • #26
              I wonder what this 'concern' over energy usage is all about. To me it isn't a big deal: if I need more processing power, I expect to use more energy. I guess the big question is how you cool the processor, but otherwise I don't see the point of worrying about it. Of course higher efficiency is always a good thing, because you can then push the CPU harder for even more MIPS, but otherwise....

              Comment


              • #27
                Originally posted by rclark View Post
                I wonder what this 'concern' over energy usage is all about.
                Laptops, servers, and people who work in upstairs rooms with hot summers and high electricity costs.
                Last edited by coder; 08 March 2023, 04:34 PM.

                Comment


                • #28
                  Originally posted by rclark View Post
                  I wonder what this 'concern' over energy usage is all about. To me it isn't a big deal: if I need more processing power, I expect to use more energy. I guess the big question is how you cool the processor, but otherwise I don't see the point of worrying about it. Of course higher efficiency is always a good thing, because you can then push the CPU harder for even more MIPS, but otherwise....
                  Getting comparable processing power from a Ryzen 7950X3D versus a Ryzen 7950X or an i9-13900K at half the power consumption is a big deal for me, especially as it means easier cooling and therefore a quieter system with less effort.

                  I would always prefer the model that is easier to cool, as I would expect better longevity and easier builds. That is even worth some money to me.

                  Comment


                  • #29
                    Originally posted by pieman View Post
                    Correct me if I'm wrong, but doesn't Zen have diminishing returns with too much cache due to an Infinity Fabric bottleneck?

                    I remember this podcast where an AMD guy talks about it: https://youtu.be/ha_U8rrxvyE
                    As far as I know that discussion related to having cache on the IO die rather than on the core die. The X3D parts have all their cache on the CCD.

                    I believe there is still IF traffic in the case where an access from one CCD misses all caches on that CCD but hits in a cache on the other CCD, but the increased CCD-to-CCD traffic from larger caches trades off against reduced CCD-to-DRAM traffic.
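                    To sketch that trade-off with toy numbers (illustrative only, not measured): every local-L3 miss has to cross the fabric whether it is filled from the other CCD's cache or from DRAM, so a higher local hit rate directly cuts IF traffic:

```python
# Illustrative Infinity Fabric traffic model; all numbers are made up.
LINE_BYTES = 64  # cache line size

def fabric_bytes(accesses, local_l3_hit_rate):
    """Bytes crossing the fabric: every local-L3 miss crosses it,
    whether the fill comes from the other CCD's cache or from DRAM."""
    return accesses * (1 - local_l3_hit_rate) * LINE_BYTES

base = fabric_bytes(1_000_000, 0.90)  # regular L3
x3d = fabric_bytes(1_000_000, 0.97)   # assumed hit-rate gain from 3D V-Cache
print(f"regular L3: {base / 1e6:.2f} MB over IF")
print(f"3D V-Cache: {x3d / 1e6:.2f} MB over IF ({1 - x3d / base:.0%} less)")
```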

                    Comment


                    • #30
                      Originally posted by agd5f View Post
                      Is it a problem? Even without any special scheduling, it's still a net win in most things. Having better scheduling would just be icing on the cake.
                      Yeah. The previous-gen 3D version wasn't crippled. What I mean is that I want the big cache on every core: when I run an app, I expect it to get the big cache, because that's what I pay for. That's why I buy the 3D version of the CPU. I want the big cache for every app I run, no exceptions.

                      I don't want performance variations just because one day the scheduler decides to place the app on a core without the cache and the next day on one with it. How do you not see this as a problem?

                      Comment
