AMD Ryzen Threadripper 7980X & 7970X Linux Performance Benchmarks


  • #21
    Originally posted by aviallon View Post
    I was wondering, could you test an allmodconfig kernel build with some of the best CPUs from this benchmark, but with the kernel sources and the build output ("everything") in a tmpfs.
    It should be interesting.
    I think tmpfs probably won't change much. The problem is that 128 GB isn't actually very much RAM for 128 threads, in which case the build products are all getting written out, anyhow.

    I was thinking about this, and I think a bigger issue might be the need to run sync; fstrim, prior to the build. This should hopefully clear up enough space for the drive's SLC write buffer (i.e. by clearing out deleted products of other tests) to hold the forthcoming build products, so that part doesn't become a bottleneck.
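A rough sketch of the flush-and-trim step described above. The `mount_point` default and the `dry_run` flag are assumptions made for illustration, not anything the benchmark harness is known to do. `fstrim` needs root, and how quickly a drive actually frees its SLC cache after a trim depends on its firmware, so this is a starting point rather than a guaranteed fix.

```python
import os
import subprocess


def fstrim_command(mount_point="/"):
    """Build the fstrim invocation; the default mount point is a placeholder."""
    # -v makes fstrim report how many bytes were trimmed.
    return ["fstrim", "-v", mount_point]


def prepare_drive(mount_point="/", dry_run=True):
    """Flush dirty pages, then trim free space on the build filesystem.

    With dry_run=True (an illustrative safety default) nothing is executed
    and the command that would run is returned instead.
    """
    cmd = fstrim_command(mount_point)
    if dry_run:
        return cmd
    os.sync()                        # same effect as running `sync`
    subprocess.run(cmd, check=True)  # raises CalledProcessError if fstrim fails
    return cmd
```

Calling `prepare_drive("/mnt/build", dry_run=False)` as root just before starting the build would run both steps; `/mnt/build` is a made-up path.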

    Comment


    • #22
      Originally posted by schmidtbag View Post
      C'mon now, you're not dumb.
      Yes, and I've also spent much of my life watching & timing builds of various software on various hardware. Assuming the buildsystem isn't messed up, I've always seen a full build scale very linearly. The only caveat that comes to mind is that kernel bottleneck affecting GNU Make's load-balancer, that was fixed several years back.


      Anyway, you can review the benchmarks I referenced or not. Right now, you're just spouting a lot of random nonsense that's not backed by any data or analysis, so I don't feel the need to debate you point-by-point.

      Comment


      • #23
        Originally posted by coder View Post
        Yes, and I've also spent much of my life watching & timing builds of various software on various hardware. Assuming the buildsystem isn't messed up, I've always seen a full build scale very linearly. The only caveat that comes to mind is that kernel bottleneck affecting GNU Make's load-balancer, that was fixed several years back.

        Anyway, you can review the benchmarks I referenced or not. Right now, you're just spouting a lot of random nonsense that's not backed by any data or analysis, so I don't feel the need to debate you point-by-point.
        You seem weirdly butthurt and aggressive because of a single benchmark that we both agreed is odd. You're not doing yourself favors by riding your high horse over an anecdote. It doesn't change my point. Just look at reviews of the 3990WX and you'll find that alone checks many of the boxes I listed. That CPU on average was weirdly slow for how many cores it had. Read other comments in this thread and they allude to other things I listed. Compare the performance to Windows and that affirms other things I listed.
        Just because code compiling didn't scale linearly when traditionally it normally does, that doesn't negate anything else I said. I don't care to look that deep into the issue, so I just gave my conjecture of why that might be the case. If I'm wrong, fine, but I gave your intellect more credit than it deserves if you think that this one test and the article you linked to is somehow proof that anything I listed is "random nonsense".

        Comment


        • #24
          Ordered a 7960X myself. I predict it will slot right in between the 7950X and 7970X.

          Comment


          • #25
            Originally posted by schmidtbag View Post
            You seem weirdly butthurt and aggressive because of a single benchmark that we both agreed is odd.
            Sure, deflect.

            It's not just one benchmark, either. It's six, which all show similar scaling (although two of those are the Linux kernel).

            It bothers me because it's the main use case I have for a big, multicore CPU. So, I want to know why it doesn't scale as I have seen and expect.

            Again, I'm not going to argue with you, because you're just repeating random factoids and obviously don't have a clue what's actually happening here. I've spent a lot of time on performance optimization, and it's usually one dominant factor or bottleneck. Find it, and maybe you can fix it.

            It does no good to just fling a wad of excuses at the wall and walk away. If you don't know, you don't know. It's as simple as that. You're just hurting the signal-to-noise ratio by saying anything, when you can't point to a clear fact pattern to support a specific theory.

            Here's a benchmark which shows a dual-9654 (2x 96x Zen4) achieving about 2x the Linux Kernel compilation performance as a dual-7763 (2x 64x Zen3). That pretty clearly shows it can scale very well, even far beyond the number of cores tested here:

            It also disproves most of your funny ideas about why this Threadripper isn't scaling well on these compilation benchmarks. Sadly, I've gone through that article and haven't found what storage drives/configuration they used.
            Last edited by coder; 21 November 2023, 04:08 PM.

            Comment


            • #26
              Comparing the 32-core and 64-core performance, Anandtech also found poor scaling on compilation workloads:

              That middle one is very weird. Some substantial part of that build seems to be dominated by single-threaded performance, because it shows the 32-core outperforming the 64-core, and then the i9-14900K comes along and blows them both away with yet a further halving of the thread-count.

              BTW, they did use a slightly better client SSD: SK Hynix Platinum P41 2TB PCIe 4.0 x4. These two graphs let us compare the sustained write performance of the drive Michael used (WD SN850 1 TB) with the one Anandtech used. I had to cite two different graphs, because they tested 1TB and 2TB drives separately, and you'll see that the amount of write buffering differs between them.

              Pay attention to the numbers, because the scales are obviously different. What we see is that both drives exhaust their SLC buffers at a little over 50 seconds. The P41 Platinum starts and finishes a bit higher, but their performance profiles are similar.

              Sadly, I don't see where Anandtech said how much RAM they're using, but I wouldn't be surprised if it was the same 128 GB that Michael used. If so, that probably won't be enough working memory for 128 threads of compilation + all the caching and write buffering you'd wish for.

              What I wonder might be happening is that the benchmark could be starting with the SSD's write buffer nearly full, from all of the prior installation & testing. When they start compiling, they could be almost immediately hitting the wall on write performance.

              Michael, would you rerun the allmodconfig kernel compile benchmark on the 64-core CPU? Simultaneously, run this in another window:

              iostat -x 1 > iostat.log


              Then, post it somewhere where we can see it. If we can see the iowait and idle time, over the course of the benchmark, we can distinguish between bottlenecks due to the SSD vs. lack of concurrency. If neither is very high, then it's likely some bottleneck inside the CPU or perhaps insufficient memory bandwidth.

              Also, please be sure to note the actual benchmark result, so that we know whether & how well the performance problem has been reproduced.
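Once such a log exists, the iowait and idle columns could be summarized along these lines. This is only a sketch: it assumes the default English `avg-cpu:` layout that `iostat -x 1` prints, with dot decimals. The function names and the 20% threshold in `classify` are made up for illustration, not an established rule. The first report is skipped because it holds averages since boot rather than a one-second sample.

```python
from statistics import mean


def parse_cpu_samples(lines):
    """Collect one dict per avg-cpu report found in iostat output."""
    samples = []
    header = None
    for line in lines:
        fields = line.split()
        if fields and fields[0] == "avg-cpu:":
            header = fields[1:]          # e.g. ['%user', ..., '%iowait', ..., '%idle']
        elif header is not None:
            # The values sit on the line right after the header.
            if len(fields) == len(header):
                samples.append(dict(zip(header, map(float, fields))))
            header = None
    return samples


def summarize(samples, skip_first=True):
    """Average the interesting columns; the first report is since-boot data."""
    if skip_first:
        samples = samples[1:]
    if not samples:
        return None
    iowait = [s["%iowait"] for s in samples]
    idle = [s["%idle"] for s in samples]
    return {
        "samples": len(samples),
        "mean_iowait": mean(iowait),
        "max_iowait": max(iowait),
        "mean_idle": mean(idle),
    }


def classify(summary, threshold=20.0):
    """Very coarse reading of the summary; the threshold is an arbitrary guess."""
    if summary["mean_iowait"] >= threshold:
        return "storage"            # CPUs are waiting on the SSD
    if summary["mean_idle"] >= threshold:
        return "concurrency"        # CPUs are idle without waiting on I/O
    return "cpu-or-memory"          # neither is high: look inside the CPU / memory


def report(path="iostat.log"):
    """Read a log written by `iostat -x 1 > iostat.log` and summarize it."""
    with open(path) as f:
        return summarize(parse_cpu_samples(f))
```

Running `report()` in the directory holding `iostat.log` and passing the result to `classify` gives a first hint. The per-second samples themselves are still worth eyeballing, since a stall confined to one phase of the build can disappear in the averages.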
              Last edited by coder; 21 November 2023, 04:50 PM.

              Comment


              • #27
                Originally posted by coder View Post
                Comparing the 32-core and 64-core performance, Anandtech also found poor scaling on compilation workloads:
                That middle one is very weird. Some substantial part of that build seems to be dominated by single-threaded performance, because it shows the 32-core outperforming the 64-core, and then the i9-14900K comes along and blows them both away with yet a further halving of the thread-count.

                BTW, they did use a slightly better client SSD: SK Hynix Platinum P41 2TB PCIe 4.0 x4
                PHP compilation isn't a very good test for large core count CPUs. Codebase isn't large enough.

                Anandtech uses PTS tests for their Linux tests. Here you can see a large composite that shows how PHP compilation doesn't scale so well for large core counts:

                https://openbenchmarking.org/test/pts/build-php#results
                Michael Larabel
                https://www.michaellarabel.com/

                Comment


                • #28
                  Originally posted by Michael View Post
                  PHP compilation isn't a very good test for large core count CPUs. Codebase isn't large enough.

                  Anandtech uses PTS tests for their Linux tests. Here you can see a large composite that shows how PHP compilation doesn't scale so well for large core counts:

                  https://openbenchmarking.org/test/pts/build-php#results
                  Thanks for the info.

                  Do you think you might find time to try the experiment I suggested? No rush. See end of my updated post, where I listed the iostat command.

                  Comment


                  • #29
                    Originally posted by coder View Post
                    Sure, deflect.
                    Says the one doing just that...
                    It's not just one benchmark, either. It's six, which all show similar scaling (although two of those are the Linux kernel).
                    All you've been talking about is compiling. Seeing as some tasks are scaling just fine, you're naive if you think there aren't different possible reasons for other tasks to not scale linearly.
                    Again, I'm not going to argue with you...
                    Then stop arguing with me.
                    It does no good to just fling a wad of excuses at the wall and walk away. If you don't know, you don't know. It's as simple as that. You're just hurting the signal-to-noise ratio by saying anything, when you can't point to a clear fact pattern to support a specific theory.
                    Alpha64 suspected performance issues, thinking power and thermals may be the culprit (notice I also pointed those out in my "random nonsense" post). Those are perfectly reasonable assumptions. Neither of us are obligated to appease your demands when it doesn't take a genius to know that ~350W of compute power might be too demanding on certain motherboards or heatsinks. Are we right in our theories? Perhaps not, but it's not "noise" to propose a possible cause and investigate whether that problem can be eliminated. It's rather foolish to just assume it isn't a possibility when historically, such things have been problems on other architectures. Again, look up benches of the 3990WX.
                    Here's a benchmark which shows a dual-9654 (2x 96x Zen4) achieving about 2x the Linux Kernel compilation performance as a dual-7763 (2x 64x Zen3). That pretty clearly shows it can scale very well, even far beyond the number of cores tested here:

                    It also disproves most of your funny ideas about why this Threadripper isn't scaling well on these compilation benchmarks. Sadly, I've gone through that article and haven't found what storage drives/configuration they used.
                    Uh-huh, and? We've already established from the beginning that these TRs have the ability to scale well, so good job at stating something all of us already knew. That article did nothing to prove anything I said wrong. There are a lot of significant differences with those 9654s, such as (but not limited to) more mature firmware or microcode, lower boost clocks (which, as Alpha64 pointed out, could affect power limitations or thermal throttling), bigger caches, many more memory channels (you need a lot of bandwidth to feed that many cores. For TR, higher frequencies increase bandwidth demands, but you should know that already), and perhaps there's some difference in the I/O die (I didn't look into it because I don't care enough - perhaps this is just "noise"). I shouldn't have to provide you graphs for you to understand how TR's specs could limit its potential.
                    Last edited by schmidtbag; 21 November 2023, 05:15 PM.

                    Comment


                    • #30
                      Originally posted by schmidtbag View Post
                      Says the one doing just that...
                      I'm the one trying to get to the bottom of what's going on here. All you've done is whine at being called out for making an ignorant statement. And then you double-down and triple-down, because the only way you can win arguments is by trying to outlast the other poor fellow.

                      Originally posted by schmidtbag View Post
                      All you've been talking about is compiling. Seeing as some tasks are scaling just fine, you're naive if you think there aren't different possible reasons for other tasks to not scale linearly.
                      Obviously, there's some bottleneck affecting compiling and not certain other things. One of the main differences between it and the benchmarks that scale almost linearly is the amount of I/O it's doing. Again, you're not helping.

                      Originally posted by schmidtbag View Post
                      thinking power and thermals may be the culprit (notice I also pointed those out in my "random nonsense" post). Those are perfectly reasonable assumptions.
                      You've cited no fact pattern leading to such a conclusion. How does the data support that?

                      Originally posted by schmidtbag View Post
                      Neither of us are obligated to appease your demands
                      Not demands, just an expectation that people are interested in having a constructive discussion. I don't have time for your fragile ego, today.

                      Originally posted by schmidtbag View Post
                      Unlike you, neither of us cared enough to investigate further because it's just not that important to us.
                      But you're still replying because... why?

                      Originally posted by schmidtbag View Post
                      If you seriously find it that noisy to have these suggested possibilities, then get off the comments section of articles
                      Cool, so if anyone interested in actual facts, conclusions, or solutions leaves, the comments will just be 100% whiners, wankers, and nutters. We're already halfway there.

                      Originally posted by schmidtbag View Post
                      We've already established from the beginning that these TRs have the ability to scale well, so good job at stating something all of us already knew. That article did nothing to prove anything I said wrong.
                      You completely missed the point. Since you're not being serious, I'm not going to waste the time to explain it to you.

                      Comment
