Benchmarking AMD FX vs. Intel Sandy/Ivy Bridge CPUs Following Spectre, Meltdown, L1TF, Zombieload

  • #21
    Originally posted by Eero
    When you say they're at 100% utilization, did you check how much of that was IO-wait? In "top", press "1" to see all cores separately; the "wa" column then shows the IO-wait percentage for each core. In the compilation case, IO-wait would be waiting on disk (in other cases it could be waiting on the network, the GPU, anything other than the CPU).
    I'm rebuilding Firefox Nightly now with the past day's updates. CPU utilization on both cores is averaging about 95%. IO-wait per top typically sits at the lower end of a 0.0-3.0 range, spiking as high as 44.5 in rare moments.
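
    For anyone who wants to log this over time rather than eyeball top, here is a minimal sketch (Linux-only, Python standard library) that samples the per-CPU iowait counters from /proc/stat - the same counters behind top's "wa" column. The 5-second interval is an arbitrary choice:

        import time

        def cpu_times():
            """Read per-CPU jiffies from /proc/stat (cpu0, cpu1, ...)."""
            stats = {}
            with open("/proc/stat") as f:
                for line in f:
                    if line.startswith("cpu") and line[3].isdigit():
                        fields = line.split()
                        stats[fields[0]] = [int(x) for x in fields[1:]]
            return stats

        before = cpu_times()
        time.sleep(5)                    # sampling interval (arbitrary)
        after = cpu_times()

        for cpu in sorted(before):
            delta = [a - b for a, b in zip(after[cpu], before[cpu])]
            total = sum(delta) or 1
            # field 5 of each /proc/stat cpu line (index 4 here) is iowait
            print(f"{cpu}: iowait {100 * delta[4] / total:.1f}%")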

    • #22
      Originally posted by numacross
      There are no "hardware threads" because they share resources of the single physical core, making them not equal.
      Sharing a single physical core doesn't make the hardware threads unequal. If you compare the case where both hardware threads are running against the case where only one is, then you are right, but that is not the situation I was describing. I simply meant that the two hardware threads are exactly the same when they run under the same circumstances (whether alone on the physical core or alongside the other hardware thread).

      Originally posted by numacross
      Linux since 2.6 and Windows since XP have HT-aware schedulers that try to avoid placing 2 threads on both "virtual cores" of the same physical core. They instead prefer placing one thread on each of 2 separate physical cores in order to minimize the cost of sharing resources.
      Correct, which makes it even less likely that HT affects single-thread benchmarks (like perl interpreter startup).

      Originally posted by numacross
      If what you say is true then why bother with modifying scheduling because of HT?
      As written above, I didn't say that hardware threads lose no performance when they are not running alone on a core - they do. But when both threads run the same code, their performance should be mostly identical. When one of them runs code and the other doesn't, it shouldn't matter which one does - the performance should be the same either way.

      The term "virtual core" makes you believe that there are "good cores" and "virtual cores", which is totally incorrect. The original quote can be modified in the following way to make it less ambiguous:

      Originally posted by modified quote
      Perl can actually perform better if HT is disabled to avoid the chances it's stuck running on a hardware thread sharing a physical core with another running thread.
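
      The symmetry claim is easy to check empirically. A minimal sketch (Linux, using Python's os.sched_setaffinity): pin the same single-threaded workload to each hardware thread in turn and time it. The CPU numbers 0 and 1 are an assumption - check /sys/devices/system/cpu/cpu0/topology/thread_siblings_list for the real sibling pairing on your machine:

          import os, time

          def workload():
              # any CPU-bound single-threaded loop will do
              s = 0
              for i in range(20_000_000):
                  s += i * i
              return s

          for cpu in (0, 1):                  # assumed hardware-thread numbers
              os.sched_setaffinity(0, {cpu})  # pin this process to one CPU
              t = time.perf_counter()
              workload()
              print(f"cpu{cpu}: {time.perf_counter() - t:.2f}s")

      If SMT is symmetric, the two timings should agree to within noise, regardless of which sibling you picked.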

      • #23
        Originally posted by Wielkie G
        But when both threads run the same code, their performance should be mostly identical.
        I think this is the point I disagree with the most. If said code uses the shared units heavily (floating-point or SIMD, for example), then the threads' execution latency will vary wildly because of the waiting periods.

        Originally posted by Wielkie G
        When one of them runs code and the other doesn't, it shouldn't matter which one does - the performance should be the same either way.
        Yes, or when they are able to saturate the duplicated execution pipelines (integer loads, for example).

        Originally posted by Wielkie G
        The term "virtual core" suggests that there are "real cores" and "virtual cores", which is totally incorrect.
        But they are different, and every operating system knows that there is a difference between cores 0 and 1 in a single-core HT CPU. This of course scales to multi-core CPUs.

        Depending on the load and the scheduling, you can either ignore this difference or it'll bite you hard - see the sketch below.
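
        A minimal sketch of that placement effect (Linux; the CPU numbers are assumptions for a 4-core/8-thread part where cpu0 and cpu4 are siblings - check /sys/devices/system/cpu/cpu0/topology/thread_siblings_list for your machine). It runs two FP-heavy processes pinned either to the two siblings of one physical core or to two separate physical cores, and compares wall time:

            import os, time
            from multiprocessing import Process

            def fp_workload(cpu):
                os.sched_setaffinity(0, {cpu})  # pin the child to one hardware thread
                x = 1.5
                for _ in range(30_000_000):
                    x = x * 1.0000001 % 10.0    # keeps the (shared) FP unit busy

            def run_pair(cpus):
                procs = [Process(target=fp_workload, args=(c,)) for c in cpus]
                t = time.perf_counter()
                for p in procs:
                    p.start()
                for p in procs:
                    p.join()
                return time.perf_counter() - t

            if __name__ == "__main__":
                print(f"siblings (0,4):       {run_pair((0, 4)):.2f}s")  # assumed HT pair
                print(f"separate cores (0,1): {run_pair((0, 1)):.2f}s")  # assumed distinct cores

        On SMT hardware the sibling pairing should come out measurably slower; an interpreter adds overhead that dilutes the contention, so a native workload shows it more starkly.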


        • #24
          I tend to keep my hardware around for a while, and years back I initially bought a cheap FX-8350 instead of a 3770K for packaging purposes (it's even outfitted with a surprisingly cheap 32GB DDR3-2400 RAM kit), since it cost half the money (both CPU and motherboard) for the same amount of performance in that specific task back when I got it.

          With the newest vulnerabilities coming to light, I'm not exactly regretting that decision. And with DDR4 RAM prices only recently dropping, moving to a newer platform hasn't really been on the table, cost-benefit-wise.

          Thanks for the benchmarks, Michael!

          Comment


          • #25
            Originally posted by ermo
            I tend to keep my hardware around for a while, and years back I initially bought a cheap FX-8350 instead of a 3770K for packaging purposes (it's even outfitted with a surprisingly cheap 32GB DDR3-2400 RAM kit), since it cost half the money (both CPU and motherboard) for the same amount of performance in that specific task back when I got it.

            With the newest vulnerabilities coming to light, I'm not exactly regretting that decision. And with DDR4 RAM prices only recently dropping, moving to a newer platform hasn't really been on the table, cost-benefit-wise.

            Thanks for the benchmarks, Michael!
            That's how I feel about my Westmeres I picked up a few years ago, when my Q6600 (with the FSB mod) didn't cut it: they were dirt cheap and I wanted a system that supported ECC for ZFS. I ended up with dual X5687s (8 cores @ 3.60GHz, 16 threads with SMT) in a Dell T5500 with 48GB of RAM (DDR3-1333 R-ECC) and 2x 480GB 7200RPM HDDs for $350. I'm cheap, so if I can get an entire workstation for the cost of a new CPU, hells yeah. Found an RX 580 4GB for only $140 earlier this year. For $490 total, it's a pretty decent setup for 1080p Linux gaming and compiling stuff here and there, especially once mitigations are factored in.

            • #26
              Originally posted by debianxfce
              4-8GB RAM is enough to disable swapping
              In my opinion, with 2019's common CPU thread counts (8-16) and assuming 2 GiB of memory per thread for parallel tasks that use all CPU threads, 16-32 GiB of memory is slowly becoming the norm for a desktop/workstation. 8-16 GiB is on the border of being a limiting factor to full utilization of the CPU.

              Taking a look at https://www.ec2instances.info, most of the EC2 instances have at least 2 GiB of memory per vCPU. The Nano instances have less memory per vCPU (the minimum being 256 MiB per vCPU), which is enough to run certain types of applications, but this does not negate the fact that the optimum for a 2019 desktop/workstation is at least 2 GiB of memory per CPU thread.
              Last edited by atomsymbol; 05-25-2019, 05:56 PM.
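
              The rule of thumb turns into a one-liner; a minimal sketch, where the 2 GiB-per-thread figure is the assumption from the post above rather than a hard requirement:

                  import os

                  GIB_PER_THREAD = 2            # assumed budget from the post above
                  threads = os.cpu_count() or 1 # hardware threads visible to the OS
                  print(f"{threads} threads -> suggested RAM: {threads * GIB_PER_THREAD} GiB")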

              • #27
                Originally posted by debianxfce
                4-8GB RAM is enough to disable swapping.
                That's funny... I have a minimum of 12GB of memory constantly committed to non-cache things all day at work, and that goes way up when I spin up testing VMs (often 2 or 3 at a time).

                My work laptop is maxed out at 16GB (ThinkPad T440p), and even with zswap enabled I often go a few GB into swap when I have to test certain workflows.

                I'm looking forward to my next laptop refresh (this fall), so I can finally jump to 32GB RAM.

                • #28
                  Originally posted by atomsymbol

                  In my opinion, with 2019's common CPU thread counts (8-16) and assuming 2 GiB of memory per thread for parallel tasks that use all CPU threads, 16-32 GiB of memory is slowly becoming the norm for a desktop/workstation. 8-16 GiB is on the border of being a limiting factor to full utilization of the CPU.

                  Taking a look at https://www.ec2instances.info, most of the EC2 instances have at least 2 GiB of memory per vCPU. The Nano instances have less memory per vCPU (the minimum being 256 MiB per vCPU), which is enough to run certain types of applications, but this does not negate the fact that the optimum for a 2019 desktop/workstation is at least 2 GiB of memory per CPU thread.
                  That's sort of how I sized the RAM for my current system: 8 cores * 2 for HT = 16 threads, * 2 GiB = 32GB. So I figured 32GB was a decent starting point, and I ended up getting 48GB because it was $10 more - why not? What I didn't account for was systemd using half of that for /tmp by default, so it really comes out to 48GB / 2 = 24GB, / 16 threads = 1.5GB per thread. That just means I need to pick up another 24GB of RAM to get to 2GB per thread (with a 36GB ramdisk as a bonus). I do all my compiles on my current 24GB ramdisk except for Firefox with PGO... 24GB ain't enough for that (seriously), so I could actually make use of 72GB of RAM.

                  Keep my large numbers in mind if you compile your own software or plan to. Some of these compile and optimization processes need assloads of RAM.
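
                  The arithmetic above, as a minimal sketch; the 0.5 reflects systemd's default of sizing the tmpfs /tmp at half of physical RAM (a cap - tmpfs only consumes RAM for files actually stored, but a ramdisk full of build artifacts does fill it):

                      def ram_per_thread(total_gb, threads, tmp_fraction=0.5):
                          usable = total_gb * (1 - tmp_fraction)  # left over after the /tmp tmpfs
                          return usable / threads

                      print(ram_per_thread(48, 16))  # -> 1.5 GB/thread, as above
                      print(ram_per_thread(72, 16))  # -> 2.25 GB/thread with another 24GB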

                  • #29
                    Originally posted by debianxfce

                    Of course you need an endless amount of RAM when you run Windows, GNOME 3, or KDE in your VMs. An average Xfce desktop PC user needs 4GB of RAM, but you cannot buy 2x2GB memory sticks anymore, and 2x4GB is starting to become rare too. 2x8GB is starting to be mainstream.
                    Which means you'd need 16GB of RAM to cover 4 Debian Xfce VMs, 8 more for the host system, and 8 more for /tmp... so 32GB of RAM is where the average Xfce desktop PC user who runs VMs would want to start.

                    • #30
                      Originally posted by numacross

                      But they are different, and every operating system knows that there is a difference between cores 0 and 1 in a single-core HT CPU. This of course scales to multi-core CPUs.
                      How are they different? SMT is symmetric - each hardware thread is equal to the other one.

                      Try your favorite workload on core 0 (by setting CPU affinity) and then on core 1. You will see that there is no difference between the two results.

                      Now try running two instances of the workload, one on core 0 and the other on core 1. You will see that each instance is slower, but the aggregate throughput may be higher. For example, if each instance's throughput is 60% of the original, then the aggregate is 120% and the SMT performance uplift is +20%.

                      For example (on Windows, as I don't have access to Linux right now), the 7zip compression benchmark on my machine (i7 3770K) shows 4500-4600 MIPS on core 0 and on core 1 when only one core is being used. When I run two instances (one on core 0 and the other on core 1) they show 3000-3100 MIPS each - that's 6000-6200 MIPS aggregate and a 30-40% uplift.
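
                      The uplift arithmetic from those 7zip numbers, as a tiny helper (the MIPS figures are the ones quoted above):

                          def smt_uplift(single_mips, per_instance_mips, instances=2):
                              # aggregate throughput of the pinned instances vs. one instance alone
                              return per_instance_mips * instances / single_mips - 1

                          print(f"{smt_uplift(4600, 3000):+.0%}")  # -> +30%
                          print(f"{smt_uplift(4500, 3100):+.0%}")  # -> +38%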
