Hyper Threading Performance & CPU Core Scaling With Intel's Skylake Xeon


  • Hyper Threading Performance & CPU Core Scaling With Intel's Skylake Xeon

    Phoronix: Hyper Threading Performance & CPU Core Scaling With Intel's Skylake Xeon

    As some extra benchmarks following this week's 9-Way Intel Xeon E3 v5 Skylake Benchmarks On Ubuntu Linux, some Phoronix Premium readers wanted to see how well Hyper Threading worked for these latest-generation Xeon E3 processors and the core scaling efficiency of Skylake...


  • #2
    Originally posted by atomsymbol

    It would be nice to conclude such articles with a single number describing the overall efficiency of hyper-threading (2 hyper-threads) in respect to multicore-threading (2 "real" cores).

    It can be computed as:
    efficiency = average(100% * (perf_8_hyperthreads_4_cores / perf_4_cores) / (perf_4_cores / perf_2_cores))

    or more accurately:
    efficiency = average(100% * (perf_4_hyperthreads_2_cores / perf_2_cores) / (perf_4_cores / perf_2_cores))

    where perf_N_cores is the performance of the benchmark on N "real" cores.
    Anyone is free to write a PTS module to compute this automatically; I would be happy to then utilize it.
    Michael Larabel
    https://www.michaellarabel.com/
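The second formula quoted above could be computed along these lines. This is only a minimal sketch, not a PTS module: the function names and the sample numbers are made up for illustration, and it assumes every result is a "higher is better" score.

```python
from statistics import mean


def ht_efficiency(perf_2_cores, perf_4_cores, perf_4_hyperthreads_2_cores):
    """Efficiency (in percent) of 2 cores + HT relative to 4 real cores.

    Implements the quoted formula:
    100% * (perf_4_hyperthreads_2_cores / perf_2_cores) / (perf_4_cores / perf_2_cores)
    Assumes higher scores are better.
    """
    ht_gain = perf_4_hyperthreads_2_cores / perf_2_cores
    core_gain = perf_4_cores / perf_2_cores
    return 100.0 * ht_gain / core_gain


def average_ht_efficiency(results):
    """Average the per-benchmark efficiency over an iterable of result tuples."""
    return mean(ht_efficiency(*row) for row in results)


# Hypothetical scores per benchmark: (2 cores, 4 cores, 2 cores + HT).
sample_results = [
    (100.0, 200.0, 125.0),
    (100.0, 180.0, 135.0),
]
overall = average_ht_efficiency(sample_results)
print(f"Overall HT efficiency: {overall:.2f}%")
```

For "lower is better" results (e.g. run times) the ratios would have to be inverted before averaging.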



    • #3
      Originally posted by atomsymbol

      It would be nice to conclude such articles with a single number describing the overall efficiency of hyper-threading (2 hyper-threads) in respect to multicore-threading (2 "real" cores).

      It can be computed as:
      efficiency = average(100% * (perf_8_hyperthreads_4_cores / perf_4_cores) / (perf_4_cores / perf_2_cores))
      or more accurately:
      efficiency = average(100% * (perf_4_hyperthreads_2_cores / perf_2_cores) / (perf_4_cores / perf_2_cores))
      where perf_N_cores is the performance of the benchmark on N "real" cores.
      That would be cool to see graphed. It's one of the complaints I personally have about SMT architectures. Intel's architectures decode much faster than I gave them credit for. I'm pretty sure these benchmarks at least show decode performance to some degree when SMT is turned on. It would be nice to see it on and off for each of the 1-, 2-, and 4-core configurations. That would give a better indication of just how much decode performance can be attributed to the SMT capability.

      EDIT: This is also entirely relevant for when AMD releases Zen; I'm sure we'll all be curious to know the facts about that. I sure hope to see it graphed well by then.
      Last edited by duby229; 27 February 2016, 06:29 PM.



      • #4
        I wonder if the non-linear scaling is due to power management, i.e. when one core is loaded it is overclocked more than when more than one core is loaded.



        • #5
          Originally posted by boxie
          I wonder if the non-linear scaling is due to power management, i.e. when one core is loaded it is overclocked more than when more than one core is loaded.
          Non-linear scaling is due to hyperthreading essentially splitting up the resources of a single core so they can be put to more efficient use by multiple threads, i.e. if you have 2 threads and one thread stalls due to a memory access, the other thread can run if its data is cached.

          Hyperthreading usually only adds about 25% or so performance on average on an application that scales *well*... you still have to deal with poor algorithmic scalability.

          So on a quad core with hyperthreading you expect to see around the performance of a 5-6 core processor without hyperthreading, which is exactly what we see.
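The "5-6 core" estimate above is simple arithmetic, sketched below. The function name is hypothetical, the 50% upper figure is an assumption chosen only to reach the "6" end of the range (the post itself only mentions about 25%), and the model assumes performance scales linearly with core count, which real workloads rarely do.

```python
def equivalent_cores(physical_cores, ht_uplift):
    """Rough number of non-HT cores matching `physical_cores` cores with HT.

    `ht_uplift` is the fractional gain from hyperthreading (0.25 == 25%).
    Assumes perfectly linear scaling across cores.
    """
    return physical_cores * (1.0 + ht_uplift)


low = equivalent_cores(4, 0.25)   # the ~25% average from the post
high = equivalent_cores(4, 0.50)  # assumed best case, not from the post
print(f"Quad core + HT is roughly a {low:g}-{high:g} core part without HT")
```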



          • #6
            Originally posted by cb88

            Non-linear scaling is due to hyperthreading essentially splitting up the resources of a single core so they can be put to more efficient use by multiple threads, i.e. if you have 2 threads and one thread stalls due to a memory access, the other thread can run if its data is cached.

            Hyperthreading usually only adds about 25% or so performance on average on an application that scales *well*... you still have to deal with poor algorithmic scalability.

            So on a quad core with hyperthreading you expect to see around the performance of a 5-6 core processor without hyperthreading, which is exactly what we see.

            Sure (this is due to cramming 2 instructions down a long pipe and hoping that prediction gets it right), but it is the non-HT scores not scaling linearly that I was more interested in.



            • #7
              Originally posted by boxie

              Sure (this is due to cramming 2 instructions down a long pipe and hoping that prediction gets it right), but it is the non-HT scores not scaling linearly that I was more interested in.
              Multithreading never scales perfectly, because the different threads have to communicate about what they are working on and take locks to avoid race conditions and overwriting data another one is already using.

              Extra power/heat downclocking the cores is certainly another possibility though. I think at one point Intel cores would overclock when they were running single-threaded code but not multi-threaded, but I'm not sure that's still the case with Skylake.
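One common way to put a number on the point about communication and locking is Amdahl's law, which is not mentioned in the thread but models the same effect: any portion of the work that cannot run in parallel caps the achievable speedup. A minimal sketch, with a made-up 95% parallel fraction and ignoring clock changes from turbo or throttling:

```python
def amdahl_speedup(parallel_fraction, n_cores):
    """Ideal speedup on `n_cores` when only `parallel_fraction` of the work
    can run in parallel (Amdahl's law). Ignores clock-speed changes."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_cores)


# Hypothetical workload where 95% of the run time parallelizes.
for cores in (1, 2, 4):
    print(f"{cores} core(s): {amdahl_speedup(0.95, cores):.2f}x")
```

Even with 95% of the work parallelized, 4 cores come out well short of 4x in this model, before any power management effects are considered.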

              Originally posted by cb88
              Hyperthreading usually only adds about 25% or so performance on average on an application that scales *well* ... you still have to deal with poor algorithmic scalability.
              Hyperthreading performance can differ vastly across hardware architectures as well. The original P4 implementation by Intel was pretty awful and led to a lot of misconceptions about how good it could be. The recent generations tend to be pretty good, though, and POWER5 chips also have a good implementation.
              Last edited by smitty3268; 28 February 2016, 01:21 AM.



              • #8
                Originally posted by atomsymbol
                The current state of enabling step-by-step debugging in a PHP IDE is totally absurd.
                1. You are totally correct (why do you think debug print statements are used so much?)
                2. Wuss *runs away*



                • #9
                  Originally posted by cb88
                  Non-linear scaling is due to hyperthreading essentially splitting up the resources of a single core so they can be put to more efficient use by multiple threads, i.e. if you have 2 threads and one thread stalls due to a memory access, the other thread can run if its data is cached.

                  Hyperthreading usually only adds about 25% or so performance on average on an application that scales *well*... you still have to deal with poor algorithmic scalability.

                  So on a quad core with hyperthreading you expect to see around the performance of a 5-6 core processor without hyperthreading, which is exactly what we see.
                  I was one of those who was curious about HT / non-HT comparisons. HT does indeed allow work to continue when one thread stalls on memory, for a relatively small amount of extra core transistors to duplicate registers. BUT there are problems when the combined working set of the 2 threads exceeds the cache size.

                  Given that 4 cores generally deliver 3-3.5x the performance of a single core, for HT to deliver as much as it does is pretty impressive.
                  A test with 1 and 2 cores with HT would reduce the effects of TDP throttling. It would be interesting to know how 2 threads on the same core affect turbo mode, given that it's the CPU that runs each thread when the other is blocked, rather than the O/S scheduler.
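The 3-3.5x figure for 4 cores can be restated as a per-core scaling efficiency. A minimal sketch; the function name is made up for illustration:

```python
def parallel_efficiency(speedup, n_cores):
    """Fraction of ideal linear scaling achieved (1.0 == perfect scaling)."""
    return speedup / n_cores


for speedup in (3.0, 3.5):
    eff = parallel_efficiency(speedup, 4)
    print(f"{speedup}x on 4 cores -> {eff:.1%} of linear scaling")
```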



                  • #10
                    Originally posted by atomsymbol
                    Do you disagree with any of these points?
                    Nope. I would add to it that step-by-step debugging of PHP in an IDE is not done well in many products. (PhpEd with their dbg extension on Windows was probably the best I have used; it worked rather well.)

