Intel Thread Director Virtualization Patches Boost Some Workloads By ~14%

  • #11
    cpu-z added the ability to bench e and p cores separately with the latest version. after its release, i booted up windows 11 off a spare ssd to try it out.

    these are the results of my 13900k (benchmark screenshots: e-core only, p-core only, and e and p combined):

    the e cores actually don't suck ipc-wise. they bench around zen 2 levels of ipc, and i don't think any honest person would sit there and claim zen 2 ipc sucks. compared to the p cores, yeah, it's a giant difference, but the e cores still have good performance, are far denser, and use less power than the p cores. density-wise, intel could have fit either 2 more p cores into my 13900k or 16 e cores with zen 2 levels of ipc. i'm not surprised they went the e-core route.

    then you get all the benefits of scheduling with a big.LITTLE architecture. why waste p-core cycles on something like discord running in the background when that can be dumped onto the e-cores? network traffic? web browser? dump it on the e-cores and leave the p cores for stuff that matters more. one thing i like: steam, when downloading a large game in the background, switches to the e-cores rather than the p cores. power usage drops by 30 watts, my fans spin down, and the p cores are freed up. all you're doing is downloading and writing as you go; running that on p cores won't make it any faster, it'll just use more power.
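    for the curious, this kind of placement can also be forced by hand on linux with cpu affinity. a minimal python sketch — the e-core cpu numbers here are an assumption for a 13900k (p-core smt threads usually enumerate as cpus 0-15 and e-cores as 16-31, but check /sys/devices/cpu_atom/cpus on your own box):

```python
import os

# assumed E-core CPU ids for a 13900K on Linux (P-core SMT threads 0-15,
# E-cores 16-31); verify against /sys/devices/cpu_atom/cpus before relying on it
E_CORES = set(range(16, 32))

def pin_process(pid, cpus):
    """Restrict a process (pid 0 = the calling process) to the given CPU set."""
    os.sched_setaffinity(pid, cpus)

# e.g. push the current process onto the E-cores only:
# pin_process(0, E_CORES)
```

    from a shell, `taskset -c 16-31 <command>` does the same thing.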

    yeah, i hear people talking about how "power efficient amd is" and all that, but they ignore a big point: imagine how much more power efficient amd would be if they adopted big.LITTLE too. why draw 110 watts when they could be drawing 75?

    the only real complaint i consider valid is the differing isa between the core types, but seeing meteor lake and its successor, i think intel is fixing that issue. overall i like the e-core/p-core big.LITTLE thing intel has been doing and i hope they don't stop.
    Last edited by pieman; 03 February 2024, 02:46 PM.

    Comment


    • #12
      Intel E cores are useful for the area/MT-performance ratio, for MT benchmarks, and for the very few people who do CPU-based rendering but can't afford a HEDT/WS/HPC class CPU. That's all, really. Even in code compilation the practical benefits of E cores are questionable: for example, the 14900K is almost identical to the 7950X despite having 8 more cores. Furthermore, in the cases where the i9 compiles a little faster, it's most likely due to the higher-throughput P cores compared to ZEN4, because compilation as such does not scale linearly with core count. Now, if we consider that the absolute majority of massively parallel workloads are accelerated by GPUs, plus the growing share of NPU-accelerated AI workloads, this "many E cores" strategy does not seem like a practical long-term solution at all. (And those who think it's just because AMD has better lithography, go google Meteor Lake vs Phoenix benchmarks.)

      At the end of the day, E cores allowed Intel to stay competitive (performance-wise) with AMD by reusing existing IP, so it makes sense for Intel to build them, and they do add some value for mobile platforms. However, for desktop/WS (and TBH in the mobile space for the most part as well), Intel's hybrid architecture has not demonstrated anything close to a universally better approach than AMD's unified one, aside from some per-product benefits (like a specific i5 being a better choice for production at a given price point than a specific R5 thanks to the E-core boost in MT).

      Well, perhaps having an extra 24 cores in an 8P + 32E config will help Intel universally kill AMD's 16C in the future
      Last edited by drakonas777; 03 February 2024, 03:50 PM.

      Comment


      • #13
        Intel naming sucks. Their CPU names suck, their software feature names suck. While I wouldn't buy Intel anyway, this is just one more reason: I don't have time to memorize 10000 marketing terms to know what I'm actually running.

        Comment


        • #14
          Originally posted by ms178 View Post
          To my knowledge, the original (non-VM) Thread Director patchset hasn't landed in mainline yet and I haven't seen any benchmarks with/without it yet, but hope that it yields some improvements.
          I'd be very interested in seeing benchmarks of that, because my gut says there probably isn't much of a difference and Thread Director is mostly a workaround for poor Windows scheduling.

          That would explain why nobody is in a particular hurry to get it into Linux.

          That said, maybe it does make a difference. I'd love to see benchmarks to see for sure rather than just guessing.

          Comment


          • #15
            Originally posted by espi View Post
            E-cores are denser than P-cores (0.5x performance for 0.25x area, you can put 4 E-cores in the space of a single P-core, with double the mt performance). So for multithreading they are better than cramming more P-cores.

            In fact I think there is very little point to having more than 8 P-cores at all. Any workload that scales to more than 8 cores probably scales to n-cores.
            Everything I learned at school tells me that the more cores you have, the less the performance gains scale due to synchronization overhead. The more cores, the bigger the overhead. 8 cores with performance N are always faster than 16 cores with performance N/2.
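            That intuition can be checked against a simple Amdahl-style model (a sketch under the assumption that wall time on n cores of per-core speed v is (s + (1-s)/n)/v, where s is the serial fraction of the work):

```python
def wall_time(serial_frac, cores, core_speed):
    """Amdahl-style wall time: the serial part plus the parallel part
    split across cores, all scaled by per-core speed."""
    return (serial_frac + (1 - serial_frac) / cores) / core_speed

# 8 cores at full speed vs 16 cores at half speed, across serial fractions.
# Algebraically t16 - t8 works out to exactly s, so in this model the
# 8 fast cores never lose (they tie only at s = 0).
for s in [0.0, 0.01, 0.05, 0.1, 0.25, 0.5]:
    t8 = wall_time(s, 8, 1.0)
    t16 = wall_time(s, 16, 0.5)
    print(f"s={s:.2f}  8x1.0: {t8:.4f}  16x0.5: {t16:.4f}")
```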

            Comment


            • #16
              Originally posted by ms178 View Post
              To my knowledge, the original (non-VM) Thread Director patchset hasn't landed in mainline yet and I haven't seen any benchmarks with/without it yet, but hope that it yields some improvements. There was also some related work around core topology by Thomas Gleixner. Considering that Alder Lake has been in the wild for a while, it is a bit surprising that all this work takes such a long time to materialize. While it is a hard task, their army of engineers were able to implement this feature for Windows on time after all. Maybe Linux simply wasn't a priority as no server core went with such a P/E-design.
              Yes, I'm also confused; I thought the non-VM Intel Thread Director patchset hadn't entered mainline yet. Can somebody clarify the current situation?

              Comment


              • #17
                Originally posted by RealNC View Post

                Everything I learned at school tells me that the more cores you have, the less the performance gains scale due to synchronization overhead. The more cores, the bigger the overhead. 8 cores with performance N are always faster than 16 cores with performance N/2.
                Funny, because for servers they opt for 64 cores with worse per-core performance than a desktop chip.

                Thing is, adding CPU cores itself does not add overhead; the overhead comes from the software, from the algorithm used.
                It's true that many algorithms have a limit to their parallelism, hence why it's rare to see more than 32 cores on the desktop, but stuff like compilation often scales close to linearly.

                Comment


                • #18
                  What sucks about e-cores is that they are the reason AVX-512 is disabled/missing in Alder Lake/Raptor Lake CPUs.

                  Comment


                  • #19
                    I still have an old Haswell CPU. But on these new CPUs, are low-priority/high-niceness processes automatically assigned to e-cores?

                    Comment


                    • #20
                      Originally posted by RealNC View Post

                      Everything I learned at school tells me that the more cores you have, the less the performance gains scale due to synchronization overhead. The more cores, the bigger the overhead. 8 cores with performance N are always faster than 16 cores with performance N/2.
                      What makes e-cores interesting is that it's not 8n vs 16(0.5n).

                      It's 8n vs 32(0.5n), because you can fit 4 e-cores in the space of one p-core while each still delivers half the performance. So as long as the synchronization overhead isn't over half the work, it ends up being faster.

                      The p cores have to add a bunch of extra silicon to keep adding increasingly marginal amounts of single-threaded performance, while the e-cores focus on what can be done more efficiently.
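                      The break-even point depends heavily on how the overhead is modeled. Treating it as a purely serial fraction s of the work, Amdahl-style (a sketch under that assumption, not a claim about real silicon), the 8x1.0 vs 32x0.5 comparison crosses over at s = 1/17, roughly 5.9% serial work:

```python
def wall_time(serial_frac, cores, core_speed):
    # serial part plus parallel part split across cores, scaled by core speed
    return (serial_frac + (1 - serial_frac) / cores) / core_speed

# break-even: s + (1-s)/8 == 2*s + (1-s)/16  =>  s == 1/17 ≈ 0.0588
for s in [0.0, 0.03, 1 / 17, 0.10, 0.25]:
    t_p = wall_time(s, 8, 1.0)   # 8 P-cores at full speed
    t_e = wall_time(s, 32, 0.5)  # 32 E-cores at half speed
    winner = "32E" if t_e < t_p else ("8P" if t_p < t_e else "tie")
    print(f"s={s:.3f}  8P: {t_p:.4f}  32E: {t_e:.4f}  -> {winner}")
```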

                      Comment
