
Intel i9-12900K Alder Lake Linux Performance In Different P/E Core Configurations


  • Originally posted by MadCatX
    I'm not sure I follow your argument here. You're using 7zip as an extreme example of how memory requirements grow with the number of threads. But this has nothing to do with HT; you'd have the same problem if you doubled the actual core count. Also, there are lots of workloads where memory demands do not grow that much with more threads.
    Right. You need more RAM if you double the core count, similar to what happens if you add hyperthreading, but there is an important difference:

    If you go from 1 core to 1 core + HT, you get up to 37% more performance in 7zip.

    If you go from 1 core to 2 cores, you get up to 100% more performance in 7zip...

    As you can see, both need more RAM, but the second option gives you more performance for the RAM you add.

    With the first option you need to double your RAM, but you only get 37% more performance...

    With the second option you double your RAM, but you get 100% more performance.
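The performance-per-RAM comparison above can be made explicit with a few lines of arithmetic. This sketch takes the post's own assumption that memory use scales with thread count (so both options double RAM) and the 7zip speedups quoted in the thread; the function name is illustrative only.

```python
# Toy arithmetic for the "performance per added RAM" argument.
# Assumption (from the post): both options double memory use.
# Speedups are the 7zip figures quoted in the thread, not new measurements.

def gain_per_extra_ram(speedup: float, ram_factor: float) -> float:
    """Extra performance per unit of extra RAM, both relative to a baseline of 1.0."""
    return (speedup - 1.0) / (ram_factor - 1.0)

options = {
    "1 core + HT": (1.37, 2.0),  # +37% performance, 2x RAM
    "2 cores": (2.00, 2.0),      # +100% performance, 2x RAM
}

for name, (speedup, ram_factor) in options.items():
    print(f"{name}: {gain_per_extra_ram(speedup, ram_factor):.2f} extra perf per extra RAM unit")
```

Under these assumptions the second core returns roughly 2.7 times as much extra performance for the same extra RAM; the assumption that RAM doubles in both cases is exactly what MadCatX questions.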

    "Also, there are lots of workloads where memory demands do not grow that much with more threads."

    This is true, but if you plan a system you cannot count on this; instead, you measure the apps that need the most RAM.
    Phantom circuit Sequence Reducer Dyslexia



    • Originally posted by qarium



      Modern CPUs without Spectre:

      "

      ARM Cortex-A5 [Wiki]
      AMD PSP
      ARM Cortex-A7 MPCore (RasPi 2)
      ARM Cortex-A53 MPCore - in-order dual issue, with a branch predictor; according to ARM, not affected
      Includes Raspberry Pi 3 and many Android phones, e.g. Snapdragon 625.
      [Early Intel Atoms including S/D/N Series] (https://en.wikipedia.org/wiki/List_o...onnell_microarchitecture)
      Diamondville
      Pineview
      Cedar View
      VIA C7 - but does have a basic branch prediction scheme
      Intel Itanium aka IA64 (This architecture is amazing and bizarre altogether)
      RISC-V
      RISC-V Rocket

      "

      There is also a list of old CPUs, but you can read it if you open the links.
      This list pretty much proves my point. To highlight the most important points:
      - Itanium aside, none of the listed chips were designed with performance in mind. Low power consumption was the key.
      - Some of the listed chips actually use speculative execution but it's implemented in a way that doesn't leak information.
      - The first generation of Atoms that is listed had both branch prediction and HT, even though in a simplified form (https://en.wikichip.org/wiki/intel/m...ctures/bonnell)
      - There is no chip with high performance or at least good performance/watt ratio on that list.

      This demonstrates that speculative execution does not necessarily have to be a security issue and that you need sophisticated speculative execution combined with OOOE for good performance.

      Originally posted by qarium
      "A 128 core chip for servers is not a good example of a product for the average range of customers."

      Well, are we talking about the past? Or are we talking about the future?
      I'm talking about the present. There is no 128 core chip that I could buy for a workstation PC, nor are there programs that would be able to make good use of all of those cores. This is unlikely to change within the next ~3 years.

      Originally posted by qarium
      We are talking about the future... and I am 100% sure we will have 32/64/128-core chips for consumers on laptops, PCs, workstations and so on.

      And it is still a fact that the more cores you have, the more useless hyperthreading becomes.
      Let's talk about the utility of HT on a 128 core chip once we can buy a 128 core chip. Until then it is pointless to speculate about it. Chips that we have right now (or will have in the foreseeable future) won't have that many cores and therefore will benefit from HT.

      Originally posted by qarium
      You can see this in phoronix.com benchmarks when Michael tests 128-thread systems: if you test 64 cores with or without hyperthreading, it makes no difference, because the overhead of hyperthreading at 64 cores is so high that you do not get any benefit.
      Now you may think only new systems have that many threads... no, my 4-year-old TR4 platform can already hit this with a 2990WX...

      In my view, the end of hyperthreading is near: as soon as consumers get 32/64/128-core systems, using hyperthreading will be pointless.
      This has more to do with the fact that very few applications scale well beyond 64 threads. A machine that hosts hundreds of VMs that all need CPU time will give you a different picture.
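MadCatX's point that very few applications scale well beyond 64 threads can be illustrated with Amdahl's law, a standard model that the thread itself does not cite: if a fraction p of the work parallelizes perfectly, the ideal speedup on n threads is 1 / ((1 - p) + p / n). The sketch assumes a hypothetical p = 0.95, chosen only for illustration.

```python
def amdahl_speedup(parallel_fraction: float, threads: int) -> float:
    """Amdahl's law: ideal speedup on `threads` workers when only
    `parallel_fraction` of the work can run in parallel."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / threads)

# Hypothetical workload: 95% parallel (illustrative value, not a measurement).
p = 0.95
for n in (8, 16, 32, 64, 128):
    print(f"{n:>3} threads: {amdahl_speedup(p, n):5.2f}x")
```

With that assumed p, going from 64 to 128 threads lifts the ideal speedup only from about 15.4x to about 17.4x, and it can never exceed 20x; real workloads vary widely in how much of their work is serial.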

      Originally posted by qarium
      "The issue is that you often need an HT-like technology *in order to* get all of the performance out of your CPU."

      If you think about it only in terms of hyperthreading logic, then yes, this is true...
      It is not true if you consider that there is more dark silicon than you "imagine".
      The dark silicon you talk about is only caused by information theory and logic.

      But according to Wikipedia, there are other causes of dark silicon. For example, you cannot transport the heat away from a single point, so you have to turn off transistors on the silicon... which results in dark silicon.
      The other problem is not a heat problem but an electrical one: your wires cannot deliver the needed current.
      This also means that you have to turn off transistors on the silicon, and this also results in dark silicon.

      In your mind, only information theory and logic count when it comes to dark silicon.

      What you don't get is that you can design a CPU that runs so "hot" that any hyperthreading is useless, because it would add more heat and you cannot transport that heat away. Also, the wires are not able to deliver more current, which means hyperthreading is useless, because using it would draw more current while the wires are already at their maximum.
      I'm having trouble tracking this stream of thoughts. Current x86 chips do not run into power budget problems when they use HT.

      Originally posted by qarium
      You claim it is impossible to design such a CPU... I say it is possible. In the past, one fact made it impossible: you could not transport data from RAM into the CPU fast enough...
      TCI only works on very short distances. It can be used to propagate data through the CPU but RAM is physically too far away for TCI to be useful.

      Originally posted by qarium
      But with technology like this: "ThruChip Interface (TCI) is a high-performance wireless vertical interconnect technology used to transmit signals between multiple stacked dies."
      you can transport the needed data into the CPU so fast that you can keep enough cores busy to hit the thermal limit very quickly.

      With that technology you could put 4 chiplets, each a 16-core CPU die, plus 16-32 GB of RAM all into the CPU package...
      resulting in 64 cores with 32 GB of L3 cache...
      Where did you get these figures from?

      Originally posted by qarium
      I am sure that even without hyperthreading or this Spectre-style speculation, this CPU would have great performance.

      Maybe not in single-thread performance, but for massively multicore workloads it would be a hit.
      What makes you think that?

      This all sounds like wild speculation on your part about what future chips might look like. Judging by the current Intel or AMD roadmaps, we won't get anything like this at least until 2025. When we get such chips, it'll be time to reevaluate what technologies and design approaches are sensible. Until then we should stick with what is applicable to the chips we have now.

      Originally posted by qarium
      "Without speculative execution you'd really struggle to keep the execution pipeline filled, so your CPU would waste a lot of cycles just waiting for data to work with."

      This only comes into effect if you are able to cool the extra heat in one area of the CPU. If you are not able to remove that heat, it is pointless to add hyperthreading. And if you go with TCI-stacked dies, the possibility of transporting all the heat away is zero... again: ZERO
      This sounds like another baseless speculation. Current CPUs perform best when they *can* keep the pipeline filled as much as possible.

      Originally posted by qarium
      Right. You need more RAM if you double the core count, similar to what happens if you add hyperthreading, but there is an important difference:

      If you go from 1 core to 1 core + HT, you get up to 37% more performance in 7zip.

      If you go from 1 core to 2 cores, you get up to 100% more performance in 7zip...
      This is an odd kind of argument. It's pretty damn obvious that an entire extra core will offer a bigger performance boost than a processing optimization like HT. The issue is that an extra core takes up much more space on the die and uses much more power than the circuitry used for HT management. A true 32 core CPU is, therefore, more expensive to manufacture and cool than a 16 core HT CPU. And that is the entire reason why HT is not a bad idea.

      Originally posted by qarium
      This is true, but if you plan a system you cannot count on this; instead, you measure the apps that need the most RAM.
      Unless your workload consists of compressing and decompressing large 7zip archives, you probably don't care that 7zip might perform suboptimally because of memory constraints. If you edit audio or video, render 3D graphics, write code, run some scientific simulations etc., HT will give you a nice boost even without ridiculous RAM sizes.



      • Originally posted by MadCatX
        This list pretty much proves my point. To highlight the most important points:
        - Itanium aside, none of the listed chips were designed with performance in mind. Low power consumption was the key.
        - Some of the listed chips actually use speculative execution but it's implemented in a way that doesn't leak information.
        - The first generation of Atoms that is listed had both branch prediction and HT, even though in a simplified form (https://en.wikichip.org/wiki/intel/m...ctures/bonnell)
        - There is no chip with high performance or at least good performance/watt ratio on that list.
        This demonstrates that speculative execution does not necessarily have to be a security issue and that you need sophisticated speculative execution combined with OOOE for good performance.
        As I already said, if you have a low core count (1-16 cores), it is impossible to build a high-performance CPU without speculative execution and/or OOOE...
        But, similar to hyperthreading, as soon as you have many, many cores, like 128 or 256, and the maximum TDP per socket is already reached, the benefit from further increasing the utilization of your execution units is zero.

        That's the point: what is important for a single-core, dual-core or even 8-core CPU is maybe NOT important for a 256-core CPU.

        The "art" of building a fast single-core or 8-core CPU is not the same "art" as building a fast 256-core CPU...

        Originally posted by MadCatX
        I'm talking about the present. There is no 128 core chip that I could buy for a workstation PC, nor are there programs that would be able to make good use of all of those cores. This is unlikely to change within the next ~3 years.
        You can buy a 128-core ARM workstation here: https://store.avantek.co.uk/ampere-a...rkstation.html

        But if you want x86_64, you could buy a dual-socket system with two 64-core AMD EPYC CPUs.

        "I'm talking about the present."

        Yes, right, that's the problem: if you create a company today to produce x86_64 chips, you are talking about future products.
        A newly created company cannot change the past or even the present.

        "nor are there programs that would be able to make good use of all of those cores. This is unlikely to change within the next ~3 years."

        This is true, but note that this also makes hyperthreading useless as soon as you have enough cores...

        Originally posted by MadCatX
        Let's talk about the utility of HT on a 128 core chip once we can buy a 128 core chip. Until then it is pointless to speculate about it. Chips that we have right now (or will have in the foreseeable future) won't have that many cores and therefore will benefit from HT.
        Right, that's fine, and you can buy it in 2022...

        Zen 4c core in the 128-core Bergamo


        Originally posted by MadCatX
        This has more to do with the fact that very few applications scale well beyond 64 threads. A machine that hosts hundreds of VMs that all need CPU time will give you a different picture.
        Right, we already see this in practice with IBM POWER9 and POWER10... their server VMs use so many threads that they add 4-way or 8-way hyperthreading (SMT) per core.

        But you have to admit this is a server-only factor; we will see nothing like this on the desktop or workstation.

        Originally posted by MadCatX
        I'm having trouble tracking this stream of thoughts. Current x86 chips do not run into power budget problems when they use HT.
        Yes, right, current x86 chips "do not run into power budget problems when they use HT"... right.
        But they only have 8-16 cores on the desktop and only 64 cores on workstations or servers,
        and they do not use TCI to stack dies...
        But just look at this chip: the Ampere Altra Max M128 maxes out the socket's 250 W without any problems...
        That's the point: if you add hyperthreading to this 128-core chip and stay at 250 W per socket, you will get zero performance benefit.
        And as soon as you make a 64-128-core CPU with stacked chips and completely max out the power budget of your socket and cooling system, hyperthreading has no benefit anymore.
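qarium's power-budget argument can be written as a toy model: throughput is the smaller of what the cores could deliver with unlimited power and what the socket power budget allows. Apart from the 250 W socket figure mentioned in the thread, every number here is a hypothetical placeholder, and the model assumes power scales linearly with work done, which real chips do not strictly follow.

```python
def throughput(cores: int, perf_per_core: float, ht_gain: float,
               watts_per_perf: float, socket_watts: float) -> float:
    """Toy model: performance is capped by the socket power budget.

    demand  = what the cores could do if power were unlimited
    ceiling = what the power budget allows, assuming energy use is
              proportional to work done (a simplifying assumption)
    """
    demand = cores * perf_per_core * (1.0 + ht_gain)
    ceiling = socket_watts / watts_per_perf
    return min(demand, ceiling)

SOCKET_WATTS = 250.0   # socket budget mentioned in the thread
WATTS_PER_PERF = 2.0   # hypothetical: watts per unit of work
HT_GAIN = 0.3          # hypothetical HT uplift when not power-limited

for cores in (16, 128):
    no_ht = throughput(cores, 1.0, 0.0, WATTS_PER_PERF, SOCKET_WATTS)
    with_ht = throughput(cores, 1.0, HT_GAIN, WATTS_PER_PERF, SOCKET_WATTS)
    print(f"{cores:>3} cores: no HT = {no_ht:.1f}, with HT = {with_ht:.1f}")
```

In this model HT adds nothing once the power ceiling binds (the 128-core case) but still helps below it (the 16-core case); whether a given real chip sits at its ceiling under a given workload is the point the two posters disagree on.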

        Originally posted by MadCatX
        TCI only works on very short distances. It can be used to propagate data through the CPU but RAM is physically too far away for TCI to be useful.
        The performance comes from the fact that you can stack L3 cache (RAM) directly onto the CPU die...
        It is not about the DDR4/5 RAM you can add to your computer; it is about the L3 cache stacked on the CPU die.

        Where did you get these figures from?

        Originally posted by MadCatX
        What makes you think that?
        This all sounds like wild speculation on your part about what future chips might look like. Judging by the current Intel or AMD roadmaps, we won't get anything like this at least until 2025. When we get such chips, it'll be time to reevaluate what technologies and design approaches are sensible. Until then we should stick with what is applicable to the chips we have now.
        128-core chips and more:
        Well, on x86_64 these are future chips, but on ARM you have them right now (Ampere Altra Max M128 = 5419€).

        And about roadmaps: Intel wants to do 128 cores in 2025, but AMD wants to do it in 2022:

        Zen 4c core in the 128-core Bergamo

        Maybe you mean on the desktop? Yes, maybe; aside from servers, we may need to wait until 2025 for the desktop to get there.


        Originally posted by MadCatX
        This sounds like another baseless speculation. Current CPUs perform best when they *can* keep the pipeline filled as much as possible.
        It has an effect if you can cool the extra heat and if your socket can provide the extra electrical power.
        If your heat output is already at the maximum and your socket's power delivery is at the maximum, then it has zero effect on performance.
        And if you look at the 128-core ARM CPU, it maxes out the socket's 250 W and there is zero room for any additional heat.

        Originally posted by MadCatX
        This is an odd kind of argument. It's pretty damn obvious that an entire extra core will offer a bigger performance boost than a processing optimization like HT. The issue is that an extra core takes up much more space on the die and uses much more power than the circuitry used for HT management. A true 32 core CPU is, therefore, more expensive to manufacture and cool than a 16 core HT CPU. And that is the entire reason why HT is not a bad idea.
        Right, it's an odd kind of argument, because the 37% performance increase from hyperthreading comes at only a 5% increase in transistors, while the 100% increase from a second core comes at a 100% increase in transistors (even if you deduct the 5% for hyperthreading, you end up at 105% vs 190%).
        And this makes it sound like "And that is the entire reason why HT is not a bad idea." Right...
        What you don't get is that these are all arguments from the past, from the era of single-core, dual-core, 4-core, 8-core, 12-core or maybe even 16-core CPUs...
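The transistor-budget comparison in this exchange reduces to one division. The sketch uses only the figures quoted in the thread (HT: about +37% performance for about +5% transistors; a second core: +100% performance for +100% transistors) and ignores power, cache and uncore area, so it is a rough illustration rather than a die-cost model.

```python
def perf_per_transistor(speedup: float, transistor_factor: float) -> float:
    """Performance per transistor, relative to a 1-core, no-HT baseline (= 1.0)."""
    return speedup / transistor_factor

print(f"1 core + HT: {perf_per_transistor(1.37, 1.05):.3f}")  # 1.305
print(f"2 cores:     {perf_per_transistor(2.00, 2.00):.3f}")  # 1.000
```

On these numbers HT wins per transistor, which is the basis of MadCatX's cost argument; qarium's reply is that this ratio stops mattering once a chip is limited by power rather than by die area.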

        You can buy a 128-core ARM workstation today, so this is plain and simply wrong: "There is no 128 core chip that I could buy for a workstation PC". https://store.avantek.co.uk/ampere-a...rkstation.html
        It is a fact that you can buy a 128-core Ampere Altra Max M128 workstation,
        and if you compare it to an AMD Threadripper 64-core system with hyperthreading, you will see that hyperthreading is useless.
        The point is that as soon as you have enough cores, hyperthreading becomes useless.
        Your theory is: the 64-core + HT chip is cheaper than the 128-core chip.


        Ampere Altra Max M128 = 5419€

        Price comparison for AMD Ryzen Threadripper PRO 3995WX, 64C/128T, 2.70-4.20GHz, boxed without cooler. Product info: Cores: 64 • Threads: 128 • Turbo clock: 4.20GHz • Base clock: 2.70GHz…

        AMD Ryzen Threadripper PRO 3995WX
        € 5296,62

        As you can see, the 64-core + HT chip is only about 122€ cheaper... that's like nothing... but that's for AMD. If you compare it to Intel:

        Price comparison for Intel Xeon Platinum 8380. Product info: Cores: 40 • Threads: 80 • Turbo clock: 3.40GHz (Turbo Boost 2.0) • Base clock: 2.30GHz…

        The 40-core is 8599€.

        Here is the performance of this 128-core chip: https://www.phoronix.com/scan.php?pa...nchmarks&num=3

        I don't know how you interpret this, but in my view, at 64/128 cores hyperthreading does not show the same effect as it does on a 4-core or 8-core CPU.

        And just in case you missed the price of the AMD EPYC 64-core...


        AMD Epyc 7763
        € 8728

        As you can see, as soon as you buy EPYC instead of Threadripper, the Ampere Altra Max M128 is much cheaper.
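To put the prices quoted above on a common footing, this sketch divides each list price by core and thread count. The prices are the ones posted in the thread (a snapshot from a price-comparison site, so they will have changed), and cost per core says nothing about how much work each core does, which differs a lot between these chips.

```python
# Prices (euros) as quoted in the thread; (price, cores, threads).
chips = {
    "Ampere Altra Max M128": (5419.00, 128, 128),        # no SMT
    "AMD Threadripper PRO 3995WX": (5296.62, 64, 128),
    "Intel Xeon Platinum 8380": (8599.00, 40, 80),
    "AMD EPYC 7763": (8728.00, 64, 128),
}

for name, (price_eur, cores, threads) in chips.items():
    print(f"{name:<28} {price_eur / cores:8.2f} €/core  {price_eur / threads:8.2f} €/thread")

# Price gap between the 128-core ARM chip and the 64-core Threadripper PRO.
diff = chips["Ampere Altra Max M128"][0] - chips["AMD Threadripper PRO 3995WX"][0]
print(f"Altra Max M128 costs {diff:.2f} € more than the 3995WX")
```

On these numbers the Altra Max is the cheapest per core and the Threadripper PRO is marginally the cheapest per thread; the linked benchmarks are what show how much each core or thread actually delivers.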


        Originally posted by MadCatX
        Unless your workload consists of compressing and decompressing large 7zip archives, you probably don't care that 7zip might perform suboptimally because of memory constraints. If you edit audio or video, render 3D graphics, write code, run some scientific simulations etc., HT will give you a nice boost even without ridiculous RAM sizes.
        Well, I picked 7zip because it has the highest performance increase per core from hyperthreading: 37%.
        Do you know another workload with an even higher benefit?
        That's the standard argument for hyperthreading: you spend 5% more transistors on the chip but you get 37% more performance.
        By the way, in all these cases ("edit audio or video, render 3D graphics, write code, run some scientific simulations etc"), hyperthreading increases the amount of RAM used for the same task. (But most people don't care, and many tasks are certainly not like 7zip.)

        But these are all arguments of the past, meaning the era of 1-core, 2-core, 4-core, 8-core and maybe 16-core CPUs... as soon as you have 32-core, 64-core or 128-core CPUs, hyperthreading is no longer useful.




        • @Michael: Is there a reason there is no setting that allows comparing efficient cores directly with performance cores? E.g. 8E vs. 8P?
