Intel i9-12900K Alder Lake Linux Performance In Different P/E Core Configurations


  • qarium
    replied
    Originally posted by MadCatX View Post
    This list pretty much proves my point. To highlight the most important points:
    - Itanium aside, none of the listed chips were designed with performance in mind. Low power consumption was the key.
    - Some of the listed chips actually use speculative execution but it's implemented in a way that doesn't leak information.
    - The first generation of Atoms that is listed had both branch prediction and HT, even though in a simplified form (https://en.wikichip.org/wiki/intel/m...ctures/bonnell)
    - There is no chip with high performance or at least good performance/watt ratio on that list.
    This demonstrates that speculative execution does not necessarily have to be a security issue and that you need sophisticated speculative execution combined with OOOE for good performance.
    As I already said, with a low core count (1-16 cores) it is impossible to build a high-performance CPU without speculative execution and/or OOOE...
    But, as with hyperthreading, once you have very many cores (128 or 256) and the maximum TDP per socket is already reached, the benefit of squeezing more utilisation out of your execution units is zero.

    That's the point: what matters for a single-core, dual-core or even 8-core CPU may NOT matter for a 256-core CPU.

    The "art" of building a fast single-core or 8-core CPU is not the same "art" as building a fast 256-core CPU...

    Originally posted by MadCatX View Post
    I'm talking about the present. There is no 128 core chip that I could buy for a workstation PC, nor are there programs that would be able to make good use of all of those cores. This is unlikely to change within the next ~3 years.
    You can buy a 128-core ARM workstation here: https://store.avantek.co.uk/ampere-a...rkstation.html

    But if you want x86_64, you could buy a dual-socket system with two 64-core AMD EPYC CPUs.

    "I'm talking about the present."

    Yes, right, and that's the problem: if you found a company today to produce x86_64 chips, you are talking about future products.
    A newly created company cannot change the past or even the present.

    "nor are there programs that would be able to make good use of all of those cores. This is unlikely to change within the next ~3 years."

    This is true, but get the point: it also makes hyperthreading useless as soon as you have enough cores...

    Originally posted by MadCatX View Post
    Let's talk about the utility of HT on a 128 core chip once we can buy a 128 core chip. Until then it is pointless to speculate about it. Chips that we have right now (or will have in the foreseeable future) won't have that many cores and therefore will benefit from HT.
    Right, that's fine, and you will be able to buy one in 2022:
    https://www.anandtech.com/show/17055...-and-128-cores
    Zen 4c core in the 128-core Bergamo


    Originally posted by MadCatX View Post
    This has more to do with the fact that very few applications scale well beyond 64 threads. A machine that hosts hundreds of VMs that all need CPU time will give you a different picture.
    Right, we already see this in practice with IBM POWER9 and POWER10: their server VMs use so many threads that the chips provide 4-way or 8-way hyperthreading per core.

    But you have to admit this is a server-only factor; we will see nothing like this on the desktop or workstation.

    Originally posted by MadCatX View Post
    I'm having trouble tracking this stream of thoughts. Current x86 chips do not run into power budget problems when they use HT.
    Yes, right, current x86 CPUs "do not run into power budget problems when they use HT"... right.
    But they have only 8-16 cores on the desktop and only 64 cores on workstations or servers,
    and they do not use TCI to stack dies...
    Just look at this chip: the Ampere Altra Max M128 maxes out the 250 W of its socket without any problem...
    That's the point: if you added hyperthreading to this 128-core chip and stayed at 250 W per socket, you would get zero performance benefit.
    As soon as you build a 64-128 core CPU with stacked dies and completely max out the power budget of your socket and cooling system, hyperthreading has no benefit anymore.

    Originally posted by MadCatX View Post
    TCI only works on very short distances. It can be used to propagate data through the CPU but RAM is physically too far away for TCI to be useful.
    The performance comes from being able to stack L3 cache (RAM) directly onto the CPU die...
    It is not about the DDR4/5 RAM you add to your computer; it is about the L3 cache stacked on the CPU die.

    Where did you get these figures from?

    Originally posted by MadCatX View Post
    What makes you think that?
    This all sounds like wild speculation on your part about what future chips might look like. Judging by the current Intel or AMD roadmaps, we won't get anything like this at least until 2025. When we get such chips, it'll be time to reevaluate what technologies and design approaches are sensible. Until then we should stick with what is applicable to the chips we have now.
    128-core chips and more:
    well, on x86_64 these are future chips; on ARM you have them right now (Ampere Altra Max M128 = 5419€).

    As for roadmaps: Intel wants to do 128 cores in 2025, but AMD wants to do it in 2022:
    https://www.anandtech.com/show/17055...-and-128-cores
    Zen 4c core in the 128-core Bergamo

    Maybe you mean on the desktop? Yes, aside from servers we may need to wait until 2025 for the desktop to get there.


    Originally posted by MadCatX View Post
    This sounds like another baseless speculation. Current CPUs perform best when they *can* keep the pipeline filled as much as possible.
    It has an effect only if you can cool the extra heat and if your socket can provide the extra electrical power.
    If your heat is already at the maximum and your socket's power delivery is at the maximum, it has zero effect on performance.
    And if you look at the 128-core ARM CPU, it maxes out the 250 W of the socket and there is zero room for any additional heat.

    Originally posted by MadCatX View Post
    This is an odd kind of argument. It's pretty damn obvious that an entire extra core will offer bigger performance boost than processing optimization like HT. The issue is that an extra core takes up much more space on the die and uses much more power than the circuitry used for HT management. A true 32 core CPU is, therefore, more expensive to manufacture and cool than a 16 core HT CPU. And that is the entire reason why HT is not a bad idea.
    Right, it's an odd kind of argument, because the 37% performance increase from hyperthreading costs only about 5% more transistors, while the 100% increase from a second core costs 100% more transistors (even if you deduct the 5% for hyperthreading you end up at 105% vs 190%).
    And this makes it sound like "And that is the entire reason why HT is not a bad idea." Right...
    What you don't get is that these are all arguments from the past, from the era of single-core, dual-core, 4-core, 8-core, 12-core or maybe even 16-core CPUs...
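
    The performance-per-transistor comparison above can be sketched as follows. This is only an illustration using the figures claimed in this thread (37% more performance for about 5% more transistors with hyperthreading, 100% for 100% with a second core); they are the posters' numbers, not measurements, and the function name `perf_per_transistor` is hypothetical.

```python
def perf_per_transistor(perf_gain, transistor_gain):
    """Performance per transistor relative to one core without HT (baseline = 1.0).

    Both arguments are fractional increases, e.g. 0.37 for +37%.
    """
    return (1.0 + perf_gain) / (1.0 + transistor_gain)


# Figures as claimed in the thread (assumptions, not measured values).
ht = perf_per_transistor(perf_gain=0.37, transistor_gain=0.05)
second_core = perf_per_transistor(perf_gain=1.00, transistor_gain=1.00)

print(f"1 core + HT: {ht:.3f}x performance per transistor")
print(f"2 cores:     {second_core:.3f}x performance per transistor")
```

    Under these assumed numbers hyperthreading wins on this one metric; the sketch says nothing about power or cooling limits, which is what the rest of the argument is about.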

    You can buy a 128-core ARM workstation today, so this is plainly wrong ("There is no 128 core chip that I could buy for a workstation PC"): https://store.avantek.co.uk/ampere-a...rkstation.html
    It is a fact that you can buy a 128-core Ampere Altra Max M128 workstation,
    and if you compare it to a 64-core AMD Threadripper system with hyperthreading you will see that hyperthreading is useless.
    The point is that as soon as you have enough cores, hyperthreading becomes useless.
    Your theory is: the 64-core+HT chip is cheaper than the 128-core chip.

    https://store.avantek.co.uk/ampere-a...rkstation.html
    ampere Altra Max M128=5419€

    https://geizhals.de/amd-ryzen-thread...loc=at&hloc=de
    AMD Ryzen Threadripper PRO 3995WX
    € 5296,62

    As you can see, the 64-core+HT chip is about 122€ cheaper... that's like nothing. But that's for AMD; if you compare it to Intel:

    https://geizhals.de/intel-xeon-plati...loc=at&hloc=de
    40core is 8599€

    here is the performance of this 128core chip: https://www.phoronix.com/scan.php?pa...nchmarks&num=3

    I don't know how you interpret this, but in my view hyperthreading at 64/128 cores does not show the same effect as on a 4-core or 8-core CPU.

    And just in case you missed the price of this 64-core AMD Epyc...

    https://geizhals.de/amd-epyc-7763-10...-a2491457.html
    AMD Epyc 7763
    € 8728

    As you can see, as soon as you buy Epyc instead of Threadripper, the Ampere Altra Max M128 is much cheaper.
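
    To put the prices quoted above on a common footing, here is a small price-per-core sketch. It uses only the prices and core counts given in this post (they reflect the listings at the time and may be out of date); the dictionary layout and the function name `price_per_core` are made up for the illustration.

```python
# (price in EUR, physical cores) exactly as quoted in this post.
chips = {
    "Ampere Altra Max M128": (5419.00, 128),
    "AMD Ryzen Threadripper PRO 3995WX": (5296.62, 64),
    "Intel Xeon Platinum (40 cores)": (8599.00, 40),
    "AMD Epyc 7763": (8728.00, 64),
}


def price_per_core(price_eur, cores):
    """Price of one physical core in EUR."""
    return price_eur / cores


per_core = {name: price_per_core(*spec) for name, spec in chips.items()}

# Print cheapest-per-core first.
for name, eur in sorted(per_core.items(), key=lambda item: item[1]):
    print(f"{name}: {eur:.2f} EUR per core")
```

    Price per core ignores per-core performance and platform cost, so it only supports the narrow claim that the 128-core part is not more expensive per core.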


    Originally posted by MadCatX View Post
    Unless your workload consists of compressing and decompressing large 7zip archives, you probably don't care that 7zip might perform suboptimally because of memory constraints. If you edit audio or video, render 3D graphics, write code, run some scientific simulations etc., HT will give you a nice boost even without ridiculous RAM sizes.
    Well, I picked 7zip because it shows the highest per-core performance increase from hyperthreading: 37%.
    Do you know another workload with an even higher benefit?
    That's the standard argument for hyperthreading: you spend 5% more transistors on the chip but you get 37% more performance.
    By the way, in all these cases ("edit audio or video, render 3D graphics, write code, run some scientific simulations etc") hyperthreading increases the amount of RAM used for the same task (but most people don't care, and many tasks are certainly not like 7zip).

    But these are all arguments from the past, from the era of 1-core, 2-core, 4-core, 8-core and maybe 16-core CPUs... As soon as you have 32-core, 64-core or 128-core CPUs, hyperthreading is no longer useful.




  • MadCatX
    replied
    Originally posted by qarium View Post

    https://forum.level1techs.com/t/list...spectre/123128

    modern cpus without spectre :

    "

    ARM Cortex-A5 [Wiki]
    AMD PSP
    ARM Cortex-A7 MPCore (RasPi 2)
    ARM Cortex-A53 MPCore - in-order dual issue, with a branch predictor; according to ARM it is not affected
    Includes Raspberry Pi 3 and many Android phones. Example: Snapdragon 625 etc.
    Early Intel Atoms including S/D/N Series (https://en.wikipedia.org/wiki/List_o...onnell_microarchitecture)
    Diamondville
    Pineview
    Cedar View
    VIA C7 - but does have a basic branch prediction scheme
    Intel Itanium aka IA64 (This architecture is amazing and bizarre altogether)
    RISC-V
    RISC-V Rocket

    "

    There is also a list of old CPUs, but you can read it by opening the link.
    This list pretty much proves my point. To highlight the most important points:
    - Itanium aside, none of the listed chips were designed with performance in mind. Low power consumption was the key.
    - Some of the listed chips actually use speculative execution but it's implemented in a way that doesn't leak information.
    - The first generation of Atoms that is listed had both branch prediction and HT, even though in a simplified form (https://en.wikichip.org/wiki/intel/m...ctures/bonnell)
    - There is no chip with high performance or at least good performance/watt ratio on that list.

    This demonstrates that speculative execution does not necessarily have to be a security issue and that you need sophisticated speculative execution combined with OOOE for good performance.

    Originally posted by qarium View Post
    "A 128 core chip for servers is not a good example of a product for the average range of customers."

    Well, are we talking about the past? Or are we talking about the future?
    I'm talking about the present. There is no 128 core chip that I could buy for a workstation PC, nor are there programs that would be able to make good use of all of those cores. This is unlikely to change within the next ~3 years.

    Originally posted by qarium View Post
    We are talking about the future... and I am 100% sure we will have 32/64/128-core chips for consumers
    in laptops, PCs, workstations and so on.

    And it is still a fact that the more cores you have, the more useless hyperthreading becomes.
    Let's talk about the utility of HT on a 128 core chip once we can buy a 128 core chip. Until then it is pointless to speculate about it. Chips that we have right now (or will have in the foreseeable future) won't have that many cores and therefore will benefit from HT.

    Originally posted by qarium View Post
    You can see this in the phoronix.com benchmarks: when Michael tests 128-thread systems, running 64 cores with or without hyperthreading makes no difference, because the overhead of hyperthreading at 64 cores is so high that you do not get any benefit.
    Now you may think only new systems have that many threads... no, my 4-year-old TR4 platform can already reach this with a 2990WX...

    In my view the end of hyperthreading is near: as soon as consumers get 32/64/128-core systems, using hyperthreading will be pointless.
    This has more to do with the fact that very few applications scale well beyond 64 threads. A machine that hosts hundreds of VMs that all need CPU time will give you a different picture.

    Originally posted by qarium View Post
    "The issue is that you often need a HT-like technology *in order to* get all of the performance out of your CPU."

    If you think about it only in terms of hyperthreading, then yes, this is true...
    It is not true if you consider that there is more dark silicon than you "imagine".
    The dark silicon you are talking about is caused only by information theory and logic.

    But according to Wikipedia there are other kinds of dark silicon. For example, you cannot transport the heat away from a single point, so you have to turn off transistors on the silicon... which results in dark silicon.
    The other problem is not heat but electrical: your wires cannot carry the required current.
    This also means you have to turn off transistors on the silicon, and this also results in dark silicon.

    In your mind only your information-theory and logic problem counts when it comes to dark silicon.

    What you don't get is that you can design a CPU that runs so "hot" that hyperthreading is useless, because it would add more heat that you cannot transport away. Likewise, if the wires cannot carry any more current, hyperthreading is useless, because using it would draw more current while the wires are already at their maximum.
    I'm having trouble tracking this stream of thoughts. Current x86 chips do not run into power budget problems when they use HT.

    Originally posted by qarium View Post
    You claim it is impossible to design such a CPU... I say it is possible; in the past one thing made it impossible: you could not transport data from RAM into the CPU fast enough...
    TCI only works on very short distances. It can be used to propagate data through the CPU but RAM is physically too far away for TCI to be useful.

    Originally posted by qarium View Post
    But with technology like this: "ThruChip Interface (TCI) is a high-performance wireless vertical interconnect technology used to transmit signals between multiple stacked dies."
    you can move data into the CPU so fast that you can keep enough cores busy to hit the thermal limit very quickly.

    With that technology you could put 4 chiplets of 16 cores each plus 16-32 GB of RAM into the CPU package...
    resulting in 64 cores with 32 GB of L3 cache...
    Where did you get these figures from?

    Originally posted by qarium View Post
    I am sure this CPU would have great performance even without hyperthreading or this Spectre-style speculation.

    Maybe not in single-thread performance, but for massively multi-core workloads it would be a hit.
    What makes you think that?

    This all sounds like wild speculation on your part about what future chips might look like. Judging by the current Intel or AMD roadmaps, we won't get anything like this at least until 2025. When we get such chips, it'll be time to reevaluate what technologies and design approaches are sensible. Until then we should stick with what is applicable to the chips we have now.

    Originally posted by qarium View Post
    "Without speculative execution you'd really struggle to keep the execution pipeline filled so your CPU would waste a lot of cycles just waiting for data to work with."

    This only comes into effect if you are able to cool the extra heat in one area of the CPU. If you cannot cool it, adding hyperthreading is pointless. And if you go with TCI-stacked dies, the possibility of transporting all the heat away is zero... again: ZERO
    This sounds like another baseless speculation. Current CPUs perform best when they *can* keep the pipeline filled as much as possible.

    Originally posted by qarium View Post
    Right. You need more RAM if you double the core count, similar to what happens if you add hyperthreading, but there is an important difference:

    If you go from 1 core to 1 core+HT you get up to 37% more performance in 7zip.

    If you go from 1 core to 2 cores you get up to 100% more performance in 7zip...
    This is an odd kind of argument. It's pretty damn obvious that an entire extra core will offer bigger performance boost than processing optimization like HT. The issue is that an extra core takes up much more space on the die and uses much more power than the circuitry used for HT management. A true 32 core CPU is, therefore, more expensive to manufacture and cool than a 16 core HT CPU. And that is the entire reason why HT is not a bad idea.

    Originally posted by qarium View Post
    This is true, but when you plan a system you cannot count on this; instead you measure the apps that need the most RAM.
    Unless your workload consists of compressing and decompressing large 7zip archives, you probably don't care that 7zip might perform suboptimally because of memory constraints. If you edit audio or video, render 3D graphics, write code, run some scientific simulations etc., HT will give you a nice boost even without ridiculous RAM sizes.



  • qarium
    replied
    Originally posted by MadCatX View Post
    I'm not sure I follow your argument here. You're using 7zip as an extreme example of how memory requirements grow with the number of threads. But this has nothing to do with HT; you'd have the same problem if you doubled the actual core count. Also, there are lots of workloads where memory demands do not grow that much with more threads.
    Right. You need more RAM if you double the core count, similar to what happens if you add hyperthreading, but there is an important difference:

    If you go from 1 core to 1 core+HT you get up to 37% more performance in 7zip.

    If you go from 1 core to 2 cores you get up to 100% more performance in 7zip...

    As you can see, both need more RAM, but the second option gives you more performance per unit of RAM you add.

    With the first option you need to double your RAM but you only get 37% more performance...

    With the second option you double your RAM but you get 100% more performance.
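
    A rough sketch of that performance-per-RAM comparison. It takes this post's premise at face value, namely that RAM use doubles in both cases (reasonable for 7zip, not necessarily for other workloads) and that the gains are 37% and 100%; the function name `perf_per_ram` is hypothetical.

```python
def perf_per_ram(perf_factor, ram_factor):
    """Relative performance per unit of RAM (baseline: 1 core, 1x RAM = 1.0)."""
    return perf_factor / ram_factor


# Assumption from the post: both options double the RAM needed.
one_core_ht = perf_per_ram(perf_factor=1.37, ram_factor=2.0)
two_cores = perf_per_ram(perf_factor=2.00, ram_factor=2.0)

print(f"1 core + HT: {one_core_ht:.3f}x performance per unit of RAM")
print(f"2 cores:     {two_cores:.3f}x performance per unit of RAM")
```

    If the workload's memory use does not scale with thread count, `ram_factor` drops toward 1.0 for both options and the gap on this metric shrinks accordingly.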

    "Also, there are lots of workloads where memory demands do not grow that much with more threads."

    This is true, but when you plan a system you cannot count on this; instead you measure the apps that need the most RAM.



  • qarium
    replied
    Originally posted by MadCatX View Post
    No they have not. Spectre kind of attacks were not known until a few years ago and their actual impact on security and performance was not *that* bad.
    You seem to be stuck with your belief that a chip without HT would be a better value but it wouldn't. The manufacturing cost would be only marginally lower and the performance would not be better.
    I'm not sure I follow your argument here. You're using 7zip as an extreme example of how memory requirements grow with the number of threads. But this has nothing to do with HT; you'd have the same problem if you doubled the actual core count. Also, there are lots of workloads where memory demands do not grow that much with more threads.
    Which chip specifically are you talking about?
    As I've said before, every CPU architecture with competitive performance has been using speculative execution for over 30 years now. Even the IBM 7030 (Stretch) from the late 1950s did that. Apart from some truly revolutionary idea in computer design there is nothing a CPU manufacturer could do to make a competitive chip that wouldn't branch predict. Your implication that all CPUs with speculative execution are vulnerable to Spectre is false, you just need to pay more attention to proper cleanup after a branch misprediction. Current CPUs don't do that because Spectre attacks weren't known at the time they were designed.
    You trust Wiki a bit too much but regardless of what we call it, HT can utilize parts of the CPU that would be unused otherwise.
    Non sequitur. On x86 you run into power budget limits with AVX and especially AVX512. This has nothing to do with HT.
    The issue is that you often need a HT-like technology *in order to* get all of the performance out of your CPU.
    So? A 128 core chip for servers is not a good example of a product for the average range of customers.
    Do you have any engineering data to support this?
    I have made no such claim.
    That is unlikely to work out well. Without speculative execution you'd really struggle to keep the execution pipeline filled so your CPU would waste a lot of cycles just waiting for data to work with. This is a more fundamental problem of information theory which you cannot solve by stacking the transistors on the die in a neater way. When you don't have ST performance, you can't really make up for it by having more cores.
    https://forum.level1techs.com/t/list...spectre/123128

    modern cpus without spectre :

    ""

    There is also a list of old CPUs, but you can read it by opening the link.

    "A 128 core chip for servers is not a good example of a product for the average range of customers."

    Well, are we talking about the past? Or are we talking about the future?

    We are talking about the future... and I am 100% sure we will have 32/64/128-core chips for consumers
    in laptops, PCs, workstations and so on.

    And it is still a fact that the more cores you have, the more useless hyperthreading becomes.

    You can see this in the phoronix.com benchmarks: when Michael tests 128-thread systems, running 64 cores with or without hyperthreading makes no difference, because the overhead of hyperthreading at 64 cores is so high that you do not get any benefit.
    Now you may think only new systems have that many threads... no, my 4-year-old TR4 platform can already reach this with a 2990WX...
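
    For anyone who wants to repeat such a with/without-hyperthreading comparison on their own Linux box, a minimal sketch for checking the current SMT state is shown below. It assumes a kernel that exposes the `smt` directory under `/sys/devices/system/cpu/` with `control` and `active` files; the helper name `read_smt_state` is made up, and on kernels or platforms without that interface it simply returns `None` values.

```python
from pathlib import Path


def read_smt_state(base="/sys/devices/system/cpu/smt"):
    """Return the contents of the 'control' and 'active' files under `base`.

    A value is None when the file is missing or unreadable.
    """
    state = {}
    for name in ("control", "active"):
        try:
            state[name] = (Path(base) / name).read_text().strip()
        except OSError:
            state[name] = None
    return state


if __name__ == "__main__":
    print(read_smt_state())
```

    Running the same benchmark once with SMT active and once with it switched off (via the firmware setup or the kernel's SMT control) gives the comparison described above; this sketch only reads the state and changes nothing.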

    In my view the end of hyperthreading is near: as soon as consumers get 32/64/128-core systems, using hyperthreading will be pointless.

    "The issue is that you often need a HT-like technology *in order to* get all of the performance out of your CPU."

    If you think about it only in terms of hyperthreading, then yes, this is true...
    It is not true if you consider that there is more dark silicon than you "imagine".
    The dark silicon you are talking about is caused only by information theory and logic.

    But according to Wikipedia there are other kinds of dark silicon. For example, you cannot transport the heat away from a single point, so you have to turn off transistors on the silicon... which results in dark silicon.
    The other problem is not heat but electrical: your wires cannot carry the required current.
    This also means you have to turn off transistors on the silicon, and this also results in dark silicon.

    In your mind only your information-theory and logic problem counts when it comes to dark silicon.

    What you don't get is that you can design a CPU that runs so "hot" that hyperthreading is useless, because it would add more heat that you cannot transport away. Likewise, if the wires cannot carry any more current, hyperthreading is useless, because using it would draw more current while the wires are already at their maximum.

    You claim it is impossible to design such a CPU... I say it is possible; in the past one thing made it impossible: you could not transport data from RAM into the CPU fast enough...

    But with technology like this: "ThruChip Interface (TCI) is a high-performance wireless vertical interconnect technology used to transmit signals between multiple stacked dies."
    you can move data into the CPU so fast that you can keep enough cores busy to hit the thermal limit very quickly.

    With that technology you could put 4 chiplets of 16 cores each plus 16-32 GB of RAM into the CPU package...
    resulting in 64 cores with 32 GB of L3 cache...

    I am sure this CPU would have great performance even without hyperthreading or this Spectre-style speculation.

    Maybe not in single-thread performance, but for massively multi-core workloads it would be a hit.

    "Without speculative execution you'd really struggle to keep the execution pipeline filled so your CPU would waste a lot of cycles just waiting for data to work with."

    This only comes into effect if you are able to cool the extra heat in one area of the CPU. If you cannot cool it, adding hyperthreading is pointless. And if you go with TCI-stacked dies, the possibility of transporting all the heat away is zero... again: ZERO
    Last edited by qarium; 25 December 2021, 05:08 PM.



  • MadCatX
    replied
    Originally posted by qarium View Post

    Well, the question is: is the goal to have maximum competitive performance?
    AMD/Intel already sacrifice all security and all other factors to get maximum competitive performance.
    No they have not. Spectre kind of attacks were not known until a few years ago and their actual impact on security and performance was not *that* bad.

    Originally posted by qarium View Post
    What speaks for a third CPU manufacturer can be taken directly from what you wrote...

    Just look at your own words: "This has more to do with the fact that very few programs actually scale well beyond 64 threads. When that's the case, HT cannot do much about it."
    This is what I told you: in the world of single-core CPUs HT was very relevant, and in the world of dual-core CPUs it was very relevant... even for 4-core and 8-core CPUs HT was very relevant.
    Get the point? At 12 cores the pros and cons are equal, and at 16 cores HT becomes highly questionable, though some people still want it. Now think about 32 or 64-core CPUs... as you said, "very few programs actually scale well beyond 64 threads".
    If you go for a massively multi-core design like the 128-core ARM server CPU, hyperthreading has no value for you.
    You seem to be stuck with your belief that a chip without HT would be a better value but it wouldn't. The manufacturing cost would be only marginally lower and the performance would not be better.

    Originally posted by qarium View Post
    Then the point about memory "costs". You say: "Sure, more worker threads require more memory but that's hardly a surprise and it applies to every multithreaded code."
    Well, for me that's a good selling point: if you make a 16-64 core CPU without hyperthreading you can save a lot of money on RAM, because you need much less of it. Just imagine a 128-core ARM server CPU: 256 GB of RAM is fine for 128 cores, but if you add hyperthreading you need 512 GB... Sure, most people believe more threads and more RAM is better, but if you calculate strictly in performance per dollar, more threads and more RAM becomes a stupid idea.
    I'm not sure I follow your argument here. You're using 7zip as an extreme example of how memory requirements grow with the number of threads. But this has nothing to do with HT; you'd have the same problem if you doubled the actual core count. Also, there are lots of workloads where memory demands do not grow that much with more threads.

    Originally posted by qarium View Post
    There are also some successful ARM CPUs on the market without Spectre (speculative execution).
    Which chip specifically are you talking about?

    Originally posted by qarium View Post
    This just doesn't happen in the x86_64 space because AMD and Intel don't do it.
    But a third competitor could say: OK, let's build a 64-core CPU without Spectre (speculative execution) and without hyperthreading.
    As I've said before, every CPU architecture with competitive performance has been using speculative execution for over 30 years now. Even the IBM 7030 (Stretch) from the late 1950s did that. Apart from some truly revolutionary idea in computer design there is nothing a CPU manufacturer could do to make a competitive chip that wouldn't branch predict. Your implication that all CPUs with speculative execution are vulnerable to Spectre is false, you just need to pay more attention to proper cleanup after a branch misprediction. Current CPUs don't do that because Spectre attacks weren't known at the time they were designed.

    Originally posted by qarium View Post
    If you read the Wikipedia article about dark silicon, it is not exactly what you mean: they talk about the cooling limit and also the electrical limit, meaning the maximum current that can flow in the CPU...

    And I think that's the key, because in your version hyperthreading truly looks very, very good…
    You trust Wiki a bit too much but regardless of what we call it, HT can utilize parts of the CPU that would be unused otherwise.

    Originally posted by qarium View Post
    But as soon as you go from 2D chips to 3D-stacked designs like HBM, or this exotic high-end Japanese stacked server CPU whose layers are not even electrically connected but use a magnetic connection between the stacks...

    This means that in your version hyperthreading looks very good, but as soon as you go to a 3D-stacked CPU, the cooling problem and the electrical problem of maximum current hit first.

    Some silicon will go unused... because of logic (hyperthreading), electrical limits (current) or physics (heat).
    Non sequitur. On x86 you run into power budget limits with AVX and especially AVX512. This has nothing to do with HT.

    Originally posted by qarium View Post
    If you build a CPU that pushes current and heat to the absolute maximum, hyperthreading will have no effect or use case at all.
    The issue is that you often need a HT-like technology *in order to* get all of the performance out of your CPU.

    Originally posted by qarium View Post
    The question is: can you build a successful CPU without hyperthreading? And I say: yes, you can...
    So? A 128 core chip for servers is not a good example of a product for the average range of customers.

    Originally posted by qarium View Post
    But then you build a 3D-stacked CPU with magnetic connections between the layers, and the heat and current problems hit you long before hyperthreading comes into play... this means hyperthreading is technologically obsolete.
    Do you have any engineering data to support this?

    Originally posted by qarium View Post
    well you claimed intel is cheaper at the same speed...
    I have made no such claim.

    Originally posted by qarium View Post
    for the 3rd cpu competitor in the X86_64 space: of course if you build a cpu with 1-16 cores you can not compete in performance if you do not use hyperthreading or "spectre" (out-of-order execution and branch prediction).
    but that's not the point at all: if you do this, you go for 32-64 cores and also stacked cpu chips

    https://fuse.wikichip.org/news/1206/...ral-processor/
    https://en.wikichip.org/wiki/thruchip_interface

    with that technology i am 100% sure you can beat AMD and Intel... and if you make it a 64 core cpu, hyperthreading has near zero value for the desktop...
    That is unlikely to work out well. Without speculative execution you'd really struggle to keep the execution pipeline filled so your CPU would waste a lot of cycles just waiting for data to work with. This is a more fundamental problem of information theory which you cannot solve by stacking the transistors on the die in a neater way. When you don't have ST performance, you can't really make up for it by having more cores.



  • qarium
    replied
    Originally posted by MadCatX View Post
    A 3rd competitive mfg would be amazing but how would you expect them to have competitive performance if they didn't use out-of-order execution and branch prediction?
    well the question is: is the goal to have the maximum competitive performance?
    AMD/INTEL already sacrifice all security and all other factors to get the maximum competitive performance.

    what speaks for a 3rd cpu manufacturer you can extract directly from your writing...

    just look at your writing: "This has more to do with the fact that very few programs actually scale well beyond 64 threads. When that's the case, HT cannot do much about it."
    this is what i told you: in the world of single core cpus HT was very relevant, in the world of dualcore cpus hyperthreading was very relevant.... even for 4 core cpus HT was very relevant, and for 8 core cpus HT was very relevant.
    get the point? at 12 cores the pros and cons are equal, and at 16 cores HT becomes highly questionable but some people still want it. now let's think about 32 or 64 core cpus... as you said "very few programs actually scale well beyond 64 threads"
    if you go for a massive multicore design like the 128core ARM server cpu, hyperthreading technology has no value for you.

    then the point about memory "costs". you say: "Sure, more worker threads require more memory but that's hardly a surprise and it applies to every multithreaded code."
    well for me that's a good selling point: if you make a 16-64core cpu without hyperthreading you can save a lot of money on RAM because you just need much less ram. just imagine this: with a 128core ARM server cpu, 256gb RAM is fine for 128 cores, but if you add hyperthreading you need 512gb ram... sure, most people believe more threads and more ram is better, but if you calculate purely in terms of performance per dollar, this idea of more threads and more ram becomes a stupid idea.

    Originally posted by MadCatX View Post
    Basically every CPU architecture that is currently on the market and doesn't suck uses that. x86, ARM, POWER, even MIPS (in a way) use some form of speculative execution. Why? Because it seems to be the only way to utilize all of the CPU's processing power efficiently. One notable example of an architecture that tried to do things differently was Itanium… and it failed horribly as a result.
    there are some very successful high performance ARM cpus without hyperthreading (128core ARM server cpu)
    there are also some successful ARM cpus on the market without Spectre (speculative execution).

    the only reason this does not happen in the X86_64 ISA cpu space is that AMD and Intel don't do it.
    but a 3rd competitor could say: ok, let's do a 64 core cpu without Spectre (speculative execution) and without hyperthreading.

    Originally posted by MadCatX View Post
    I'm right with you there, if only Talos wasn't such a terrible value for the money.
    right, but just count the performance per dollar: if you have 4T hyperthreading per core you need a lot of RAM, which makes your performance per dollar very poor... such a power9 or power10 system is just not made to be cheap in the sense of "performance per dollar"

    power9 is also 14nm, which means it is obsolete compared to 5nm or 7nm or 10nm cpus...

    power10 on 7nm is 3 times faster per socket than a power9 system. if talos builds a single socket power10 mainboard, it will be faster than the dual-socket power9 mainboard system.

    everything that is not x86_64 or ARM tends to be very expensive in the market.

    Originally posted by MadCatX View Post
    Sure. All high performance CPU architectures for the last 30 years have been what's called superscalar. A superscalar CPU has some of its execution circuitry multiplied. This means that they can process multiple instructions in one cycle. This concept is called Instruction Level Parallelism. However, because lots of algorithms are very sequential in nature, there is only so much ILP that you can effectively extract from a given stream of instructions. Here is where HT comes into play. Instead of running one program at a time on one core, you run two (or four in case of POWER9 I think). When one program cannot use all of the available execution blocks, the other program can use the rest. Those blocks that would have otherwise gone unused are sometimes referred to as Dark Silicon. This is also why HT doesn't work for all kinds of workloads. If you have a program that can make good use of the entire CPU, HT is counterproductive. In reality it's a bit more complicated but you get the idea.
    if you read the wikipedia article about Dark Silicon, it is not exactly what you mean: they talk about the cooling maximum and also the electrical maximum, meaning the maximum current (in amperes) that can flow in the cpu...

    and i think that's the key, because in your version hyperthreading truly looks very, very good...

    but as soon as you go from 2D cpu chips to 3D stacked CPUs like HBM, or this super high-end exotic japanese stacked server cpu which is not even electrically connected but uses a magnetic connection between the stacks...

    this means in your version hyperthreading looks very good, but as soon as you go to a 3D stacked cpu the cooling problem hits first, and the electrical problem of maximum current hits first too.

    some silicon will go unused ... by logic (hyperthreading) or by electricity (current flow) or by physics (heat)

    if you build a cpu that maxes out current flow and heat to the absolute maximum, hyperthreading will have no effect or use case at all.

    this (128core ARM server cpu) is a good example: it maxes out all the factors of heat and current flow, so if you add hyperthreading to it, it will not bring you any benefit.

    Originally posted by MadCatX View Post
    What I meant to say was 8P+HT/8E. If you look at the results again, you'll see I'm right.
    the question is: can you build a successful cpu without hyperthreading? and i say: yes you can...

    mainly because your Dark Silicon topic involves other factors than plain and pure software logic
    yes, if you only count software logic your explanation of hyperthreading sounds very good...

    but then you build a 3D stacked cpu with a magnetic connection between the layers, and the heat and current problems hit you long before any hyperthreading comes into effect... this means hyperthreading as a technology is obsolete.

    Originally posted by MadCatX View Post
    When you're building a high performance machine, you obviously must make sure that there is no bottleneck in your specs. Every sane program has a software toggle that lets you limit the number of threads should you need to.
    right now i build systems like this: games are made for playstation 5 with 8 cores and 16 threads...

    then instead of buying an AMD 5800X i buy an AMD 5950X or 2950X (i am on a TR4 mainboard), and then i disable hyperthreading to save on RAM and get lower latency and higher performance in the game.

    this is the gold standard right now for getting the very best performance out of games.

    so right now we are talking about 16core cpus or more... it makes no sense to build a cpu with fewer cores.

    Originally posted by MadCatX View Post
    While correct, this is completely non sequitur to what I said. You hypothesised that a not-HT 16 core CPU would sell well. I provided an example of a real 16 core chip and argued that if you stripped it of HT, it wouldn't have been a much better value. Pitting it against 12900K price-wise is rather unfair because ADL has the "new shiny thing" tax on it but Intel doesn't have anything else that'd match the 5950X performance-wise.
    well you claimed intel is cheaper at the same speed... and i proved to you this is not the case: AMD is cheaper in "performance per dollar"
    call it unfair... but this is reality: if you have 1400€ to buy a computer you have these 2 options, and for more professional people the AMD option is very good... for example the AMD system can handle ECC ram and the intel version can not.
    also professionals, for example those who compile Gentoo, like massive multicore performance, and again AMD is the better option.

    in my point of view the 5950X also beats the 12900K in games if you disable hyperthreading, because the games are made for playstation 5 with 16 threads and the 5950X gives you 16 cores if you disable hyperthreading, which fits the 16 threads perfectly.
    disabling hyperthreading gives you 5-6% higher performance and better latency for mouse and keyboard input.

    for the 3rd cpu competitor in the X86_64 space: of course if you build a cpu with 1-16 cores you can not compete in performance if you do not use hyperthreading or "spectre" (out-of-order execution and branch prediction).
    but that's not the point at all: if you do this, you go for 32-64 cores and also stacked cpu chips

    https://fuse.wikichip.org/news/1206/...ral-processor/
    https://en.wikichip.org/wiki/thruchip_interface

    with that technology i am 100% sure you can beat AMD and Intel... and if you make it a 64 core cpu, hyperthreading has near zero value for the desktop...



  • MadCatX
    replied
    Originally posted by qarium View Post

    yes exactly this is the case... we need a "third competitive manufacturer of x86 chips"

    both intel/AMD put in:
    Spectre / speculative execution
    Intel ME/ AMD PSP https://arstechnica.com/gadgets/2021...us-master-key/
    Hyperthreading
    Closed source microcode and firmware

    let's say there is a third competitive manufacturer of x86 chips ...

    who focuses on non-bullshit chips, meaning no speculative execution, meaning no Spectre
    no ME/PSP trojan horse inside of the cpu
    no bullshit technology like hyperthreading (which becomes obsolete with many cores anyway; with more than 16 cores the overhead of HT is so high that there is no benefit)
    A 3rd competitive mfg would be amazing but how would you expect them to have competitive performance if they didn't use out-of-order execution and branch prediction? Basically every CPU architecture that is currently on the market and doesn't suck uses that. x86, ARM, POWER, even MIPS (in a way) use some form of speculative execution. Why? Because it seems to be the only way to utilize all of the CPU's processing power efficiently. One notable example of an architecture that tried to do things differently was Itanium… and it failed horribly as a result.

    Originally posted by qarium View Post
    no closed source microcode or firmware.
    massive multicore design for the desktop, meaning 16 cores and up.
    I'm right with you there, if only Talos wasn't such a terrible value for the money.

    Originally posted by qarium View Post
    dude i really tried to find out what dark silicon means in relation to hyperthreading...
    https://en.wikipedia.org/wiki/Dark_silicon
    in the Dark Silicon article on wikipedia there is not one single sentence about hyperthreading.
    can you explain to me what you mean by this?
    Sure. All high performance CPU architectures for the last 30 years have been what's called superscalar. A superscalar CPU has some of its execution circuitry multiplied. This means that they can process multiple instructions in one cycle. This concept is called Instruction Level Parallelism. However, because lots of algorithms are very sequential in nature, there is only so much ILP that you can effectively extract from a given stream of instructions. Here is where HT comes into play. Instead of running one program at a time on one core, you run two (or four in case of POWER9 I think). When one program cannot use all of the available execution blocks, the other program can use the rest. Those blocks that would have otherwise gone unused are sometimes referred to as Dark Silicon. This is also why HT doesn't work for all kinds of workloads. If you have a program that can make good use of the entire CPU, HT is counterproductive. In reality it's a bit more complicated but you get the idea.
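
For anyone who wants to see which logical CPUs share one physical core on their own machine, here is a minimal sketch. It assumes a Linux system that exposes the usual sysfs topology files (`/sys/devices/system/cpu/cpuN/topology/thread_siblings_list`); the function names are made up for this illustration and are not part of any tool mentioned in the thread.

```python
import glob


def parse_cpu_list(text):
    """Parse a sysfs CPU list such as "0,16" or "0-1" into a sorted tuple of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return tuple(sorted(cpus))


def group_siblings(sibling_lists):
    """Collapse per-CPU sibling lists into the distinct physical cores."""
    return sorted({parse_cpu_list(s) for s in sibling_lists})


def read_sibling_lists():
    """Read the sibling list of every logical CPU from sysfs (Linux only)."""
    pattern = "/sys/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list"
    lists = []
    for path in glob.glob(pattern):
        with open(path) as fh:
            lists.append(fh.read())
    return lists


if __name__ == "__main__":
    cores = group_siblings(read_sibling_lists())
    smt_cores = [c for c in cores if len(c) > 1]
    print(f"{len(cores)} physical cores, {len(smt_cores)} of them with SMT siblings")
```

On a hybrid chip one would expect the P-cores to show up with two siblings each and the E-cores with one, but that depends on the BIOS settings, so treat the output as a check rather than a given.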

    Originally posted by qarium View Post
    well your sentence alone proves hyperthreading is a bullshit technology... "losing only to the 8P/8E config."
    What I meant to say was 8P+HT/8E. If you look at the results again, you'll see I'm right.

    Originally posted by qarium View Post
    on the AMD and ARM side you have the same effect: as soon as you have many, or let's say enough, cores the effect of HT turns negative because the overhead becomes bigger than the benefit.
    Phoronix tested this multiple times with a 128 thread system, tested with 128, then with 64, then with 32 threads and so on.
    as soon as you have enough cores, given the overhead of hyperthreading and the added complexity of designing software to utilize so many threads,
    in the end you do not benefit from hyperthreading... on such a 128 thread system, if you disable hyperthreading you even get higher performance.
    This has more to do with the fact that very few programs actually scale well beyond 64 threads. When that's the case, HT cannot do much about it.
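
One common way to put numbers on "doesn't scale beyond N threads" is Amdahl's law, which is not named in the post and is brought in here only as an illustration. The sketch below assumes a program in which a fixed fraction of the work can run in parallel; the 0.95 figure is an arbitrary example value, not a measurement of any real program.

```python
def amdahl_speedup(parallel_fraction, threads):
    """Upper bound on speedup when only `parallel_fraction` of the work scales."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if threads < 1:
        raise ValueError("threads must be at least 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / threads)


if __name__ == "__main__":
    # 0.95 is an illustrative value, not a measurement of any real program.
    for n in (16, 32, 64, 128):
        print(n, round(amdahl_speedup(0.95, n), 2))
```

With that example value, going from 64 to 128 threads moves the bound from roughly 15.4x to roughly 17.4x, which is the kind of flattening the reply is describing, whether the extra threads come from HT or from real cores.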

    Originally posted by qarium View Post
    another part is saving significant money on ram if you disable hyperthreading: it is a fact that your system needs much more ram if you use hyperthreading.
    example: if you have a Threadripper 2990WX with 32 cores and 64 threads and you can choose between 256gb ram, 128gb ram and 64gb ram. if you have 64 threads you really want the 256gb ram... if you disable hyperthreading you can go with 128gb just fine. just use 7zip to check how much ram you need to run 64 threads or 32 threads...
    Sure, more worker threads require more memory but that's hardly a surprise and it applies to every multithreaded code. When you're building a high performance machine, you obviously must make sure that there is no bottleneck in your specs. Every sane program has a software toggle that lets you limit the number of threads should you need to.
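
As a rough illustration of such a toggle, here is a minimal sketch of a program that caps its own worker count instead of always spawning one thread per logical CPU. The names `pick_worker_count` and `run_jobs` are hypothetical; only `os.cpu_count` and `ThreadPoolExecutor` come from the Python standard library.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def pick_worker_count(requested=None, logical_cpus=None):
    """Return the worker count: the user's cap, clamped to [1, logical_cpus]."""
    if logical_cpus is None:
        logical_cpus = os.cpu_count() or 1
    if requested is None:
        return logical_cpus
    return max(1, min(requested, logical_cpus))


def run_jobs(jobs, requested=None):
    """Run callables on a pool whose size honours the user's thread limit."""
    workers = pick_worker_count(requested)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda job: job(), jobs))
```

With a cap like this, a 32 core / 64 thread machine can be told to run 32 workers, so the memory footprint is set by the cap and not by whether HT is enabled in the BIOS.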

    Originally posted by qarium View Post
    If you want to compare the 5950X to the 12900K you can not compare cpu vs cpu...

    you have to compare CPU+Mainboard+RAM vs CPU+Mainboard+RAM

    if you put DDR5 on intel and DDR4 on AMD the result is AMD wins big in performance per dollar.

    if you say you put ddr4 ram on the intel then amd is still cheaper as soon as you compare CPU+Mainboard to CPU+Mainboard.

    the 12900K is only cheaper if you do not include the mainboard in the bill.

    intel cpu+DDR5(64GB ram)+mainboard(size: ATX)
    https://geizhals.de/intel-core-i9-12...loc=at&hloc=de
    € 598
    https://geizhals.de/gigabyte-z690-ud...loc=at&hloc=de
    € 190,90
    https://geizhals.de/crucial-ddr5-dim...loc=at&hloc=de
    € 605,00

    =1393,90€

    AMD CPU+DDR4 (64gb 4800mhz)+mainboard
    https://geizhals.de/amd-ryzen-9-5950...loc=at&hloc=de
    € 719,00
    https://geizhals.de/asrock-b550-phan...loc=at&hloc=de
    € 95,69
    https://geizhals.de/g-skill-trident-...loc=at&hloc=de
    2x € 260,90

    =1336,49€

    as you can see, your claim "5950X currently sells for about 15 % higher price than 12900K" is plainly and simply wrong.

    AMD right now is 57,41€ cheaper in the real world.

    if you go with DDR4 on intel
    the mainboard is:
    https://geizhals.de/msi-pro-z690-p-d...loc=at&hloc=de
    € 181,37 (the DDR5 one is € 190,90)

    with intel on DDR4 amd is still 47,88€ cheaper in the real world.

    and i am a person who only counts multicore performance

    https://cpu.userbenchmark.com/Compar...50X/4118vs4086

    @64-core OC multi-core mixed speed the AMD 5950X is 10% faster.

    this means in the end you not only save 57,41€ or 47,88€, you also get a 10% faster system (massive multicore workload)
    While correct, this is completely non sequitur to what I said. You hypothesised that a not-HT 16 core CPU would sell well. I provided an example of a real 16 core chip and argued that if you stripped it of HT, it wouldn't have been a much better value. Pitting it against 12900K price-wise is rather unfair because ADL has the "new shiny thing" tax on it but Intel doesn't have anything else that'd match the 5950X performance-wise.



  • qarium
    replied
    Originally posted by MadCatX View Post
    I know I probably shouldn't be doing this but I'm kind of curious about what you'd consider to be the right way forward.
    Why? big.LITTLE by itself is not a bad idea. The way ADL currently does it is rather lackluster but that's because it's the very first x86 chip with such a design.
    First DDR3 and DDR4 chips also didn't look terribly interesting. Unlike DDR4, DDR5 can be scaled further and provide real benefits whereas DDR4 has reached the
    You believe that if there was, say, a third competitive manufacturer of x86 chips we'd have *fewer* clever technologies that improve performance? We have seen what the real lack of competition looks like during the whole Skylake -> Coffee Lake era
    yes exactly this is the case... we need a "third competitive manufacturer of x86 chips"

    both intel/AMD put in:
    Spectre / speculative execution
    Intel ME/ AMD PSP https://arstechnica.com/gadgets/2021...us-master-key/
    Hyperthreading
    Closed source microcode and firmware

    let's say there is a third competitive manufacturer of x86 chips ...

    who focuses on non-bullshit chips, meaning no speculative execution, meaning no Spectre
    no ME/PSP trojan horse inside of the cpu
    no bullshit technology like hyperthreading (which becomes obsolete with many cores anyway; with more than 16 cores the overhead of HT is so high that there is no benefit)
    no closed source microcode or firmware.
    massive multicore design for the desktop, meaning 16 cores and up.


    Originally posted by MadCatX View Post
    HT is a pretty clever way to deal with the "dark silicon" problem.
    dude i really tried to find out what dark silicon means in relation to hyperthreading...
    https://en.wikipedia.org/wiki/Dark_silicon
    in the Dark Silicon article on wikipedia there is not one single sentence about hyperthreading.
    can you explain to me what you mean by this?

    Originally posted by MadCatX View Post
    Even this very article clearly shows the 8P with HT sometimes being the fastest configuration and second fastest overall, losing only to the 8P/8E config.
    well your sentence alone proves hyperthreading is a bullshit technology... "losing only to the 8P/8E config."

    on the AMD and ARM side you have the same effect: as soon as you have many, or let's say enough, cores the effect of HT turns negative because the overhead becomes bigger than the benefit.
    Phoronix tested this multiple times with a 128 thread system, tested with 128, then with 64, then with 32 threads and so on.
    as soon as you have enough cores, given the overhead of hyperthreading and the added complexity of designing software to utilize so many threads,
    in the end you do not benefit from hyperthreading... on such a 128 thread system, if you disable hyperthreading you even get higher performance.
    another part is saving significant money on ram if you disable hyperthreading: it is a fact that your system needs much more ram if you use hyperthreading.
    example: if you have a Threadripper 2990WX with 32 cores and 64 threads and you can choose between 256gb ram, 128gb ram and 64gb ram. if you have 64 threads you really want the 256gb ram... if you disable hyperthreading you can go with 128gb just fine. just use 7zip to check how much ram you need to run 64 threads or 32 threads...

    Originally posted by MadCatX View Post
    5950X currently sells for about 15 % higher price than 12900K and even if the lack of HT somehow made it 15 % cheaper, I really don't see it becoming a best selling product. Not to mention that it'd probably get creamed by the 12900K in both ST and MT workloads.
    Which are part of the whole OOO-execution and register-renaming logic circuitry that you absolutely need anyway.
    Sure, but I wouldn't count more cores and HT among them.
    If you want to compare the 5950X to the 12900K you can not compare cpu vs cpu...

    you have to compare CPU+Mainboard+RAM vs CPU+Mainboard+RAM

    if you put DDR5 on intel and DDR4 on AMD the result is AMD wins big in performance per dollar.

    if you say you put ddr4 ram on the intel then amd is still cheaper as soon as you compare CPU+Mainboard to CPU+Mainboard.

    the 12900K is only cheaper if you do not include the mainboard in the bill.

    intel cpu+DDR5(64GB ram)+mainboard(size: ATX)
    https://geizhals.de/intel-core-i9-12...loc=at&hloc=de
    € 598
    https://geizhals.de/gigabyte-z690-ud...loc=at&hloc=de
    € 190,90
    https://geizhals.de/crucial-ddr5-dim...loc=at&hloc=de
    € 605,00

    =1393,90€

    AMD CPU+DDR4 (64gb 4800mhz)+mainboard
    https://geizhals.de/amd-ryzen-9-5950...loc=at&hloc=de
    € 719,00
    https://geizhals.de/asrock-b550-phan...loc=at&hloc=de
    € 95,69
    https://geizhals.de/g-skill-trident-...loc=at&hloc=de
    2x € 260,90

    =1336,49€

    as you can see, your claim "5950X currently sells for about 15 % higher price than 12900K" is plainly and simply wrong.

    AMD right now is 57,41€ cheaper in the real world.

    if you go with DDR4 on intel
    the mainboard is:
    https://geizhals.de/msi-pro-z690-p-d...loc=at&hloc=de
    € 181,37 (the DDR5 one is € 190,90)

    with intel on DDR4 amd is still 47,88€ cheaper in the real world.

    and i am a person who only counts multicore performance

    https://cpu.userbenchmark.com/Compar...50X/4118vs4086

    @64-core OC multi-core mixed speed the AMD 5950X is 10% faster.

    this means in the end you not only save 57,41€ or 47,88€, you also get a 10% faster system (massive multicore workload)
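
The totals in this post can be re-checked with a few lines of Python. This is only a sketch that re-adds the prices as listed (converted from comma to point decimals); the variable names are made up. One thing it makes visible: the 47,88€ figure is what you get when only the mainboard is swapped for the DDR4 model and the RAM line stays at the DDR5 price, so it should be read with that assumption in mind.

```python
from decimal import Decimal

# Prices in euros, copied from the post above.
INTEL_DDR5 = {"cpu": Decimal("598.00"), "board": Decimal("190.90"), "ram": Decimal("605.00")}
AMD_DDR4 = {"cpu": Decimal("719.00"), "board": Decimal("95.69"), "ram": 2 * Decimal("260.90")}
INTEL_DDR4_BOARD = Decimal("181.37")


def total(build):
    """Sum the component prices of one build."""
    return sum(build.values(), Decimal("0"))


intel_total = total(INTEL_DDR5)
amd_total = total(AMD_DDR4)
# Same Intel build with only the mainboard swapped for the DDR4 model;
# the RAM line is left at the DDR5 price, as the quoted figure implies.
intel_ddr4_board_total = total({**INTEL_DDR5, "board": INTEL_DDR4_BOARD})

print(intel_total, amd_total, intel_total - amd_total)
print(intel_ddr4_board_total - amd_total)
```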




  • tildearrow
    replied
    Originally posted by birdie View Post

    So, no quotes, nothing to prove your accusations. Go stand by your lies. I've BL'ed you.
    How about you just blacklist the entire forum already and go? We've been waiting for you to do this for the last 5 years.



  • MadCatX
    replied
    I know I probably shouldn't be doing this but I'm kind of curious about what you'd consider to be the right way forward.

    Originally posted by qarium View Post
    @ AMD

    AMD Please don't follow intels route...

    I would even buy 2-3 generation old hardware to avoid this "big.little" bullshit in my computers.
    Why? big.LITTLE by itself is not a bad idea. The way ADL currently does it is rather lackluster but that's because it's the very first x86 chip with such a design.

    Originally posted by qarium View Post
    I really, really don't get all this bullshit technology they sell, it's "snake oil". i don't even see a reason for DDR5, it's "snake oil": you pay a high price and the result is very, very small.
    First DDR3 and DDR4 chips also didn't look terribly interesting. Unlike DDR4, DDR5 can be scaled further and provide real benefits whereas DDR4 has reached the
    Originally posted by qarium View Post
    many of these bullshit technologies are only around because of the lack of competition in the market,
    You believe that if there was, say, a third competitive manufacturer of x86 chips we'd have *fewer* clever technologies that improve performance? We have seen what the real lack of competition looks like during the whole Skylake -> Coffee Lake era

    Originally posted by qarium View Post
    for example "Hyperthreading": in desktop and gaming it's hard to fill 16 real cores, hyperthreading results in bad latency in games, and the threading overhead is so big that you get 5% higher game performance if you disable hyperthreading.

    AMD and Intel and even IBM do this because their server customers like it...

    on Desktop/Gaming it is snake oil you do not want. it made some sense for single-core/dual-core/quad-core cpus because you could easily generate more threads to feed to the cpu... but with 12/16 cores the situation is that games and other desktop apps can not fill all 16 threads... in these cases hyperthreading is a loss.
    HT is a pretty clever way to deal with the "dark silicon" problem. Even this very article clearly shows the 8P with HT sometimes being the fastest configuration and second fastest overall, losing only to the 8P/8E config.

    Originally posted by qarium View Post
    it's a lack of competition in the market: if there were only one single X86_64 competitor who did 16 cores without hyperthreading, i am sure they would sell a lot of chips.
    5950X currently sells for about 15 % higher price than 12900K and even if the lack of HT somehow made it 15 % cheaper, I really don't see it becoming a best selling product. Not to mention that it'd probably get creamed by the 12900K in both ST and MT workloads.

    Originally posted by qarium View Post
    they claim there is no market demand for cpus without hyperthreading because you can disable it in the bios... oh well, yes, you can disable it, but you already paid for the transistors.
    Which are part of the whole OOO-execution and register-renaming logic circuitry that you absolutely need anyway.

    Originally posted by qarium View Post
    i really don't know what's wrong with this world: every company puts in stuff people don't want...
    Sure, but I wouldn't count more cores and HT among them.

