Intel Core i5 14600K & Intel Core i9 14900K Linux Benchmarks


  • #31
    Originally posted by drakonas777 View Post
    Basically now everything depends on how much Qualcomm is committed to PC market, because they have tech, money and business relationships to make it "the thing".
    That is just one link in the chain. They also need a market segment to fit into; Windows on ARM has already failed, mostly for compatibility reasons.

    Their best chance is making a deal with Google for its Chromebooks; that is the only ARM notebook segment with somewhat high volume. But that depends on the price, since Google wants cheap systems.



    • #32
      Originally posted by uid313 View Post
      Intel and AMD both have shit CPUs which are very energy inefficient, so in order to increase performance they have to feed them a lot more power in order to yield a little bit more performance. So especially the more expensive high-end CPUs are shitty, like the Intel Core i9 and Ryzen 9 series, so it is much better to buy an Intel i5 or Ryzen 5 rather than the very power-thirsty i9 or Ryzen 9.
      I'm not a fan of x86 and the junk legacy baggage. Not a fan of being very dependent on patents owned by a small group. I am a fan of objectivity.

      This statement is objectively wrong. Intel i9 and Ryzen 9 CPU cores have more power-efficient silicon than i5 or R5 parts most of the time. There are niche cases and arguments about idle power that create various compound scenarios, but if we look at your argument, which is "Intel and AMD both have shit CPUs which are very energy inefficient, so in order to increase performance they have to feed them a lot more power in order to yield a little bit more performance. So especially the more expensive high-end CPUs are shitty", this is just wrong.

      Originally posted by uid313 View Post
      I look forward to Qualcomm launching their new Snapdragon X Elite, which is going to crush both Intel and AMD. Soon Nvidia will join too with their new high-performance ARM CPU, and Intel and AMD will be left behind with their shitty x86 CPUs.

      Intel and AMD really need to make either an ARM or a RISC-V CPU. The x86 architecture is at a dead end. Neither Intel nor AMD can make x86 CPUs that are good enough to compete with the ARM-based offerings of Apple, Qualcomm and Nvidia.

      If HP and Dell want to sell something, they need to bring some ARM-based products to the market, because their products are inferior. We all hate Apple but they got the best laptops on the market. Microsoft knows it, their Surface Book is shit, it cannot compete, they need ARM. Samsung knows it, their laptops are shit, they need ARM.
      Your argument was valid in the 1980s and 1990s, but you really can't blame the ISA for power efficiency these days. Modern out-of-order pipelines, advanced instruction decoding and op-caching have put the nail in the coffin for this argument. Software is the biggest problem today regarding efficiency: drivers, power management and even applications. If you look at the M2 as a whole, it's more CISC than x86, but those operations are being done outside the core, so people just ignore that. Apple's power management of the M2 as a whole is insanely good; most people mistake the combination of the OS and all the features that the SoC brings for the performance-per-watt of the CPU core.

      Like coder said, Intel and AMD are competing, so if one decides to clock its chips at frequencies where they are less efficient, the other will likely follow. Alder Lake went up from 125 W to 150 W while Zen 3 was still at 105 W. Now look at Zen 4; surprise, surprise. The poor Radeon graphics group had to do the same with Vega. Fortunately, users can underclock and undervolt their devices to optimize the power/performance ratio. Further, the most efficient silicon goes into the most expensive enterprise parts (with chiplets, even more so); the low-end devices usually get the reject parts.
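
      For anyone who wants to experiment with power limits on Linux, here's a rough sketch using the kernel's powercap/RAPL sysfs interface (assuming an Intel CPU that exposes /sys/class/powercap/intel-rapl:0; paths vary between systems, recent AMD kernels expose something similar, and writing a limit needs root):

      # Rough sketch: read (and optionally lower) the package power limit via Linux powercap.
      # Assumes the intel_rapl driver is loaded; adjust the path for your system.
      from pathlib import Path

      RAPL = Path("/sys/class/powercap/intel-rapl:0")

      def read_watts(name: str) -> float:
          return int((RAPL / name).read_text()) / 1e6  # sysfs values are in microwatts

      print("domain:", (RAPL / "name").read_text().strip())
      print(f"long-term limit: {read_watts('constraint_0_power_limit_uw'):.1f} W")

      def set_long_term_limit(watts: float) -> None:
          # Needs root; the firmware may clamp or reject out-of-range values.
          (RAPL / "constraint_0_power_limit_uw").write_text(str(int(watts * 1e6)))

      # Example (left commented out): cap the package at 65 W.
      # set_long_term_limit(65)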

      In supercomputers, ARM, POWER and others are getting their teeth kicked in by x86 paired with Nvidia and AMD GPUs. Even the Green500 is dominated by x86. The best spot for non-x86 on the Green500 is 45th for POWER (16.28 GFlops/W) and 47th for ARM (15.42 GFlops/W), versus Intel at #1 (65.40 GFlops/W) and AMD at #2 (62.68 GFlops/W). Again, this isn't just because of CPU core ISAs; it has much more to do with combined hardware and software compatibility than with CPU efficiency by itself.
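
      Just to put those Green500 numbers side by side (same figures as quoted above, nothing new):

      # Efficiency gap using the Green500 figures quoted above (GFlops/W).
      green500 = {"Intel-based #1": 65.40, "AMD-based #2": 62.68,
                  "best POWER entry (45th)": 16.28, "best ARM entry (47th)": 15.42}
      baseline = green500["best ARM entry (47th)"]
      for name, eff in green500.items():
          print(f"{name}: {eff:.2f} GFlops/W ({eff / baseline:.1f}x the best ARM entry)")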



      • #33
        Originally posted by Jabberwocky View Post
        I'm not a fan of x86 and the junk legacy baggage. Not a fan of being very dependent on patents owned by a small group. I am a fan of objectivity.

        This statement is objectively wrong. Intel i9 and Ryzen 9 CPU cores have more power-efficient silicon than i5 or R5 parts most of the time. There are niche cases and arguments about idle power that create various compound scenarios, but if we look at your argument, which is "Intel and AMD both have shit CPUs which are very energy inefficient, so in order to increase performance they have to feed them a lot more power in order to yield a little bit more performance. So especially the more expensive high-end CPUs are shitty", this is just wrong.

        Your argument was valid in the 1980s and 1990s, but you really can't blame the ISA for power efficiency these days. Modern out-of-order pipelines, advanced instruction decoding and op-caching have put the nail in the coffin for this argument. Software is the biggest problem today regarding efficiency: drivers, power management and even applications. If you look at the M2 as a whole, it's more CISC than x86, but those operations are being done outside the core, so people just ignore that. Apple's power management of the M2 as a whole is insanely good; most people mistake the combination of the OS and all the features that the SoC brings for the performance-per-watt of the CPU core.

        Like coder said, Intel and AMD are competing, so if one decides to clock its chips at frequencies where they are less efficient, the other will likely follow. Alder Lake went up from 125 W to 150 W while Zen 3 was still at 105 W. Now look at Zen 4; surprise, surprise. The poor Radeon graphics group had to do the same with Vega. Fortunately, users can underclock and undervolt their devices to optimize the power/performance ratio. Further, the most efficient silicon goes into the most expensive enterprise parts (with chiplets, even more so); the low-end devices usually get the reject parts.

        In supercomputers, ARM, POWER and others are getting their teeth kicked in by x86 paired with Nvidia and AMD GPUs. Even the Green500 is dominated by x86. The best spot for non-x86 on the Green500 is 45th for POWER (16.28 GFlops/W) and 47th for ARM (15.42 GFlops/W), versus Intel at #1 (65.40 GFlops/W) and AMD at #2 (62.68 GFlops/W). Again, this isn't just because of CPU core ISAs; it has much more to do with combined hardware and software compatibility than with CPU efficiency by itself.

        In general yes, and in multi-core, yes, yes, yes...

        But if you look at these "Snapdragon X Elite" benchmarks, it looks to me like the x86-64 architecture has a single-core performance problem.
        x86-64 implements hyper-threading to work around single-core performance limits by feeding the cores multiple threads at the same time,
        but it looks like the variable length of x86-64 instructions makes it harder to develop a wider x86 core...

        Apple's M1/M2/M3 showed this years ago, and this "Snapdragon X Elite" has even better single-core performance because the CPU is ultra-wide.

        "Instruction decoding is a bottleneck for x86 these days: Apple M1 can do 8-wide decode, Intel just managed to reach 6-wide in Alder Lake, and AMD Zen 3 only has a 4-wide decoder. One would think that dropping legacy 16-bit and 32-bit instructions would enable simpler and more efficient instruction decoders in future x86 versions.​"


        This is exactly the reason why the "Snapdragon X Elite" is so strong in single-core performance...

        You talk about supercomputers, and in that world multi-core is what counts, but this single-core performance problem could be an issue elsewhere, like gaming consoles or other workloads with bad multi-core support.

        Games in general do not scale well beyond 6-8 cores, so it does not matter that your 64-core Threadripper from the supercomputer world has better multi-core performance.


        Phantom circuit Sequence Reducer Dyslexia



        • #34
          Originally posted by junkbustr View Post
          If I can get 85% of the performance for 60% of the cost....
          The funny thing is that this is how the "Wintel" ecosystem came into being.

          Years ago, "Big Iron" dominated: large mainframes from IBM that ran proprietary UNIX systems with expensive per-seat licenses. What people did was use small, basically dumb terminals networked to the mainframes, and everything ran on those big systems. They were large, expensive, used lots of power, and the per-seat licenses were exorbitant.

          Microsoft and Intel decided to attack that market with the notion of smaller, less powerful, less power-hungry and much cheaper systems. The pitch was that you could get half the performance at a tenth of the power consumption and a quarter of the cost, and Wintel was born.



          • #35
            Originally posted by qarium View Post


            In general yes, and in multi-core, yes, yes, yes...

            But if you look at these "Snapdragon X Elite" benchmarks, it looks to me like the x86-64 architecture has a single-core performance problem.
            x86-64 implements hyper-threading to work around single-core performance limits by feeding the cores multiple threads at the same time,
            but it looks like the variable length of x86-64 instructions makes it harder to develop a wider x86 core...

            Apple's M1/M2/M3 showed this years ago, and this "Snapdragon X Elite" has even better single-core performance because the CPU is ultra-wide.

            "Instruction decoding is a bottleneck for x86 these days: Apple M1 can do 8-wide decode, Intel just managed to reach 6-wide in Alder Lake, and AMD Zen 3 only has a 4-wide decoder. One would think that dropping legacy 16-bit and 32-bit instructions would enable simpler and more efficient instruction decoders in future x86 versions.​"


            This is exactly the reason why the "Snapdragon X Elite" is so strong in single-core performance...

            You talk about supercomputers, and in that world multi-core is what counts, but this single-core performance problem could be an issue elsewhere, like gaming consoles or other workloads with bad multi-core support.

            Games in general do not scale well beyond 6-8 cores, so it does not matter that your 64-core Threadripper from the supercomputer world has better multi-core performance.

            Thanks for making a technical argument.

            I had not read about the 80376 before; I wasn't aware that these were made and put into dumb terminals. Removing 16-bit and 32-bit instructions is an extremely challenging task, and I would be surprised if it happened anytime soon. I sometimes compare x86 to JavaScript: the only reason we have all this legacy junk and these hacks is that if either had broken backwards compatibility, the industry likely would have moved to something else.

            I've looked at some benchmarks for Apple's M2 (8+10, 20 billion transistors, TSMC 5 nm N5P) vs. the AMD 6800U (Zen 3+ / RDNA 2, 13 billion transistors, TSMC 6 nm FinFET) in an ASUS Zenbook S13. The results don't match the hype, IMO. If the headlines read "low idle power and hours of battery life", then I would say the hype is real. Tom's Hardware claiming it's better than a Threadripper 3990X, and Apple's Senior VP Johny Srouji claiming it's faster than a GeForce 3090, is just moronic. I digress; let's look at the benchmarks...

            The M2 has a better GPU, hands down, but on the CPU side it's more or less the same as the 6800U. In the FL Studio WAV export, the M2 is 12% slower than the 6800U (15 W) and 16% slower than the 6800U (25 W). In Blender, the M2 is only 3.2% faster than the 6800U (15 W) and 7.1% slower than the 6800U (25 W). The M2 does well in 7-Zip compression but fails miserably at 7-Zip decompression, being 32.4% slower than the 6800U (15 W). Also, nobody is testing AVX-512 and comparing that to Apple. The M2 is also much more expensive. One thing is certain: Intel's 1260P is struggling all around in terms of performance per watt. These stats were obtained from Hardware Unboxed's benchmarks. I'm still waiting for proper Zen 4 tests vs. the Apple M2. Here's a generic comparison: https://www.notebookcheck.net/R7-784....247596.0.html
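
            (To be clear about how I'm quoting these percentages, here's the math with made-up scores; the function and numbers are just illustrative, not from the actual benchmark data:)

            # Made-up scores (higher = better), only to show how "X% slower" is computed.
            def relative_to_baseline(score: float, baseline: float) -> str:
                delta = (score - baseline) / baseline * 100.0
                return f"{abs(delta):.1f}% {'faster' if delta >= 0 else 'slower'} than the baseline"

            m2, r7_6800u_15w = 880.0, 1000.0               # hypothetical scores
            print(relative_to_baseline(m2, r7_6800u_15w))  # -> 12.0% slower than the baseline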

            I have a few questions:
            1. What percentage of power does the decoder use, and how much does that objectively contribute to improved efficiency/performance?
            2. We have seen in x86 alone that there have been big IPC improvements from optimizing the interplay between branch prediction and the micro-op cache / decode queue. What proof do we have that we have reached a limit where we cannot optimize variable-length decoding further (regardless of whether we keep or remove legacy instructions)?
            3. How can we objectively measure that ARM's success is due to the "ultra" 8-wide decode step?
            4. Is x86 struggling to improve the decode step because of legacy instructions, or is it perhaps AVX or even something else?
            CPU core front-ends are very complicated: the smallest change to something like cache latency, or to the balance between predictions and mispredictions, has a profound impact on the entire system. I enjoy studying this, but I can't honestly say I know exactly what is going on in Raptor Lake / Zen 5 / M3 / Snapdragon. It's easy to make statements like "X has better single-core performance because the CPU is ultra-wide", but actually proving that this architectural change is responsible is another story. If there were a proper study on this and we could definitively say that it's too difficult to keep the execution units fed and that single-core performance has hit a wall on x86, I'd be happy to admit I'm wrong, but I really doubt that this is the case. Right now, though, AFAIK the Snapdragon X Elite is only being released in June 2024, so we won't even have independent testing until then.

            Regarding the announced Snapdragon X Elite single-core benchmarks: even if we trust that independent testing will give the same results across all workloads, it doesn't look like a life-changing difference; certainly not enough to revive dead RISC vs. CISC arguments, nor something that spells the end of x86. One thing that is impressive is the lower frequencies of the device, but it still seems to draw a lot of power (80 W) at 4.3 GHz. Maybe this will improve in the near future?

            If something were to bring x86 laptops to an end, it would be bad power management on the OS side in both Linux and Windows, plus the lack of innovation compared to the bells and whistles of the M2 and how many applications take advantage of the SoC at a low level. Apple does a good job of marketing its new features and making them exciting for teams (including managers) to implement; dopamine fires off the charts every time a new hardware feature is supported. On the other side you have Windows benchmarking and video-editing software that took many years to support very useful microarchitecture improvements, like Adobe's trash software that would just crash on some CPUs or not use hardware encoding for years. This is where the real battle is being fought, IMO.

            I hope ARM and RISC-V find more ways of improving over x86, rather than relying on non-objective hype tactics. Massive speculation: it seems like AMD is going after inter-CCD latency in the next two years. AMD will likely gain 10-15% IPC with Zen 5 and ~10% with Zen 6 (completely new chip layout, new Infinity Fabric, CCD stacking), and we might see Arrow Lake and ARM improving efficiency faster than that over the same period, so it will be interesting. We will see some interesting laptops next year, with AMD bringing out some new things too, but we might see delays for the big stuff like Strix Halo (likely 2025); we should still see Strix Point in 2024 due to Windows 11 AI requirements. Both AMD Strix Point and the Snapdragon X Elite will go for ~40 to ~45 TOPS if the rumors are right.

            I love low-power, passively cooled devices, so give me Cortex-A5 over Cortex-A7 and Cortex-X models... never mind x86.



            • #36
              Originally posted by sophisticles View Post
              The funny thing is that this is how the "Wintel" ecosystem came into being.

              Years ago, "Big Iron" dominated: large mainframes from IBM that ran proprietary UNIX systems with expensive per-seat licenses. What people did was use small, basically dumb terminals networked to the mainframes, and everything ran on those big systems. They were large, expensive, used lots of power, and the per-seat licenses were exorbitant.

              Microsoft and Intel decided to attack that market with the notion of smaller, less powerful, less power-hungry and much cheaper systems. The pitch was that you could get half the performance at a tenth of the power consumption and a quarter of the cost, and Wintel was born.
              Uh, more like IBM decided to get into the PC market and picked Microsoft as the OS vendor. When spreadsheets became a "thing" and first came out on PCs, that plus the IBM nameplate drove them into the corporate workplace. From there, I think it was a fairly natural evolution of wanting to network them and Microsoft wanting to grow its market share.



              • #37
                Originally posted by Jabberwocky View Post
                Intel i9 and Ryzen 9 CPU cores have more power-efficient silicon than i5 or R5 parts most of the time.
                Depends on how many threads & what clock speeds or power budgets you run them at. Using the out-of-the-box power limits, the Intel K and AMD X CPUs run far outside their efficiency envelope. However, if you have a well-threaded workload and reduce the power limits of the CPUs to a more sane per-core number, then more cores should be more efficient than fewer cores.

                Originally posted by Jabberwocky View Post
                There are niche cases and arguments about idle power that create various compound scenarios
                When looking at efficiency, I go by metrics that measure Joules per task and pay no heed to benchmarks that measure whole-system power, since that's affected by many things besides the CPU.
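
                If anyone wants to gather that kind of number themselves on Linux, a minimal sketch (assuming the intel_rapl powercap counter is readable; it counts package energy only, not wall power, and I'm ignoring counter wraparound):

                # Minimal Joules-per-task sketch using the CPU package energy counter.
                import subprocess, time

                ENERGY = "/sys/class/powercap/intel-rapl:0/energy_uj"  # microjoules, package domain

                def joules() -> float:
                    with open(ENERGY) as f:
                        return int(f.read()) / 1e6

                e0, t0 = joules(), time.time()
                subprocess.run(["python3", "-c", "sum(i * i for i in range(10**7))"], check=True)  # stand-in task
                used, elapsed = joules() - e0, time.time() - t0
                print(f"{used:.1f} J for the task ({used / elapsed:.1f} W average package power)")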

                Originally posted by Jabberwocky View Post
                Your argument was valid in the 1980s and 1990s, but you really can't blame the ISA for power efficiency these days.
                Disagree. A variable-length ISA requires more work to decode. That work increases nonlinearly, as you try to decode more instructions in parallel, because the start of each instruction depends on where the last one ended.

                Also, I think x86 instructions are bigger, on average, than ARM or RISC-V instructions. That means less efficient instruction-cache utilization, which also has energy-efficiency implications.
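
                A toy way to see the boundary problem (a sketch, nothing like a real decoder):

                # Toy illustration: with a variable-length encoding you must walk the bytes
                # serially to find where each instruction starts, while with fixed 4-byte
                # instructions the k-th one trivially starts at offset 4*k, so a wide decoder
                # can pick up all the boundaries in parallel.
                def variable_length_boundaries(code: bytes, length_of) -> list[int]:
                    offsets, pos = [], 0
                    while pos < len(code):
                        offsets.append(pos)
                        pos += length_of(code, pos)  # instr N+1 can't start until N is sized
                    return offsets

                def fixed_length_boundaries(code: bytes, width: int = 4) -> list[int]:
                    return list(range(0, len(code), width))  # independent of the bytes themselves

                # Pretend ISA where the first byte encodes the instruction length (x86-ish, 1-15 bytes).
                def fake_length(code: bytes, pos: int) -> int:
                    return code[pos]

                print(variable_length_boundaries(bytes([1, 3, 0, 0, 2, 0, 4, 0, 0, 0]), fake_length))  # [0, 1, 4, 6]
                print(fixed_length_boundaries(bytes(16)))  # [0, 4, 8, 12]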

                Originally posted by Jabberwocky View Post
                Modern out-of-order pipelines, advanced instruction decoding and op-caching have put the nail in the coffin for this argument.
                Caching micro-ops mostly helps just for loops, and a lot of code isn't very loopy. ARM is moving in the other direction and has purged the mop cache from its latest cores, now that they no longer have to support AArch32. Their decoder is efficient enough that the op cache was no longer a net win, yet it still wasted die space.

                Originally posted by Jabberwocky View Post
                Software is the biggest problem today regarding efficiency: drivers, power management and even applications.
                This doesn't hold water for comparing one microarchitecture against another, since we're typically looking at the same software, toolchain, and OS running on each.

                Originally posted by Jabberwocky View Post
                If you look at the M2 as a whole, it's more CISC than x86, but those operations are being done outside the core, so people just ignore that.
                This makes zero sense, given that CISC refers to ISA and that's only relevant to the core.

                Originally posted by Jabberwocky View Post
                In supercomputers, ARM, POWER and others are getting their teeth kicked in by x86 paired with Nvidia and AMD GPUs.
                Not anymore. Nvidia is dropping x86, now that they have their own ARM-based Grace CPUs.

                Originally posted by Jabberwocky View Post
                Even the Green500 is dominated by x86.
                It's not because x86 is good - just that it still dominates the industry. The compute efficiency of those machines largely comes from the GPUs. The CPU has very little to do with it.



                • #38
                  Originally posted by drakonas777 View Post
                  However, performance is not the most important factor to crush x86. It's a value proposition. They need to ship a cheap and good enough variant of SD X in massive quantities to eat up a good portion of cheap entry level and mid-range non-gaming notebooks at the same time offering attractive premium designs. After achieving critical install base in this largest PC segment they can go up into gaming notebooks, desktops and WSs eventually if they choose to. But it will be a long journey so as I said everything depends on the commitment to PC market.
                  Here's where Qualcomm will shoot itself in both feet: they just can't bring themselves to price their notebook SoCs competitively. A wise move would be to accept lower margins for a couple of years in order to establish a foothold in the corporate laptop & mini-PC market, then slowly build up the brand premium to the point where you can charge MacBook Air-level prices. However, Qualcomm always seems to overestimate its value proposition, resulting in absurdities like Lenovo's ThinkPad X13s launching at about $2k, and that was for a laptop with decidedly lackluster performance and less than the claimed stellar battery life. Even if you didn't get the 5G modem, it only shaved like $300 off the price.

                  Qualcomm is going to find a way to mess this up, and it will probably be pricing. Believing they have a product superior to Apple's, they will do something idiotic, like trying to price above Apple. This could open the door for MediaTek to come in and eat its lunch.
                  Last edited by coder; 05 November 2023, 06:47 PM.



                  • #39
                    Originally posted by Classical View Post
                    In my opinion, ARM is actually not that much better than what AMD and Intel currently use. Compare e.g. the results of the iPhone 12 with the Intel 12600K that I published here:


                    As you can see, the performance of the iPhone 12 mini (as fast as a standard iPhone) is really very weak in the 'animation & skinning', 'particles' and 'AI agents' sections.
                    This is like a bad troll. With all the Apple M-series CPU benchmarks published on this very site, why would you refer to some graphics benchmarks run on an obsolete iPhone and Intel CPU?

                    Originally posted by TemplarGR View Post
                    You do not simply increase or decrease the clockrate by changing a switch, in order for the clockrate to be able to go higher you need to change the architecture and/or the process node. x86 architectures are able to clock high and use wider execution units per core because of their architecture, arm cores CANNOT.
                    The microarchitecture is what determines those things, not the instruction set architecture (ISA). A bad ISA can make it harder to build a good microarchitecture, and that's where x86 is at a disadvantage.

                    Originally posted by TemplarGR View Post
                    if ARM ever develops a desktop cpu it will be using similar levels of power.
                    Apple's desktop chips aren't pure-bred desktop CPUs, but they demonstrate what efficiency can be achieved with ARM's ISA at desktop performance levels.

                    Originally posted by TemplarGR View Post
                    So you can't say Intel's architecture is "bad".
                    I think we can. Its disadvantages include:
                    • Variable-length instructions
                    • Fewer general-purpose registers
                    • Mostly 2-operand instructions (see the sketch below)
                    • More stringent memory semantics
                    Also, it's just big & complex. That leaves room for lots of security vulnerabilities, which inevitably get patched using performance-robbing mitigations.
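
                    To make the 2-operand point concrete, a toy lowering of d = a + b where a and b stay live afterwards (hypothetical output, not real compiler codegen): the destructive x86-style form needs an extra register copy that a 3-operand encoding avoids.

                    # Toy lowering of d = a + b with both sources still needed afterwards.
                    def lower_two_operand(d, a, b):
                        # x86-style: the destination is also a source, so copy first to avoid clobbering a.
                        return [f"mov {d}, {a}", f"add {d}, {b}"]

                    def lower_three_operand(d, a, b):
                        # ARM/RISC-V-style: write straight to the destination, sources untouched.
                        return [f"add {d}, {a}, {b}"]

                    print(lower_two_operand("rax", "rbx", "rcx"))   # ['mov rax, rbx', 'add rax, rcx']
                    print(lower_three_operand("x0", "x1", "x2"))    # ['add x0, x1, x2']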

                    Originally posted by TemplarGR View Post
                    Also Apple cores are severely overrated/overhyped and are not really better than Intel either. And Apple also benefits by compiling the OS and Software for their architecture, unlike Intel and x86 software which is more generic.
                    You can compile your own software on them.

                    As for the OS, it's not as if AMD and Intel don't both submit plenty of kernel patches, to make their CPUs run Linux efficiently.
                    Last edited by coder; 05 November 2023, 06:58 PM.



                    • #40
                      Originally posted by Jabberwocky View Post
                      I've looked at some benchmarks for Apple's M2 (8+10, 20 billion transistors, TSMC 5 nm N5P) vs. the AMD 6800U (Zen 3+ / RDNA 2, 13 billion transistors, TSMC 6 nm FinFET) in an ASUS Zenbook S13. The results don't match the hype, IMO.
                      Why not instead look at benchmarks comparing AMD 7040 "Phoenix" APUs with the Apple M2? Then you have TSMC N4 being compared with TSMC N5P, yielding near performance parity.

                      Here's a review of the 7840HS. You can add the Apple MacBook Pro 14 2023 (M2 Pro) to the CPU Performance Rating graph, and it shows the Mac beating Ryzen by about 6% (76.3 vs. 71.8).

                      Originally posted by Jabberwocky View Post
                      ​I have a few questions:
                      1. What percentage of power does the decoder use, and how much does that objectively contribute to improved efficiency/performance?
                      2. We have seen in x86 alone that there have been big IPC improvements from optimizing the interplay between branch prediction and the micro-op cache / decode queue. What proof do we have that we have reached a limit where we cannot optimize variable-length decoding further (regardless of whether we keep or remove legacy instructions)?
                      3. How can we objectively measure that ARM's success is due to the "ultra" 8-wide decode step?
                      4. Is x86 struggling to improve the decode step because of legacy instructions, or is it perhaps AVX or even something else?
                      1. Only Intel and AMD actually know such detailed information about their CPUs.
                      2. I think it's probably not so much due to the size of the opcode space, as it is just the hassle of dealing with variable-length instructions, to begin with.
                      3. Again, this is something only ARM or Apple would know about their CPUs.
                      4. Same answer as point 2.

                      Originally posted by Jabberwocky View Post
                      I enjoy studying this, but I can't honestly say I know exactly what is going on in Raptor Lake / Zen 5 / M3 / Snapdragon. It's easy to make statements like "X has better single-core performance because the CPU is ultra-wide", but actually proving that this architectural change is responsible is another story. If there were a proper study on this and we could definitively say that it's too difficult to keep the execution units fed and that single-core performance has hit a wall on x86, I'd be happy to admit I'm wrong, but I really doubt that this is the case.
                      You'll find lots of analysis and investigation of these sorts of questions on Chips & Cheese. For instance, they did a detailed performance comparison (i.e. not just benchmarks, but also analysis) on ARM Neoverse N1 vs. Zen 2, a few years ago:

                      Just recently, they posted analysis of ARM's X2 cores, via "Snapdragon 8+ Gen 1" (note that it's a phone SoC - not one of their laptop SoCs).

                      Originally posted by Jabberwocky View Post
                      Even if we trust that independent testing will give the same results across all workloads, it doesn't look like a life-changing difference.
                      If performance is comparable and efficiency is way better, then it could definitely have implications for the laptop and server markets.

                      Originally posted by Jabberwocky View Post
                      it still seems to draw a lot of power (80 W) at 4.3 GHz.
                      That's actually the upper limit for 3.8 GHz on 12 cores; 4.3 GHz is the boost speed for up to 2 cores. Given that it has a higher IPC target than competing cores, it's hard to say that's a lot. The real test for efficiency isn't to look only at power, core count, and clock speed, but rather to look at perf/W on different workloads. IPC and clock speed are meaningful only to CPU geeks; it's really just the end result that matters.
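
                      For a rough sense of scale, the quoted figures work out like this (plain arithmetic, not a measurement):

                      # Back-of-the-envelope from the figures above.
                      package_limit_w, all_core_ghz, cores = 80, 3.8, 12
                      boost_ghz, boost_cores = 4.3, 2
                      print(f"~{package_limit_w / cores:.1f} W per core at {all_core_ghz} GHz all-core")
                      print(f"{boost_ghz} GHz applies to at most {boost_cores} cores, not all {cores}")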

