AMD Announces The Ryzen 8040 Series Mobile Processors With Better Ryzen AI

  • #41
    Originally posted by Dukenukemx View Post
    Again, don't look at Geekbench scores. Geekbench heavily favors Apple. Nobody should go by synthetic tests anyway. That being said, the Ryzen 9 7940HS used less power than the M3 Max in Handbrake. It didn't outperform it in Handbrake, but the M3 Max didn't even outperform the M1 & M2 Ultra. I really doubt that Handbrake is enough to go by to determine the performance of an Apple M3 vs Ryzen 9 7940HS. Maybe next time they can test AV1 encoding, since the M3 doesn't have it, but AMD and Intel both do.
    I'm not talking about Geekbench scores, I am talking about the Phoronix scores which you referenced/brought up here https://www.phoronix.com/forums/foru...31#post1426831. Have a look at https://www.phoronix.com/review/apple-m2-zen4-mobile/3, specifically John the Ripper's score, where the MacBook Air is 5 times slower. That makes zero sense at face value, but if you think about why it is so low it makes perfect sense: John the Ripper has never targeted ARM, whether on servers (who runs John the Ripper on new ARM servers?), on desktop (the M series is new), or on mobile (again, who runs John the Ripper on mobile?).

    • #42
      Originally posted by mdedetrich View Post

      It actually doesn't

      And more critically, it just proves my point that you are not comparing Apples to Apples (pun intended). We are comparing code which has been optimized for an ISA for literally decades vs code that has maybe 2-3 years of optimization. It's only recently that ARM has become a tangible target for both desktop (Apple) and server (Ampere Altra/Graviton etc.).

      And that's the thing: for the companies using both those ARM server products and the desktop (i.e. compiling programs natively on Apple silicon using Xcode), the power efficiency is noticeably (not negligibly) higher than x86/x86_64. In these cases the effort was put in to actually properly target ARM.



      No it's not, because you are comparing RISC to CISC. Even though one can argue that this distinction is too high level, it still matters in the general sense. In any case the point is clear and you are grasping at straws here: fundamentally, on one side we are benchmarking code that has been properly optimized vs code that in some cases has been and in other cases clearly has not.

      You can draw conclusions from that, but objectively they are not scientific ones. I would be far more interested in only benchmarking codebases which have received proper optimizations on both ISAs (and that's a bare minimum; there are also questions about power/scheduling wrt Asahi), at least if you want to answer the "which ISA is more power efficient than the other" question in good faith.

      Poorly optimized code will make any ISA look bad, ARM or not.
      I am not against testing cases where benchmarks are optimized for both architectures. The problem is that it's infeasible to do at the moment, because all the benchmarking suites won't get optimized overnight. On the other hand, recompiling test suites to rely on a generic ISA subset is relatively easy to do. This way we could at least eliminate the use of specialized vector/matrix instructions, which is the main reason ARM looks "unoptimized" in certain workloads. Yes, it's not a perfect solution, but it's a fairer one. Furthermore, it would benefit Apple Silicon far more than x86, since Apple's ARM implementation does not support SVE and NEON is not an equivalent of AVX2/AVX-512.
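      To make the "generic ISA subset" idea concrete, here is a minimal sketch (the kernel is illustrative, not anything taken from the Phoronix test suite) of the pattern most hand-optimized codebases follow: an AVX2 path for x86, a NEON path for ARM, and a plain C fallback. A generic-subset comparison essentially means forcing both sides onto the same portable fallback instead of their hand-tuned vector paths:

      #include <stddef.h>
      #include <stdio.h>
      #if defined(__AVX2__)
      #include <immintrin.h>   /* x86 AVX/AVX2 intrinsics */
      #elif defined(__ARM_NEON)
      #include <arm_neon.h>    /* ARM NEON intrinsics */
      #endif

      /* y[i] += a * x[i] -- the kind of kernel that typically gets a hand-tuned path */
      static void saxpy(float a, const float *x, float *y, size_t n)
      {
          size_t i = 0;
      #if defined(__AVX2__)
          for (; i + 8 <= n; i += 8) {                  /* 8 floats per 256-bit vector */
              __m256 vx = _mm256_loadu_ps(x + i);
              __m256 vy = _mm256_loadu_ps(y + i);
              vy = _mm256_add_ps(vy, _mm256_mul_ps(_mm256_set1_ps(a), vx));
              _mm256_storeu_ps(y + i, vy);
          }
      #elif defined(__ARM_NEON)
          for (; i + 4 <= n; i += 4) {                  /* 4 floats per 128-bit vector */
              float32x4_t vx = vld1q_f32(x + i);
              float32x4_t vy = vld1q_f32(y + i);
              vy = vmlaq_n_f32(vy, vx, a);              /* vy += vx * a */
              vst1q_f32(y + i, vy);
          }
      #endif
          for (; i < n; i++)                            /* portable scalar fallback */
              y[i] += a * x[i];
      }

      int main(void)
      {
          float x[5] = {1, 2, 3, 4, 5}, y[5] = {0};
          saxpy(2.0f, x, y, 5);
          for (int i = 0; i < 5; i++)
              printf("%.1f ", y[i]);
          printf("\n");
          return 0;
      }

      In practice a "generic" build means compiling without architecture-specific vector flags and with the vector paths disabled (note that NEON is baseline on AArch64, so that path has to be switched off explicitly), so both architectures end up running whatever the compiler generates from the same portable C loop.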

      Even in the optimized case you would invent another excuse for Apple's ARM, like "since it does not support SVE, it's not fair to compare it in optimized CPU-only workloads, we should use accelerators in Apple's case" or some nonsense like that. I guess people have to psychologically deal with the fact that the M1 was released with an already dated ISA and the uarch has been stagnating since 2020 with no IPC improvements whatsoever...

      Almost all microarchitecture-specific intrinsics and hand-crafted ASM optimizations are organized around specialized instructions. The idea that x86 is generally "magically optimized" in the context of the generic subset while ARM is not is pure trash.

      At the end of the day I'm not saying ARM is not generally more power efficient. It is (mostly in the frontend). However, we are talking percentages, not integer factors. People watched that original M1 event where a 5N-based modern high-performance ARM implementation was compared against 2015 14N Intel garbage, and somehow they think that more or less the same efficiency gains are maintained against modern x86. They are not. Actually, next year's Zen 5 is going to be the most 'fair' x86 to compare with the M1/M2, because both the lithography and the backend design will be the closest so far.

      x86's problem is not the ISA. The problems are: a) it is lagging behind Apple Silicon in lithography/SoC design/backend design by years; b) most x86 mobile products are trash and have idiotic factory power limits coupled with small, low-quality batteries. Yes, ARM will always be more efficient by some amount due to the more efficient frontend, but that's only a small fraction if we consider the whole system design.

      • #43
        Originally posted by drakonas777 View Post

        I am not against testing cases where benchmarks are optimized for both architectures. The problem is that it's infeasible to do at the moment, because all the benchmarking suites won't get optimized overnight. On the other hand, recompiling test suites to rely on a generic ISA subset is relatively easy to do. This way we could at least eliminate the use of specialized vector/matrix instructions, which is the main reason ARM looks "unoptimized" in certain workloads. Yes, it's not a perfect solution, but it's a fairer one. Furthermore, it would benefit Apple Silicon far more than x86, since Apple's ARM implementation does not support SVE and NEON is not an equivalent of AVX2/AVX-512.
        If you admit that a lot of code is not optimized for Apple's ARM, then don't use that code as evidence that ARM is not as power efficient as it is. That's all that needs to be said.

        Originally posted by drakonas777 View Post
        Almost all microarchitecture-specific intrinsics and hand-crafted ASM optimizations are organized around specialized instructions. The idea that x86 is generally "magically optimized" in the context of the generic subset while ARM is not is pure trash.
        I never said that; in fact I am saying the opposite. A lot of the code being benchmarked (especially the code in the Phoronix test suite) has actually been optimized specifically for x86 using specific instruction sets, whereas for ARM it has not, because the authors of those codebases were not expecting ARM to be a significant share of their users.

        Originally posted by drakonas777 View Post
        At the end of the day I'm not saying ARM is not generally more power efficient. It is (mostly in the frontend). However, we are talking percentages, not integer factors. People watched that original M1 event where a 5N-based modern high-performance ARM implementation was compared against 2015 14N Intel garbage, and somehow they think that more or less the same efficiency gains are maintained against modern x86. They are not. Actually, next year's Zen 5 is going to be the most 'fair' x86 to compare with the M1/M2, because both the lithography and the backend design will be the closest so far.

        x86's problem is not the ISA. The problems are: a) it is lagging behind Apple Silicon in lithography/SoC design/backend design by years; b) most x86 mobile products are trash and have idiotic factory power limits coupled with small, low-quality batteries. Yes, ARM will always be more efficient by some amount due to the more efficient frontend, but that's only a small fraction if we consider the whole system design.
        Again you are twisting things around here. Firstly, no one is referencing Apple's marketing here; secondly, you can easily compare lithographies now that we have the 7000 series (both use 5 nm, even though the M1 is about 3 years old at this point); and lastly, you are completely ignoring all of the optimizations that the ARM ISA allows because, unlike Intel's x86, it doesn't have to deal with decades of legacy.

        You assume that this legacy costs only single-digit percentages, when in reality that is not correct; or, to put it more accurately, while a single deficiency of the ISA design may cost just a single-digit percentage, there are a lot of them and they add up. As an example, with ARMv8 every single instruction is the same size, which makes pipelining in ARM trivial because you never have to deal with not knowing how long the next instruction is. This is not the case with x86, which means x86 designs have to add a decoded instruction (µop) cache to work around it; that costs power and silicon area, and it is just a single example.

        Another one is that there is no need for SMT on ARM, because the ISA allows compilers to encode more information about potential branching in the binary, whereas on x86 this is not the case. That is why SMT was invented/used on x86 in the first place: the instruction pipeline was being stalled by branches in the logic, and one way to make sure the pipeline is almost always filled is to have virtual threads multiplexed onto real hardware threads (which is what SMT is), so if one thread stalls on a decision tree, SMT can pull instructions from another program on another virtual thread. Again, this takes power and die space which ARM doesn't have to spend, and I am only listing two examples here.
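        To illustrate the instruction-size point, here is a toy sketch (not a real decoder; the "length in the low bits" encoding is entirely made up) of why fixed-width instructions are friendlier to wide, parallel decode than variable-length ones:

        #include <stddef.h>
        #include <stdint.h>
        #include <stdio.h>

        /* Fixed 4-byte encoding (ARMv8-style): instruction i starts at i * 4, so
         * many decode slots can find their instruction boundaries independently. */
        static size_t fixed_width_start(size_t i)
        {
            return i * 4;
        }

        /* Toy variable-length encoding (x86-style): an instruction's length is only
         * known after (partially) decoding it, so finding where instruction i starts
         * requires walking every instruction before it. */
        static size_t variable_width_start(const uint8_t *code, size_t i)
        {
            size_t pc = 0;
            for (size_t k = 0; k < i; k++)
                pc += (code[pc] & 0x7) + 1;   /* pretend the length (1..8) sits in the low bits */
            return pc;
        }

        int main(void)
        {
            uint8_t code[64];
            for (size_t b = 0; b < sizeof code; b++)
                code[b] = (uint8_t)(b * 37);  /* arbitrary bytes standing in for machine code */

            for (size_t i = 0; i < 4; i++)
                printf("insn %zu starts at: fixed %zu, variable %zu\n",
                       i, fixed_width_start(i), variable_width_start(code, i));
            return 0;
        }

        Real x86 frontends work around this serial dependency with predecode/length-marking hardware and the decoded µop cache mentioned above; the sketch only shows why the dependency exists in the first place.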

        The ultimate point being made here is that of course there are solutions to the problems x86 inherits from its decades-old design, but they all come at a cost which ARM doesn't have to pay, due to how it iterates (i.e. the ISA gets reinvented every 5-10 years, ditching old patterns/designs that don't work). Similar to Microsoft with Windows, the "don't break old programs" approach is slowly becoming an Achilles heel for x86. This is on top of the fact that ARM was designed from the get-go with power efficiency in mind. No one is doubting that an AMD laptop connected to wall power (which lets its TDP rise) is powerful, but ARM was designed with power efficiency as its top priority, while x86 was designed with performance as its top priority.

        Or put differently, if you asked someone to design an ISA that prioritizes power efficiency while still having good performance, no one would suggest/produce anything close to x86/x86_64; it would be closer to ARM/RISC-V. There are just so many things in x86/x86_64's design that provide that extra inch of performance but proportionally use much more power (as well as plain old cruft which isn't ideal in this day and age).
        Last edited by mdedetrich; 08 December 2023, 04:12 AM.

        • #44
          Originally posted by uid313 View Post

          So the AMD Z1 beat the Apple M2, but it was up at 32 W while the M2 supposedly is at 5 to 8 W.
          Sources?

          • #45
            Originally posted by Dukenukemx View Post
            Great but I can't buy a Graviton 3 or 4.
            Yes, but it shows that using SVE is viable without blowing your power budget, contrary to your claims. Eventually, consumer laptops will have ARMv9-A cores, at which point they can use it as well.
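            For what it's worth, SVE code is written vector-length agnostic, so the same source runs on any hardware vector width. A minimal sketch using the ACLE intrinsics (the kernel itself is just an illustrative example; build with something like -march=armv8-a+sve):

            #include <arm_sve.h>   /* ACLE SVE intrinsics */
            #include <stddef.h>
            #include <stdint.h>
            #include <stdio.h>

            /* y[i] += a * x[i] without ever hard-coding the vector width: the loop
             * advances by svcntw() lanes, whatever the hardware provides. */
            static void saxpy_sve(float a, const float *x, float *y, size_t n)
            {
                for (size_t i = 0; i < n; i += svcntw()) {
                    svbool_t pg = svwhilelt_b32_u64((uint64_t)i, (uint64_t)n);  /* predicate masks off the tail */
                    svfloat32_t vx = svld1_f32(pg, x + i);
                    svfloat32_t vy = svld1_f32(pg, y + i);
                    vy = svmla_n_f32_x(pg, vy, vx, a);                          /* vy += vx * a */
                    svst1_f32(pg, y + i, vy);
                }
            }

            int main(void)
            {
                float x[5] = {1, 2, 3, 4, 5}, y[5] = {0};
                saxpy_sve(2.0f, x, y, 5);
                for (int i = 0; i < 5; i++)
                    printf("%.1f ", y[i]);
                printf("\n");
                return 0;
            }

            Graviton 3 happens to implement 256-bit SVE, but the same binary would run unchanged on narrower or wider implementations, which is the point of the vector-length agnostic design.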

            I'm not trying to sell you on ARM. I have no dog in this fight, other than trying to share some information where it seems to be lacking.

            • #46
              Originally posted by mdedetrich View Post

              If you admit that a lot of code is not optimized for Apple's ARM, then don't use that code as evidence that ARM is not as power efficient as it is. That's all that needs to be said.



              I never said that; in fact I am saying the opposite. A lot of the code being benchmarked (especially the code in the Phoronix test suite) has actually been optimized specifically for x86 using specific instruction sets, whereas for ARM it has not, because the authors of those codebases were not expecting ARM to be a significant share of their users.



              Again you are twisting things around here. Firstly, no one is referencing Apple's marketing here; secondly, you can easily compare lithographies now that we have the 7000 series (both use 5 nm, even though the M1 is about 3 years old at this point); and lastly, you are completely ignoring all of the optimizations that the ARM ISA allows because, unlike Intel's x86, it doesn't have to deal with decades of legacy.

              You assume that this legacy costs only single-digit percentages, when in reality that is not correct; or, to put it more accurately, while a single deficiency of the ISA design may cost just a single-digit percentage, there are a lot of them and they add up. As an example, with ARMv8 every single instruction is the same size, which makes pipelining in ARM trivial because you never have to deal with not knowing how long the next instruction is. This is not the case with x86, which means x86 designs have to add a decoded instruction (µop) cache to work around it; that costs power and silicon area, and it is just a single example.

              Another one is that there is no need for SMT on ARM, because the ISA allows compilers to encode more information about potential branching in the binary, whereas on x86 this is not the case. That is why SMT was invented/used on x86 in the first place: the instruction pipeline was being stalled by branches in the logic, and one way to make sure the pipeline is almost always filled is to have virtual threads multiplexed onto real hardware threads (which is what SMT is), so if one thread stalls on a decision tree, SMT can pull instructions from another program on another virtual thread. Again, this takes power and die space which ARM doesn't have to spend, and I am only listing two examples here.

              The ultimate point being made here is that of course there are solutions to the problems x86 inherits from its decades-old design, but they all come at a cost which ARM doesn't have to pay, due to how it iterates (i.e. the ISA gets reinvented every 5-10 years, ditching old patterns/designs that don't work). Similar to Microsoft with Windows, the "don't break old programs" approach is slowly becoming an Achilles heel for x86. This is on top of the fact that ARM was designed from the get-go with power efficiency in mind. No one is doubting that an AMD laptop connected to wall power (which lets its TDP rise) is powerful, but ARM was designed with power efficiency as its top priority, while x86 was designed with performance as its top priority.

              Or put differently, if you asked someone to design an ISA that prioritizes power efficiency while still having good performance, no one would suggest/produce anything close to x86/x86_64; it would be closer to ARM/RISC-V. There are just so many things in x86/x86_64's design that provide that extra inch of performance but proportionally use much more power (as well as plain old cruft which isn't ideal in this day and age).
              I agree with you at a fundamental level. My main point is this: given "maximum equivalency" (litho/backend/SoC/packaging/system/software optimizations), ARM will generally be more efficient in power consumption and die area, but this advantage will be up to several tens of percent, not several times, on average. That's all. It's not like x86 has to pull 300 W to be competitive against 10 W ARM, as someone here imagined, and I feel that a lot of people tend to think that way.

              • #47
                Originally posted by t.s. View Post

                Sources?
                (link to a Phoronix article)

                • #48
                  Originally posted by drakonas777 View Post

                  I agree with you at a fundamental level. My main point is this: given "maximum equivalency" (litho/backend/SoC/packaging/system/software optimizations), ARM will generally be more efficient in power consumption and die area, but this advantage will be up to several tens of percent, not several times, on average. That's all. It's not like x86 has to pull 300 W to be competitive against 10 W ARM, as someone here imagined, and I feel that a lot of people tend to think that way.
                  Not sure if you know, but tens of percent is quite massive, and in reality the figure is larger than that (again, if we are talking about running unplugged from the wall).

                  • #49
                    Originally posted by mdedetrich View Post

                    I'm not talking about Geekbench scores, I am talking about the Phoronix scores which you referenced/brought up here https://www.phoronix.com/forums/foru...31#post1426831.
                    That's drakonas777, not dukenukemx. You got the guy with the wrong D.
                    Have a look at https://www.phoronix.com/review/apple-m2-zen4-mobile/3, specifically John the Ripper's score, where the MacBook Air is 5 times slower. That makes zero sense at face value, but if you think about why it is so low it makes perfect sense: John the Ripper has never targeted ARM, whether on servers (who runs John the Ripper on new ARM servers?), on desktop (the M series is new), or on mobile (again, who runs John the Ripper on mobile?).
                    I'll stand in for drakonas777. My guess is that it has to do with the Air not having active cooling. Either that or it's just an application that the M1 chip doesn't perform well on. Not all CPUs have to perform equally well on all applications. Exactly how many applications today have ever targeted ARM? x86 is so old that everything has targeted it first. My Porsche 928 has an Intel 4004 in it, just to give you an idea.

                    • #50
                      Originally posted by Dukenukemx View Post
                      I'll stand in for drakonas777. My guess is that it has to do with the Air not having active cooling. Either that or it's just an application that the M1 chip doesn't perform well on. Not all CPUs have to perform equally well on all applications. Exactly how many applications today have ever targeted ARM? x86 is so old that everything has targeted it first. My Porsche 928 has an Intel 4004 in it, just to give you an idea.
                      It's actually not specific to the Air; it's just that John the Ripper in general doesn't perform well on ARM because it's not optimized for it, see https://openwall.info/wiki/john/benchmarks
