Apple M1 ARM Performance With A 2020 Mac Mini


  • Originally posted by Boland View Post

    Nope.



    Power virus load on CPU and GPU is 32 Watts. The GPU maxes out at 10 Watts.
    My estimates were too high. I fixed this elsewhere but since this post was held for 24 hours I couldn't fix it here. I'll fix it now. Thank you.

    However, 32W is still substantially more than "10-20W".
    Last edited by deppman; 23 November 2020, 03:13 AM.

    Comment


    • Originally posted by deppman View Post

      My estimates were too high. I fixed this elsewhere but since this post was held for 24 hours I couldn't fix it here. I'll fix it now. Thank you.

      However, 32W is still substantially more than "10-20W".
      That's absolute PEAK power for the entire SoC (CPU, GPU, RAM, package); no normal workload will pull those figures. 10-20W will be the average draw on real workloads.



      For instance, peak power for the entire package on "Rise of the Tomb Raider" is 16.5W (GPU 7W).

      https://twitter.com/RyanSmithAT/stat...46992156422147

      There is currently nothing else in this class.
      Last edited by Boland; 23 November 2020, 04:36 AM.

      Comment


      • Originally posted by pinoli View Post
        And this is indeed how it's going: ARM CPUs are deployed on every low power device on earth, but the "low power" here is crucial.
        Where power is not an issue, ARM does not scale as well as x86, as we can easily see with current performance (please don't mention Fugaku).
        Why shouldn't Fugaku be mentioned? Because it doesn't fit your mental image?

        Comment


        • Originally posted by duby229 View Post

          Kinda OT: I'd be willing to bet Apple spent a huge effort optimizing the pipeline just for Spec, which necessarily means Spec results mean nothing and are incapable of reflecting its true performance on any other workload. I could totally see Apple tweaking the pipeline specifically to get the highest score possible... Relying on a Spec result in this case to make a judgement almost certainly won't reflect real-world results. I mean, just look at this article as a perfect example: Apple's product totally dominates the Spec benchmark, but then loses almost every single real benchmark which Spec supposedly reflects... It's pretty obvious from this review alone that Spec results for this product don't reflect actual performance.

          I feel the only way to get a true picture of its performance is to benchmark the real thing. If you want to know how GCC performs, benchmark it. If you need to know about x264, benchmark it. If you need to know about blender, benchmark it... And this article proves it to me...
          SPEC was designed precisely so that everybody can optimize for it. SPEC includes GCC as a benchmark, and Geekbench has LLVM as a benchmark. So what is your point? Intel has been bragging about how great they are at SPEC for the last two decades, right until AMD beat them. Then SPEC suddenly became a "bad" benchmark...

          The reason the M1 loses some of the Phoronix benchmarks is that most tests were running under Rosetta. It managed to win quite a few tests despite the translation overhead. So the Phoronix results don't reflect actual native performance.
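          In case anyone wants to check which of their own runs were native vs. translated, here is a minimal sketch of my own (not from the article) of how a process can ask macOS whether it is running under Rosetta 2, via the sysctl.proc_translated key:

```c
/* rosetta_check.c -- minimal sketch: detect whether this process runs
 * under Rosetta 2 translation on macOS.
 * Build: clang rosetta_check.c -o rosetta_check
 * The sysctl key "sysctl.proc_translated" reports 1 for a translated
 * (x86_64-on-Apple-Silicon) process, 0 for a native one, and does not
 * exist at all on Intel Macs. */
#include <stdio.h>
#include <sys/types.h>
#include <sys/sysctl.h>

static int proc_translated(void)
{
    int ret = 0;
    size_t size = sizeof(ret);
    /* A missing key (ENOENT) means we are not on an Apple Silicon host. */
    if (sysctlbyname("sysctl.proc_translated", &ret, &size, NULL, 0) == -1)
        return -1;
    return ret;
}

int main(void)
{
    switch (proc_translated()) {
    case 1:  puts("Running under Rosetta 2 translation"); break;
    case 0:  puts("Running natively");                    break;
    default: puts("sysctl.proc_translated not available (not Apple Silicon)"); break;
    }
    return 0;
}
```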

          Comment


          • Originally posted by ldesnogu View Post
            Why shouldn't Fugaku be mentioned? Because it doesn't fit your mental image?
            Indeed, Fugaku is proof that Arm does scale out to the very high-end while remaining power efficient. There are several other supercomputers that switched to Arm.

            Comment


            • How much does memory bandwidth affect the Phoronix bench suite?

              Does the 8GB M1 mini have the same bandwidth as the 16GB, or only half?
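              As a rough way to probe the bandwidth question, a STREAM-style triad like the sketch below gives a ballpark for sustained bandwidth that you can compare against how individual benchmarks scale. This is only an illustration (the array size is an assumption chosen to blow past the M1's caches), not what the Phoronix suite measures.

```c
/* triad.c -- illustrative STREAM-style triad to estimate sustained
 * memory bandwidth. Build: clang -O2 triad.c -o triad */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N    (16UL * 1024 * 1024)   /* 16M doubles per array, ~128 MB each: far larger than the caches */
#define REPS 10

int main(void)
{
    double *a = malloc(N * sizeof *a);
    double *b = malloc(N * sizeof *b);
    double *c = malloc(N * sizeof *c);
    if (!a || !b || !c) return 1;

    for (size_t i = 0; i < N; i++) { b[i] = 1.0; c[i] = 2.0; }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int r = 0; r < REPS; r++)
        for (size_t i = 0; i < N; i++)
            a[i] = b[i] + 3.0 * c[i];           /* triad: 2 reads + 1 write per element */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs  = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    double bytes = (double)REPS * N * 3 * sizeof(double);
    /* Print an element so the compiler cannot discard the work. */
    printf("check: %g, approx. bandwidth: %.1f GB/s\n", a[N / 2], bytes / secs / 1e9);

    free(a); free(b); free(c);
    return 0;
}
```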

              Comment


              • Originally posted by PerformanceExpert View Post

                ISA is Instruction Set Architecture, ie. the definition of what eg. Arm or x86 instructions do and how they are encoded. What I meant is that micro-ops are basically a different encoding of instructions but with the same meaning. So micro-ops are very similar to the original ISA, as in, micro-ops on a CISC are CISC and micro-ops on a RISC are RISC. In reality you wouldn't call micro-ops RISC or CISC since they are not documented or usable from software.
                This is wrong. Micro-ops are not just a different encoding of the ISA, and for many RISC implementations they can't be.
                One instruction from the ISA may not need translation into micro-ops and can be executed as-is,
                but another instruction will be split into several micro-ops that the CPU executes to produce the instruction's result.

                A traditional RISC implementation had all instructions implemented in HW and would execute each instruction as it came.
                For CISC this was not possible, as the CPU would have to be huge to implement all the instructions. So an instruction would be translated into several micro-ops, which were implemented in HW, and the micro-ops would be executed.

                Today most of the high-end CPU implementations are hybrid. The RISC ISAs have grown, and since we can cram a lot more transistors into one CPU, you can implement lots of CISC-style instructions in HW.
                Another benefit of this shows up when an implementation screw-up happens: an instruction can be executed with different micro-ops than intended. This probably leads to a loss in performance, say an instruction you wanted executed in one cycle may now take 3, but that is still better than not working. And you can actually patch a CPU in the wild to a degree this way.
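                To make the splitting concrete, here is a tiny illustration of my own (not from the post above): a read-modify-write in C usually compiles to a single memory-destination add on x86-64, which many x86 cores then crack into load, add and store micro-ops internally, while an AArch64 compiler typically emits the three steps as separate instructions to begin with.

```c
/* microop_split.c -- illustration only: one ISA instruction vs. several micro-ops.
 * Build and inspect: clang -O2 -S microop_split.c */
#include <stdio.h>

/* On x86-64 this typically compiles to a single instruction,
 *     addl %esi, (%rdi)
 * which a modern out-of-order core cracks into load + add + store
 * micro-ops internally. On AArch64 the compiler usually emits the
 * three steps as separate instructions (ldr / add / str), so there
 * is little left to split at the micro-op level. */
void add_to_mem(int *p, int v)
{
    *p += v;
}

int main(void)
{
    int x = 40;
    add_to_mem(&x, 2);
    printf("%d\n", x);   /* prints 42 */
    return 0;
}
```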
                Last edited by pixo; 24 November 2020, 01:28 PM. Reason: Got the RISC and CISC in reverse.

                Comment


                • To me, this is all very impressive on paper, but it suffers from one major problem: a limited support lifecycle. Will I still be able to make good use of my hardware in 10 years' time, and will the software I buy for this hardware work with whatever new hardware I buy to replace it later down the line?

                  With x86 hardware, the answer is a clear yes, as by the time a given bit of software would normally become unusable, a solution has already been developed to keep it going. All my software, from DOS applications all the way through to the stuff I run on Windows 10, is still usable. Most of it is perfectly usable on Linux too, meaning that even if Windows got discontinued tomorrow, or if Microsoft changed the OS in a way I found too objectionable, I could vfio what doesn't work fully today and later Wine it all into perfect working order a few years down the line. That's little to no risk to me and my collection of software/multimedia.

                  I even have some confidence in SoCs like the Raspberry Pi, where there's a proven track record of long-term support for the software available for it, thanks to source code availability and explicit statements from their CEO. However, this M1 chip is very much tied to macOS, and Apple are known to ditch their transition layers and backwards compatibility very rapidly. To think that anything I buy for an M1 Mac Mini could be completely useless in as little as 5 years, when Apple inevitably makes a backwards-incompatible change, makes it a risky investment.

                  In any case, my PC has a Core i7-6700, which will still see me through the next couple of years, at which point I think both AMD and Intel will have caught up somewhat in terms of performance per watt.





                  Comment


                  • Originally posted by needmorehare View Post
                    With x86 hardware, the answer is a clear yes, as by the time a given bit of software would normally become unusable, a solution has already been developed to keep it going. All my software, from DOS applications all the way through to the stuff I run on Windows 10, is still usable. Most of it is perfectly usable on Linux too, meaning that even if Windows got discontinued tomorrow, or if Microsoft changed the OS in a way I found too objectionable, I could vfio what doesn't work fully today and later Wine it all into perfect working order a few years down the line. That's little to no risk to me and my collection of software/multimedia.
                    You can't run DOS programs on Win 10 64-bit; you have to rely on an emulator (which could be run on an ARM machine).

                    And as far as Wine is concerned: https://www.macrumors.com/2020/11/18...oftware-on-m1/

                    Anyway, as far as long-term HW support goes, nothing beats a self-assembled machine with carefully chosen components.

                    Comment


                    • Originally posted by pixo View Post
                      This is wrong. Micro-ops are not just a different encoding of the ISA, and for many RISC implementations they can't be.
                      One instruction from the ISA may not need translation into micro-ops and can be executed as-is,
                      but another instruction will be split into several micro-ops that the CPU executes to produce the instruction's result.
                      On modern CPUs the vast majority of instructions are a single micro-op. Only complex instructions are split into multiple micro-ops. It varies, but eg. for Cortex-A72 "On average, Filippo said, each ARMv8 instruction translates into 1.08 micro-ops."

                      So micro-ops are just a different encoding of the original ISA.

                      A traditional CISC implementation had all instructions implemented in HW and would execute each instruction as it came.
                      For RISC this was not possible, as the CPU would have to be huge to implement all the instructions. So an instruction would be translated into several micro-ops, which were implemented in HW, and the micro-ops would be executed.
                      You have RISC and CISC swapped here. Initial RISCs didn't have any complex instructions, and every instruction was directly executed in a single cycle. CISCs used a micro-code engine to execute every instruction, which took many cycles and was extremely slow. Those days are gone now. RISC ISAs became more complex, while CISCs stopped using the most complex instructions and sped up the commonly used operations by using more transistors.

                      Today most of the high-end CPU implementations are hybrid. The CISC ISAs have grown, and since we can cram a lot more transistors into one CPU, you can implement lots of RISC-style instructions in HW.
                      Modern high-end CPUs are fairly similar at a high level and decode most instructions into a single micro-op. However, that means the micro-ops are similar to the original ISA, ie. on x86 they support all of the key x86 features (such as load-op, complex addresses, large immediates, partial register writes, etc.). Using CISC micro-ops requires more hardware than splitting into many simpler RISC-like micro-ops, but it is also faster.
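                      A small illustration of my own of the load-op / complex-address point (assuming a typical x86-64 compiler, not taken from the article): the indexed add below is usually folded into one instruction with a scaled-index memory operand, instead of separate address-calculation, load and add operations.

```c
/* loadop.c -- illustration only: a load-op instruction with a complex address.
 * Build and inspect: clang -O2 -S loadop.c */
#include <stdio.h>

/* On x86-64 this typically becomes something like
 *     movl %edx, %eax
 *     addl (%rdi,%rsi,4), %eax        <- load from base + index*4 and add
 * i.e. a single load-op operation with a scaled-index address, which
 * (as described above) many x86 cores handle as one micro-op rather
 * than splitting it into separate address-generation, load and add steps. */
int add_indexed(const int *a, long i, int x)
{
    return x + a[i];
}

int main(void)
{
    int data[4] = { 10, 20, 30, 40 };
    printf("%d\n", add_indexed(data, 2, 12));   /* prints 42 */
    return 0;
}
```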

                      Comment
