NVIDIA Reportedly Near Deal To Buy Arm For $40+ Billion


  • #71
    Originally posted by artivision View Post

    You are correct but somehow confused. Arm cores can also use the same lithography and pipeline stages to reach 4.7 GHz with better real-life IPC. There is no architectural wall there, as you seem to believe, and you don't account for the dynamics, because that can be done tomorrow. How fast can x86 drop its energy consumption 5-10 years from now? Also, if you had gone to a proper school you would understand that CISC does not convert anything inside to RISC and cannot win against clean, proper RISC at anything except programming and compiling friendliness. That is all x86 has been for many years now, not some crazy superpower.
    Most of the IPC comes from high quality (i.e., not toy) microarchitectures (that is, how you implement the pipeline). Most ARM designs are toys (and sure, their cost is commensurate), with Apple ARMs being the only desktop-grade processors.

    Generally speaking, it's easy to make a wide pipeline, it's super-difficult to keep it busy. E.g., you need super complex branch prediction and trace caches, something that not everyone knows how to do well.

    x86-64 is a lot closer to RISC than you'd think. In a modern x86 processor, the vast majority of instructions get broken down into RISC-like micro-ops in the decoders. In contrast, classic CISC had instructions that ran like mini-programs.

    That is simply not how modern x86 works - their pipelines are quite similar to that of their RISC counterparts, except for a messier front-end (more complex decode).

    At the same time, modern RISCs aren't exactly the 80s "classic" RISC processors either (as they can have high variability in instruction execution time).

    Maybe it is time you update your (micro)architectural knowledge. After reading an up-to-date H&P, I'd suggest Shen & Lipasti.
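
    (The point about how hard it is to keep a wide pipeline busy can be seen even from plain C. Below is a minimal, hypothetical sketch: both loops do the same additions, but the first forms one long dependency chain that no amount of issue width can help, while the second exposes independent work a wide core can actually overlap. The array size and the four-accumulator split are arbitrary choices for illustration; build with something like "cc -O2" and without -ffast-math, so the compiler keeps the serial chain intact.)

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    /* One accumulator: every add depends on the previous one, so a wide
       core mostly sits idle waiting on the dependency chain. */
    static double sum_serial(const float *x, long n) {
        double s = 0.0;
        for (long i = 0; i < n; i++)
            s += x[i];
        return s;
    }

    /* Four independent accumulators: the adds within an iteration do not
       depend on each other, so the core can execute them in parallel. */
    static double sum_unrolled(const float *x, long n) {
        double s0 = 0, s1 = 0, s2 = 0, s3 = 0;
        long i;
        for (i = 0; i + 3 < n; i += 4) {
            s0 += x[i];
            s1 += x[i + 1];
            s2 += x[i + 2];
            s3 += x[i + 3];
        }
        for (; i < n; i++)
            s0 += x[i];
        return s0 + s1 + s2 + s3;
    }

    int main(void) {
        long n = 1L << 24;                 /* ~16M floats, ~64 MB */
        float *x = malloc(n * sizeof *x);
        if (!x)
            return 1;
        for (long i = 0; i < n; i++)
            x[i] = (float)(i & 1023) * 0.5f;

        clock_t t0 = clock();
        double a = sum_serial(x, n);
        clock_t t1 = clock();
        double b = sum_unrolled(x, n);
        clock_t t2 = clock();

        printf("serial:   sum=%.1f  %.3fs\n", a, (double)(t1 - t0) / CLOCKS_PER_SEC);
        printf("unrolled: sum=%.1f  %.3fs\n", b, (double)(t2 - t1) / CLOCKS_PER_SEC);
        free(x);
        return 0;
    }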

    Comment


    • #72
      Originally posted by tildearrow View Post
      Then they turn consumer ARM into a closed-source design with encrypted firmware and zero open drivers.
      You mean most designs aren't already?

      Comment


      • #73
        Originally posted by pete910 View Post

        I meant real-world benchmarks like what Phoronix does. Are the Geekbench mobile and desktop versions actually comparable?

        Think I'll wait for actual comparable benchmarks before coming to any conclusions either way.
        Well, to get an Apple ARM desktop you have to sign an NDA agreeing not to reveal how slow and sucky they are.

        Comment


        • #74
          Originally posted by nuhamind2 View Post
          If Nvidia did buy it, would it fall under the sanctions ban for Huawei too?
          I sure hope so. F $ % K the CCP. They've probably stolen enough IP to produce cutting edge ARM chips on their own by now anyways.

          Comment


          • #75
            Originally posted by willmore View Post

            Please go back to school (a better one), because you're wrong. ARM cores using the same litho as x86 cores perform slower than (for example) Ryzen cores. And when you crank up the clocks on ARM cores to try to make them competitive with x86 cores, guess what? Power efficiency drops to the same levels or worse. The reason ARM is always perceived as having better power efficiency is that they live further down the speed/power curve most of the time.

            If you really think that litho is all that determines the speed of a processor, then you missed the chapter on pipelining. You could be forgiven for not understanding that if this were the 1960s, but four decades have passed and anyone who 'went to a good school' would understand that.
            I wrote stages, and no, a RISC at the same issue width and frequency does not have comparable consumption. So back to school.

            Comment


            • #76
              Originally posted by drakonas777 View Post
              I do not consider myself a CPU architecture guru, TBH. From my limited understanding, x86 as such is a CISC that actually uses a RISC engine internally by splitting large x86 instructions into uops, while ARM, itself being a RISC, implements smaller instructions at the ISA level. So first, this means that ARM's IPC is basically not comparable to x86 IPC at the ISA level, since ARM's instructions do less computation on average. To match the same performance, ARM's IPC and/or frequency actually must be higher than x86's. Second, this means that ARM most likely also has to make heavier use of RAM accesses, which also costs power. Besides all that, the transistor budget for caches and additional logic would also have to be increased. Perhaps ARM can indeed do that magical extended-power-envelope HPC CPU tomorrow that would destroy any x86 at any given load, but I somehow doubt it's that simple. Correct me if I'm wrong.
              What smaller instructions, man? You lost me there; are you sure you're a guru? REDUCED means from the same root, not smaller, and one instruction differs little from another. Please, I cannot comprehend what you are writing.
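
              (For the uop splitting that the quoted post describes, here is a toy model in C; the names, encodings, and the three-way load/add/store split are invented purely for illustration and are not the real decode of any shipping core. It just shows the idea of one "CISC-style" memory-operand add being cracked into RISC-like micro-ops that a back-end executes.)

              #include <stdio.h>

              typedef enum { UOP_LOAD, UOP_ADD, UOP_STORE } uop_kind;

              typedef struct {
                  uop_kind kind;
                  int addr;   /* memory slot index, used by LOAD/STORE */
                  int reg;    /* register index, used by ADD           */
              } uop;

              static int mem[16];
              static int regs[4];
              static int tmp;     /* stand-in for an internal temporary register */

              static void execute(const uop *u) {
                  switch (u->kind) {
                  case UOP_LOAD:  tmp = mem[u->addr];       break;
                  case UOP_ADD:   tmp = tmp + regs[u->reg]; break;
                  case UOP_STORE: mem[u->addr] = tmp;       break;
                  }
              }

              int main(void) {
                  mem[3] = 40;
                  regs[1] = 2;

                  /* The macro-op "add [3], r1" cracked into three micro-ops. */
                  uop cracked[] = {
                      { UOP_LOAD,  3, 0 },
                      { UOP_ADD,   0, 1 },
                      { UOP_STORE, 3, 0 },
                  };

                  for (unsigned i = 0; i < sizeof cracked / sizeof cracked[0]; i++)
                      execute(&cracked[i]);

                  printf("mem[3] = %d\n", mem[3]);  /* prints 42, same result as one CISC op */
                  return 0;
              }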

              Comment


              • #77
                Originally posted by vladpetric View Post

                Most of the IPC comes from high quality (i.e., not toy) microarchitectures (that is, how you implement the pipeline). Most ARM designs are toys (and sure, their cost is commensurate), with Apple ARMs being the only desktop-grade processors.

                Generally speaking, it's easy to make a wide pipeline, it's super-difficult to keep it busy. E.g., you need super complex branch prediction and trace caches, something that not everyone knows how to do well.

                x86-64 is a lot closer to RISC than you'd think. In a modern x86 processor, the vast majority of instructions get broken down into RISC-like micro-ops in the decoders. In contrast, classic CISC had instructions that ran like mini-programs.

                That is simply not how modern x86 works - their pipelines are quite similar to that of their RISC counterparts, except for a messier front-end (more complex decode).

                At the same time, modern RISCs aren't exactly the 80s "classic" RISC processors either (as they can have high variability in instruction execution time).

                Maybe it is time you update your (micro)architectural knowledge. After reading an up-to-date H&P, I'd suggest Shen & Lipasti.
                Second line: that is why you shouldn't do everything in hardware. There is even a 32-issue in-order core that can still be filled by a single thread on some applications, because prediction and cache management are software-based. Also, the Reduced in RISC means a family of instructions from the same root, not smaller instructions. CISC is CISC and RISC is RISC.
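
                (For what it's worth, mainstream compilers do expose a very limited form of software cache management: a prefetch hint. The sketch below uses the GCC/Clang __builtin_prefetch extension with an arbitrary prefetch distance of 16 elements; on a simple sequential scan like this the hardware prefetcher usually does the job on its own, so this only illustrates the mechanism, not a guaranteed win.)

                #include <stdio.h>
                #include <stdlib.h>

                /* Sum an array while hinting to the cache which data will be
                   needed a little later. __builtin_prefetch is a GCC/Clang
                   extension; the distance of 16 is an arbitrary, workload-
                   dependent guess. */
                static long sum_with_prefetch(const int *x, long n) {
                    long s = 0;
                    for (long i = 0; i < n; i++) {
                        if (i + 16 < n)
                            __builtin_prefetch(&x[i + 16], 0 /* read */, 1 /* low locality */);
                        s += x[i];
                    }
                    return s;
                }

                int main(void) {
                    long n = 1L << 22;
                    int *x = malloc(n * sizeof *x);
                    if (!x)
                        return 1;
                    for (long i = 0; i < n; i++)
                        x[i] = (int)(i & 255);
                    printf("sum = %ld\n", sum_with_prefetch(x, n));
                    free(x);
                    return 0;
                }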

                Comment


                • #78
                  I already wrote that I do not consider myself a guru, so no, I'm not, obviously, but whatever. By smaller I mean that an ARM instruction is 4 bytes, while an x86 instruction can be up to 15 bytes.
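
                  (A minimal sketch of why that size difference matters for the decoder, using made-up encodings rather than the real ARM or x86 formats: with a fixed 4-byte format the start of every instruction in a fetch block is known immediately, so several can be decoded in parallel, while with a variable-length format each instruction's length has to be worked out before the next one can even be located.)

                  #include <stdio.h>

                  /* Hypothetical instruction streams, not real ARM or x86 encodings. */

                  /* Fixed 4-byte format: instruction i starts at byte 4*i, full stop.
                     A wide decoder can pick up several instructions from a fetch block at once. */
                  static void decode_fixed(const unsigned char *code, int nbytes) {
                      (void)code;
                      for (int off = 0; off + 4 <= nbytes; off += 4)
                          printf("fixed:    insn at byte %2d, length 4\n", off);
                  }

                  /* Variable-length format: here the first byte encodes the length (1..15).
                     The start of instruction i+1 is unknown until instruction i is decoded,
                     which is what makes parallel decode of such a stream the hard part. */
                  static void decode_variable(const unsigned char *code, int nbytes) {
                      int off = 0;
                      while (off < nbytes) {
                          int len = code[off] & 0x0f;          /* invented length field */
                          if (len == 0 || off + len > nbytes)
                              break;                           /* malformed stream */
                          printf("variable: insn at byte %2d, length %d\n", off, len);
                          off += len;
                      }
                  }

                  int main(void) {
                      unsigned char fixed_code[16] = {0};      /* four 4-byte instructions */
                      unsigned char var_code[] = {0x02, 0xAA,            /* 2-byte instruction */
                                                  0x05, 1, 2, 3, 4,      /* 5-byte instruction */
                                                  0x01,                  /* 1-byte instruction */
                                                  0x03, 9, 9};           /* 3-byte instruction */

                      decode_fixed(fixed_code, (int)sizeof fixed_code);
                      decode_variable(var_code, (int)sizeof var_code);
                      return 0;
                  }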

                  Comment


                  • #79
                    Originally posted by artivision View Post

                    Second line: that is why you shouldn't do everything in hardware. There is even a 32-issue in-order core that can still be filled by a single thread on some applications, because prediction and cache management are software-based. Also, the Reduced in RISC means a family of instructions from the same root, not smaller instructions. CISC is CISC and RISC is RISC.
                    And your understanding of architecture is from the 90s. When did you go to college?

                    Also, there are several things that the hardware does really, really well, and caching is one of them.

                    If you cherry pick your benchmarks, then sure, you can get to some ridiculously high throughputs. But substitute that cherrypicked stuff with other benchmarks and the performance will tank (also, you're probably just quoting maximum widths here, not sustainable throughputs; maximum widths are just limits you're guaranteed to never exceed).

                    Take, for instance, the Phoronix suite - I don't think you'll find a single benchmark where you can get that kind of parallelism from software.

                    If you are trying to argue for a completely different architecture, you should do it with actual data that is correctly gathered. It's really easy to have knee-jerk reactions ("this should be done this way, because it's BETTER") without proper performance evaluations, and oftentimes with massive gotchas in the fine print, such as needing extensive profiling for your software to ever run fast.
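
                    (For anyone who wants to gather that kind of data instead of arguing about peak widths, here is a rough, Linux-only sketch using the perf_event_open interface: it counts retired instructions and cycles around a placeholder workload and prints the sustained IPC. The workload function and its iteration count are stand-ins for a real kernel of interest, and on locked-down systems perf access may need to be allowed via kernel.perf_event_paranoid.)

                    #include <stdio.h>
                    #include <stdint.h>
                    #include <string.h>
                    #include <unistd.h>
                    #include <sys/types.h>
                    #include <sys/ioctl.h>
                    #include <sys/syscall.h>
                    #include <linux/perf_event.h>

                    /* Thin wrapper: glibc has no stub for perf_event_open. */
                    static long perf_event_open(struct perf_event_attr *attr, pid_t pid,
                                                int cpu, int group_fd, unsigned long flags) {
                        return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
                    }

                    static int open_counter(uint64_t config, int group_fd) {
                        struct perf_event_attr attr;
                        memset(&attr, 0, sizeof attr);
                        attr.type = PERF_TYPE_HARDWARE;
                        attr.size = sizeof attr;
                        attr.config = config;
                        attr.disabled = (group_fd == -1);   /* group leader starts disabled */
                        attr.exclude_kernel = 1;
                        attr.exclude_hv = 1;
                        return (int)perf_event_open(&attr, 0 /* this process */, -1, group_fd, 0);
                    }

                    /* Placeholder workload; replace with the code you actually care about. */
                    static volatile double sink;
                    static void workload(void) {
                        double s = 0.0;
                        for (long i = 1; i < 50000000; i++)
                            s += 1.0 / (double)i;
                        sink = s;
                    }

                    int main(void) {
                        int cyc = open_counter(PERF_COUNT_HW_CPU_CYCLES, -1);
                        int ins = open_counter(PERF_COUNT_HW_INSTRUCTIONS, cyc);
                        if (cyc < 0 || ins < 0) {
                            perror("perf_event_open");
                            return 1;
                        }

                        ioctl(cyc, PERF_EVENT_IOC_RESET, PERF_IOC_FLAG_GROUP);
                        ioctl(cyc, PERF_EVENT_IOC_ENABLE, PERF_IOC_FLAG_GROUP);
                        workload();
                        ioctl(cyc, PERF_EVENT_IOC_DISABLE, PERF_IOC_FLAG_GROUP);

                        uint64_t cycles = 0, instructions = 0;
                        if (read(cyc, &cycles, sizeof cycles) != sizeof cycles ||
                            read(ins, &instructions, sizeof instructions) != sizeof instructions)
                            return 1;

                        printf("instructions: %llu\n", (unsigned long long)instructions);
                        printf("cycles:       %llu\n", (unsigned long long)cycles);
                        if (cycles)
                            printf("sustained IPC: %.2f\n", (double)instructions / (double)cycles);
                        return 0;
                    }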

                    Comment


                    • #80
                      Originally posted by vladpetric View Post

                      And your understanding of architecture is from the 90s. When did you go to college?

                      Also, there are several things that the hardware does really, really well, and caching is one of them.

                      If you cherry pick your benchmarks, then sure, you can get to some ridiculously high throughputs. But substitute that cherrypicked stuff with other benchmarks and the performance will tank (also, you're probably just quoting maximum widths here, not sustainable throughputs; maximum widths are just limits you're guaranteed to never exceed).

                      Take, for instance, the Phoronix suite - I don't think you'll find a single benchmark where you can get that kind of parallelism from software.

                      If you are trying to argue for a completely different architecture, you should do it with actual data that is correctly gathered. It's really easy to have knee-jerk reactions ("this should be done this way, because it's BETTER") without proper performance evaluations, and oftentimes with massive gotchas in the fine print, such as needing extensive profiling for your software to ever run fast.
                      OK, you have a point: some things should be tested commercially first. But I have a feeling they will be tested in the next 3 years by Nvidia, Nuvia, and Apple. Also, man, no: the space shouldn't be used for special-purpose circuitry that is complex and not upgradable.

                      Comment
