**Weasel** · 17 August 2018, 01:00 PM

Originally posted by c117152

Cite what exactly, the future? ARM9 broke binary compatibility with ARM8 which broke with ARM7. Ignoring the cortex and thumb variations, that's 3 major releases since 93.

Because ARM was a complete non-factor back then.

I'm talking about ARM-on-the-desktop, which is the topic. You literally said "ARM can just break ISA backwards compatibility" and I assumed you were speaking about this subject. Maybe it's a misunderstanding, but I'd like to see how far they'd get if they did that on the desktop.

Originally posted by c117152

Pipeline width is tied with the kind of predictor you can use which is determined by the instruction width when backwards compatibility is a concern. Intel can't just switch to a whole new microarch of their choosing. Decoder or not, width needs to be about the same or less and cache coherence (memory hierarchy for the non VLIW crowd) needs to grid to the C model. That limits their choices of predictors (and consequently, L$ layout) from dozens to 2 or 3 variations of the same one and a few internal details that may or may not produce the kind of nose demons we're seeing in the current generation of speculative attacks.

...what?!?

Did you just put some random buzzwords?

Your first statement is already wrong anyway. Pipeline width has nothing to do with instruction width, like, at all. (Also, what instruction width? It's variable on x86, so which one are you referring to?)

**brauliobo** · 17 August 2018, 01:47 PM

There are almost NO BENCHMARKS OF ARM vs INTEL

**ldesnogu** · 17 August 2018, 03:04 PM

Originally posted by microcode

Only if you use it, really. And if you do/can use AVX2, your W/FLOP is going to be better than if you didn't, most of the time (and your throughput is going to be hard to beat, which is part of why I'm skeptical).

Since 65nm, leakage has become a major issue, so the mere existence of transistors increases power consumption even when they are not used. Even clock gating is not enough, and power gating is required. The problem with power gating is that waking up a block is not immediate and creates power spikes.

I think now Intel power gates some of its blocks (AVX-512 for sure, perhaps AVX2 too), but they were late to that game compared to ARM; I guess they have mostly caught up now given how low power they can get with some of their (binned) CPUs.

**name99** · 17 August 2018, 04:00 PM

Originally posted by brauliobo

There are almost NO BENCHMARKS OF ARM vs INTEL

It's very important to clarify exactly what you mean by "benchmark", i.e. what aspect of performance are you looking at?

GeekBench4 results exist for MANY ARM and Intel configs, and are a fine way to compare PEAK CPU performance one against the other. They are not a great tool if you want to compare SYSTEMS against each other (i.e. issues of cooling capacity and thus throttling, or issues of network and storage performance).
And of course if you are obsessed with the idea that the only thing a computer SHOULD do is run x86 binaries and video games, then no benchmark is going to satisfy you if it uses native ARM code. (You think I joke? Look at how many people argue that they don't care how good ARM is if it doesn't run x86...)

So it's up to you. But if you look at the GB4 results, you will see (confirmed by my tests running various Mathematica tasks on my Mac vs my iPad) that an A10 is about "equivalent" to a Haswell at 3.5GHz.
Move on to the A11: I don't have a Mac newer than an i7 Haswell, but throw in five years or so of Intel improvements and an A11 is probably about a 3.6 to 3.8GHz Kaby Lake or so. An A11 small core is about equivalent to 25% of a large core, so about equivalent to hyperthreading.
Obviously this is single-threaded performance. An A11 gives you 2+4 cores, so roughly, let's say, an i3 or so (two cores plus hyperthreading). Existing iPads (A10X) give you three large cores, and presumably Apple will continue with at least that, maybe 4+8 cores?, so up to an old-style i5 or i7 or so.
So for throughput (as opposed to latency) comparisons, obviously an Intel part with 6 or 8 cores will beat that; but that's a silly comparison because it's trivial for Apple to scale up to that many cores. It's not an interesting indication of what the two companies can do; all it tells you is where Apple has CURRENTLY targeted its products.
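
The core-count arithmetic above can be written down as a quick sketch. It only uses the two ratios from the post (a small core is about 25% of a large core, and hyperthreading is worth about the same); the function name and the assumptions that throughput scales perfectly and that an Apple large core equals an Intel core are mine, purely for illustration.

```python
# Back-of-the-envelope throughput in "large core equivalents".
# Both ratios come from the post; everything else is a simplifying assumption.
SMALL_CORE_RATIO = 0.25   # one small core ~ 25% of a large core
HYPERTHREAD_GAIN = 0.25   # one extra hardware thread ~ the same 25%


def big_core_equivalents(big_cores, small_cores=0, hyperthreads=0):
    """Sum the cores, weighting small cores and hyperthreads by their ratios.

    Assumes perfect scaling across cores, which real workloads rarely achieve.
    """
    return (big_cores
            + small_cores * SMALL_CORE_RATIO
            + hyperthreads * HYPERTHREAD_GAIN)


a11 = big_core_equivalents(2, small_cores=4)    # 2 large + 4 small cores
i3 = big_core_equivalents(2, hyperthreads=2)    # 2 cores plus hyperthreading

print(f"A11: {a11} large-core equivalents")
print(f"i3:  {i3} large-core equivalents")
```

Under those assumptions the two land in the same ballpark, which is all the "roughly an i3" comparison claims.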

That doesn't tell you about ARM corporate cores (which are very sad indeed compared to Apple) but it does show what's possible with the ARM ISA if you're willing to do the R&D. And Apple cores are not THAT large: the entire compute complex (2+4 cores and caches up to L3) is about 1/6 of a 90mm^2 die on TSMC 10nm. (Obviously that's a design optimized to run at 2.4 GHz; modifying it to run at a higher GHz would doubtless use larger transistors, something we may see when Apple ships the ARM Mac.) ARM cores are even tinier than that; last I checked, an ARM big core is about 1/4 the size of an Apple big core. Meaning they can grow it a LOT (and pick up some performance) and still not be very large on the die.

I'm giving you the numbers. If you want to put your head in the sand, or insist that none of these comparisons are "fair", go right ahead; denying reality is kinda what Phoronix readers do. But that's my honest attempt to clarify how the two line up.

**c117152** · 17 August 2018, 09:08 PM

Originally posted by Weasel

Maybe it's a misunderstanding, but I'd like to see how far they'd get if they did that on the desktop.

Changing ISAs every 3-5 years will be welcomed by most developers as a form of planned obsolescence.

Originally posted by Weasel

Pipeline width has nothing to do with instruction width

In what world? In the one I'm living in, you have physical lines, and out-of-order superscalars fetch, decode and rename several different instructions per cycle, along with the appropriate register and cache modifications. You honestly think that with such tight windows you can stick whatever you want in there and the caches won't thrash, let alone lose cycles by the hundreds for every mispredict? Pipelines are as deep as they can be before lowering the frequencies is required. Caches are broken over multiple banks to get the best performance possible with that depth in mind. If you just start throwing in too-wide or not-wide-enough instructions, you will thrash.

**coder** · 17 August 2018, 10:02 PM

Originally posted by name99

So why are they doing this?

RISC-V could be one factor. Maybe Ryzen mobile is another. In either case, maybe they're just trying to keep people on their bandwagon.

Also, it does mainly highlight the performance of their A76 core, which I'm sure they're keen to sell into new designs.

**coder** · 17 August 2018, 10:07 PM

Originally posted by Weasel

Nobody forces you to use it. Transistors that are not used do not use power.

That's not the point. The original comment was asking why i5 used (i.e. is rated for) so much more power. My point was that, since AVX2 is a notorious power hog, perhaps it contributed to inflating Intel's TDP numbers.

Originally posted by ldesnogu

That made my day.

I can make funny non-sequiturs, also.

**ldesnogu** · 18 August 2018, 01:40 AM

Originally posted by coder

I can make funny non-sequiturs, also.

In fact the funny comment was from Weasel. He doesn't seem to know about leakage that makes unused transistors consume power. And this is doubly funny when you see how he mocks others.

Regarding your comment about AVX2 and TDP: in my experience, on my Haswell I have to run AVX2 code to reach TDP.

**Weasel** · 18 August 2018, 08:08 AM

Originally posted by coder

That's not the point. The original comment was asking why i5 used (i.e. is rated for) so much more power. My point was that, since AVX2 is a notorious power hog, perhaps it contributed to inflating Intel's TDP numbers.

Yeah, of course. I thought you meant that with AVX2 it will go over the TDP. In fact, in most cases (no AVX2 code) it will use less power than that.

Originally posted by ldesnogu

In fact the funny comment was from Weasel. He doesn't seem to know about leakage that makes unused transistors consume power. And this is doubly funny when you see how he mocks others.

Citation needed.

If anything, leakage means that you need to turn off even more transistors or risk overheating, quite the opposite of your claim that AVX2 transistors still get used despite not being used. Obviously it's total bullshit considering everyone knows how hot their CPUs get when they use AVX.

**ldesnogu** · 18 August 2018, 12:22 PM

Originally posted by Weasel

Citation needed.

If anything, leakage means that you need to turn off even more transistors or risk overheating, quite the opposite of your claim that AVX2 transistors still get used despite not being used. Obviously it's total bullshit considering everyone knows how hot their CPUs get when they use AVX.

You seem to be so sure of yourself even though you obviously lack even basic knowledge of the subject that I'm not convinced it's worth wasting time putting links you either won't read or won't understand, but here it is for others to read:

https://www.eetimes.com/document.asp?doc_id=1264175

Leakage power is primarily the result of unwanted subthreshold current in the transistor channel when the transistor is turned off.

Power Consumption

https://semiengineering.com/knowledge_centers/low-power/low-power-design/power-consumption/

The power consumed in a device is composed of two types – dynamic, sometimes called switching power, and static, sometimes called leakage power. In geometries smaller than 90nm, leakage power has become the dominant consumer of power whereas for larger geometries, switching is the larger contributor. Power reduction strategies can be used to minimize both...

So yes, unused transistors have a power impact, but read my other post above about power gating.
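
For anyone who wants the distinction spelled out, here is a minimal toy model of a single block (think of a wide vector unit) in three states. It uses the standard switching-power relation P_dyn = activity × C × V² × f; every number in it (capacitance, voltage, leakage, residual leakage after gating) is an invented placeholder, not a measurement of any Intel or ARM part, and the function names are mine.

```python
# Toy model of one execution block. All numbers are made-up illustrations,
# not measurements of any real CPU.

def dynamic_power(activity, capacitance_f, voltage_v, freq_hz):
    """Switching power: P_dyn = activity * C * V^2 * f."""
    return activity * capacitance_f * voltage_v ** 2 * freq_hz


def block_power(state, leakage_w=0.5, retained_leakage=0.05):
    """Power drawn by the block, in watts, for a given state.

    state: "active", "clock_gated" or "power_gated".
    leakage_w: static power while the block is powered (assumed value).
    retained_leakage: fraction of leakage left after power gating (assumed).
    """
    if state == "active":
        # Transistors switch and leak at the same time.
        return dynamic_power(0.2, 1e-9, 1.0, 3e9) + leakage_w
    if state == "clock_gated":
        # No clock, so no switching, but the block is still powered and leaks.
        return leakage_w
    if state == "power_gated":
        # Supply cut off: only a small residual leakage remains.
        return leakage_w * retained_leakage
    raise ValueError(f"unknown state: {state}")


for state in ("active", "clock_gated", "power_gated"):
    print(f"{state:12s} {block_power(state):.3f} W")
```

The clock-gated case is the "unused transistors" one: nothing switches, yet the power is not zero. The sketch leaves out the wake-up latency and power spike that power gating brings, which is the trade-off mentioned earlier in the thread.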

ARM Aims To Deliver Core i5 Like Performance At Less Than 5 Watts
