Our Last Time Benchmarking Ubuntu 32-bit vs. 64-bit


  • #61
    Originally posted by duby229 View Post

    Let me ask your opinion on something: how many people are still coding in assembly? And how many of them -should- be? It's true I misunderstand some technical concepts, and I'll admit that. But there is no possible way in any hell that you could ever convince me 16 GPRs is enough. Especially when we live in an age with many GBs of RAM and -modern compilers-....

    EDIT: Saying 16 GPRs is enough is exactly like saying 640K ought to be enough. It was a joke then just as much as it is now. 16 GPRs is only enough when you have to manually comprehend the assembly you wrote. And chances are you shouldn't have written it anyway.
    You're saying that more registers are better. Sure, more registers are better, all else being equal. But in the real world, "all else being equal" isn't. There are significant costs associated with larger register files, which is why AMD settled for 16 GPRs back when they designed x86-64. Load-store ISAs, such as most "RISC" ISAs, tend to settle around 32 GPRs as the sweet spot.



    • #62
      Originally posted by jabl View Post

      You're saying that more registers are better. Sure, more registers are better, all else being equal. But in the real world, "all else being equal" isn't. There are significant costs associated with larger register files, which is why AMD settled for 16 GPRs back when they designed x86-64. Load-store ISAs, such as most "RISC" ISAs, tend to settle around 32 GPRs as the sweet spot.
      32 GPRs probably was the sweet spot when DEC Alphas were top of the line, but you know that was almost 20 years ago... Fabrication technology has come a long way since then.



      • #63
        Originally posted by duby229 View Post

        32 GPRs probably was the sweet spot when DEC Alphas were top of the line, but you know that was almost 20 years ago... Fabrication technology has come a long way since then.
        Optimal register count is very weakly, if at all, correlated with the transistor budget, unless you're talking about really tiny microcontroller-class chips. E.g. RISC-V also chose 32 GPRs even though it's only a few years old.



        • #64
          Originally posted by duby229 View Post

          32 GPRs probably was the sweet spot when DEC Alphas were top of the line, but you know that was almost 20 years ago... Fabrication technology has come a long way since then.
          Context state. Not everything becomes magically better by increasing the size and width of everything. Technically, a larger addressing width by itself has more downsides than upsides. In x86_64 this is offset by a larger GPR bank in the ISA and some fancy instructions which most core code does not use. What has really put 32-bit at a big disadvantage is that a bucketload of structures in normal and core code now default to 64-bit arithmetic and 64-bit data regardless of architecture. The penalty for handling such data on a 32-bit machine is lost GPR slots, and integer instruction scheduling is shot to shit.
          So in a way one can say that modern Linux tunes its performance for 64-bit architectures by default. The rest just has to cope.
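
          (A minimal C sketch of that point, assuming a typical 32-bit ABI; the exact codegen is compiler-dependent, but the idea is that every 64-bit value ties up a register pair:)

          Code:
          #include <stdint.h>
          #include <stdio.h>
          #include <inttypes.h>

          /* On a 32-bit target each uint64_t typically occupies a register PAIR,
           * so this add already uses four GPRs for its inputs and is usually
           * lowered to an add/adc (add-with-carry) sequence.  On x86_64 the same
           * source needs two GPRs and a single add. */
          uint64_t add64(uint64_t a, uint64_t b)
          {
              return a + b;
          }

          int main(void)
          {
              printf("%" PRIu64 "\n", add64(1u, 2u));
              return 0;
          }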



          • #65
            Originally posted by jabl View Post

            Optimal register count is very weakly, if at all, correlated with the transistor budget, unless you're talking about really tiny microcontroller-class chips. E.g. RISC-V also chose 32 GPRs even though it's only a few years old.
            Not transistor budget. Internally, all x86 machines and modern CPUs use massive register renaming. The smaller register set exposed by the ISA is there so you aren't forced to move as much context state.
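
            (A toy sketch of what renaming means, purely illustrative and nothing like a real implementation: a small set of architectural names is mapped onto a much larger pool of physical registers, and only the architectural values are context state:)

            Code:
            #include <stdio.h>

            #define ARCH_REGS  16   /* registers the ISA exposes (and a context switch must save) */
            #define PHYS_REGS 128   /* physical registers hidden inside the core */

            /* Toy rename table: which physical register currently holds each
             * architectural register.  Real cores use a free list and reclaim
             * physical registers at instruction retirement. */
            static int rename_table[ARCH_REGS];
            static int next_free = ARCH_REGS;

            /* Give the destination of a new instruction writing architectural
             * register 'arch' a fresh physical register. */
            static int rename_dest(int arch)
            {
                int phys = next_free++ % PHYS_REGS;
                rename_table[arch] = phys;
                return phys;
            }

            int main(void)
            {
                for (int r = 0; r < ARCH_REGS; r++)
                    rename_table[r] = r;
                printf("architectural r3 now lives in physical p%d\n", rename_dest(3));
                return 0;
            }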



            • #66
              Originally posted by jabl View Post

              Optimal register count is very weakly, if at all, correlated with the transistor budget, unless you're talking about really tiny microcontroller-class chips. E.g. RISC-V also chose 32 GPRs even though it's only a few years old.
              There is only one good reason to keep register counts low, and it seems to me that's to keep hand-coded assembly comprehensible. That's the only reason you think you see a weak correlation. But the truth is that we now live in a modern era with fantastic compilers, and a boatload of assembly is finally getting replaced after decades. We have both the fabrication technology and the compiler technology to massively increase register counts. So whatever weak design decisions went into RISC-V had nothing to do with fabrication or compiler capabilities.
              Last edited by duby229; 02 October 2017, 02:56 PM.



              • #67
                Originally posted by milkylainen View Post

                Not transistor budget. Internally, all x86 machines and modern CPUs use massive register renaming. The smaller register set exposed by the ISA is there so you aren't forced to move as much context state.
                Sure. That's one good reason to keep the (architectural) register count modest. Another is that with a smaller number of registers you need fewer bits to address them, so the code becomes more compact. Unless you go too small, of course; then you'll have lots of extra code doing spilling and reloading. So again it's a compromise.

                And yes, it's also true that OoO processors can have significantly larger physical register files; IIRC current Intel chips have something like 128 or thereabouts. But again, it's a compromise. Large multi-ported register files take a lot of die area and are very power hungry, so some silly number of registers isn't a solution to anything either.



                • #68
                  Originally posted by milkylainen View Post

                  Not transistor budget. Internally, all x86 machines and modern CPUs use massive register renaming. The smaller register set exposed by the ISA is there so you aren't forced to move as much context state.
                  Yeah, but that's a fabrication problem. It's more a component performance issue than anything else, and like I said, fabrication technology has come a long way. The switching performance of individual transistors etched on a modern IC is remarkable. No doubt at all the technology already exists to do it.



                  • #69
                    Originally posted by jabl View Post
                    Sure. That's one good reason to keep the (architectural) register count modest. Another is that with a smaller number of registers you need fewer bits to address them, so the code becomes more compact. Unless you go too small, of course; then you'll have lots of extra code doing spilling and reloading. So again it's a compromise.

                    And yes, it's also true that OoO processors can have significantly larger physical register files; IIRC current Intel chips have something like 128 or thereabouts. But again, it's a compromise. Large multi-ported register files take a lot of die area and are very power hungry, so some silly number of registers isn't a solution to anything either.
                    You're basically warming up an old argument: RISC vs. CISC. The reason why x86 CISC still exists today is that it has become an abstraction layer, and underneath we have RISC-like cores that translate the x86 code into microcode, meaning the instructions of a RISC core.

                    Back in the days when RISC emerged, it was pretty obvious that it beat CISC. The DEC Alphas were simply beasts, and the dominance of the concept was as obvious as daylight back then. It was so obvious that everyone who had a few chip designers jumped onto the RISC bandwagon. Even HP came up with their own RISC design, and MIPS, ARM and POWER are still alive today, all RISC designs, alongside countless others that didn't succeed due to various failures in their makers' designs (Motorola should be named here as one of the biggest losers of the change, going from being "King of CISC" with their m68ks to producing the much underwhelming m88k CPUs, a "WTF?" moment for many).

                    It is only due to Intel's clever tactics and their need to keep x86 alive for compatibility reasons that we are seemingly stuck with CISC today, but underneath the hood these are all RISC designs that have been forced into a corset.



                    • #70
                      Originally posted by duby229 View Post

                      Let me ask your opinion on something: how many people are still coding in assembly? And how many of them -should- be? It's true I misunderstand some technical concepts, and I'll admit that. But there is no possible way in any hell that you could ever convince me 16 GPRs is enough. Especially when we live in an age with many GBs of RAM and -modern compilers-....

                      EDIT: Saying 16 GPRs is enough is exactly like saying 640K ought to be enough. It was a joke then just as much as it is now. 16 GPRs is only enough when you have to manually comprehend the assembly you wrote. And chances are you shouldn't have written it anyway.
                      It doesn't work that way. GPRs are used to store a program's live variables, which are always few and don't increase in number as available RAM grows. It is true that, all things being equal, more GPRs = better, but in practice it's more complicated than that. During a context switch, the CPU must dump its entire internal state to RAM and reload another state from RAM. Just going from eight 32-bit GPRs to sixteen 64-bit GPRs means that the volume of transferred data is multiplied by 4, which means more bus cycles, more latency, more cache pressure, etc.
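
                      (The back-of-the-envelope numbers for the GPR state alone, ignoring FPU/SIMD state, as a quick C sketch:)

                      Code:
                      #include <stdio.h>

                      /* Architectural GPR state that has to be saved and restored on a
                       * context switch (GPRs only; FPU/SIMD state comes on top of this). */
                      int main(void)
                      {
                          unsigned ia32  =  8 * 4;   /* 8 x 32-bit GPRs  =  32 bytes */
                          unsigned x8664 = 16 * 8;   /* 16 x 64-bit GPRs = 128 bytes */
                          printf("i386: %u bytes, x86_64: %u bytes, ratio: %ux\n",
                                 ia32, x8664, x8664 / ia32);
                          return 0;
                      }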

                      Then there is code density. Given a common, simple two-operand instruction such as add r8, r9, with 16 GPRs each register is coded in 4 bits, which means that the two operands are coded using exactly 1 byte. With 32 GPRs you would need 10 bits, which means 2 bytes with 6 "wasted" bits. This doesn't matter much on RISC machines, where code is sparse and there is lots of "waste" no matter what, but on a CISC it would undermine one of the main advantages of variable instruction length. On the other hand, RISC always needs more registers to do the same thing, and broadly speaking, 32 GPRs on a RISC = 16 GPRs on a CISC.
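
                      (The operand-field arithmetic spelled out as a quick C sketch:)

                      Code:
                      #include <stdio.h>

                      /* Bits needed to name one register out of nregs, and bytes needed
                       * for the two register operands of an "add rX, rY" style instruction. */
                      int main(void)
                      {
                          for (unsigned nregs = 16; nregs <= 32; nregs *= 2) {
                              unsigned bits = 0;
                              while ((1u << bits) < nregs)
                                  bits++;                          /* ceil(log2(nregs)) */
                              unsigned operand_bits  = 2 * bits;   /* two register operands */
                              unsigned operand_bytes = (operand_bits + 7) / 8;
                              printf("%2u GPRs: %u bits per operand, %2u bits total -> %u byte(s)\n",
                                     nregs, bits, operand_bits, operand_bytes);
                          }
                          return 0;
                      }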

                      Also, the smaller the number of logical GPRs relative to physical ones, the easier it is for the instruction decoder to maintain a mapping without running out of available physical registers. By being able to feed the pipelines without having to wait for registers to become available, the CPU can take full advantage of SMT and speculative execution, both of which are great performance features.

                      Basically, when designing a new ISA, you have to try to find the best possible compromise between all those pros and cons, and 16 logical GPRs is a good number in that regard for a CISC processor.

