Fedora Progresses In Bringing Up RISC-V Architecture Support

  • #11
    Originally posted by atomsymbol
    It would be interesting to see RISC-V evolve over the next 10 years while preserving compatibility. Unfortunately, the probability of RISC-V surviving 10 years is quite low in my opinion (but I may be wrong).
    Others have already said that RISC-V does not need to add new instructions every few years.

    • #12
      Originally posted by atomsymbol
      If RISC-V intends to compete directly with, say, x86 and to take, say, 10% of the notebook/desktop/server market, it will be forced to add new instructions every few years.
      1. Competing with x86 in the consumer market is beside the point, because the main reason x86 still dominates that market is its software ecosystem (Windows and its programs), not its performance.
      Sure, the performance is better, but what truly locks the consumer market to x86 is the software ecosystem, not performance.

      2. Why would it need to add new instructions? Most of the reason x86 keeps adding them is that new instructions are the only way to make full use of the hardware; as microcode said a page ago, RISC-V does not have that problem. Sure, some things will be added, but it won't be anything like x86's SSE and AVX.

      AMD is a member of the RISC-V Foundation.
      So are IBM, Google, HP (Enterprise, I think), NVIDIA, Qualcomm, and a couple of big companies making InfiniBand and other cluster hardware, to name only the bigger ones.

      Originally posted by atomsymbol
      Taking into account the fact that AMD's ARM CPUs haven't been successful so far,
      Uhm, AFAIK they are slated for release in Q1 2017 or later. Talking about them "not having been successful" before they even ship is a bit of a "lolwtf?" moment.

      Originally posted by atomsymbol
      it is highly improbable that AMD will deliver a RISC-V implementation competitive to x86.
      AFAIK the more likely target for most RISC-V implementations around is ARM's territory (microcontrollers and embedded processors), or MIPS, or something along those lines.

      • #13
        Originally posted by atomsymbol
        In my opinion, it follows from a theory of complexity that RISC must execute more instructions, and more conditional jumps, than CISC while performing a particular job. The length of the dynamic Huffman encoding of the instructions seen by the RISC CPU is, by the nature of the difference between RISC and CISC, larger than the length of the dynamic Huffman encoding of the instructions seen by the CISC CPU.
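One way to make the quoted claim measurable is to take the stream of opcodes a CPU actually executed (from a simulator or trace tool) and compute the total length of a Huffman code built over that stream. The Python sketch below uses a static Huffman code over the whole trace as a stand-in for the dynamic encoding the quote mentions; the two traces are made-up placeholders, not real measurements.

```python
import heapq
from collections import Counter

def huffman_code_lengths(trace):
    """Map each distinct symbol in the trace to its Huffman code length in bits."""
    freq = Counter(trace)
    if len(freq) == 1:                       # degenerate case: one distinct symbol
        return {next(iter(freq)): 1}
    # Heap entries: (weight, tie-breaker, {symbol: depth-so-far})
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        merged = {s: d + 1 for s, d in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]

def encoded_length_bits(trace):
    """Total bits needed to Huffman-encode the whole instruction trace."""
    lengths = huffman_code_lengths(trace)
    counts = Counter(trace)
    return sum(counts[s] * lengths[s] for s in counts)

# Placeholder traces: what a CISC CPU and a RISC CPU might execute for the
# same small task (illustrative only, not taken from a real program).
cisc_trace = ["mov", "add", "mov", "cmp", "jne", "mov", "call"]
risc_trace = ["lw", "add", "sw", "lw", "add", "sw", "bne", "jal"]
print("CISC trace:", encoded_length_bits(cisc_trace), "bits")
print("RISC trace:", encoded_length_bits(risc_trace), "bits")
```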
        The x86, however (the 16-bit variant at least), needed a load more instructions just to move data into the right register, because it was broken by design: every register had a dedicated function. From an 8-bit perspective that was pretty advanced at the time, but it was very slow.
        So although with CISC you expect to use fewer instructions, on the x86 platform it was in fact more, and at the same time those instructions were much slower than their RISC counterparts.
        If you realise that 99.99% of a CPU's job is just moving data from one part of the CPU to the part where the actual work gets done (FPU, accumulator, barrel shifter)...
        CISC usually uses microcode to perform these movement patterns.
        Intel designed the i860 with extreme RISC in mind. No branch prediction hardware, it could execute two instructions in parallel (a floating-point and an integer operation decoded as "a single" 64-bit instruction), and it exposed pipeline stalls to software (if you go lower, to microcode-based CPUs, you don't have pipeline stalls; you insert NOPs, or instructions for other units, until the data has settled in the right place). Branch prediction was encoded in the instructions themselves.
        They had an fmul but no fdiv, because they said an fdiv was not easy to implement in a single cycle, and you could get very close to the result with about seven multiplies in succession (see the sketch below). There was no single stack. Actually, there was no stack at all: just "store register (pc) at what Rn points to" and "load pc with the contents of another Rn". The stack was an agreement in the ABI.
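For reference, here is a minimal sketch of the trick being described: approximating a division with nothing but multiplies and subtracts, using the Newton-Raphson reciprocal iteration x ← x·(2 − b·x). This is the general technique, not the i860's actual hardware, and the seed constants are the standard textbook ones, not Intel's.

```python
def approx_divide(a, b, steps=3):
    """Approximate a / b using only multiplies and subtracts (b > 0)."""
    # Scale b into [0.5, 1); real hardware would do this with the exponent bits.
    scale = 1.0
    while b * scale >= 1.0:
        scale *= 0.5
    while b * scale < 0.5:
        scale *= 2.0
    m = b * scale
    # Standard linear seed for 1/m on [0.5, 1): 48/17 - 32/17 * m.
    x = 48.0 / 17.0 - (32.0 / 17.0) * m
    for _ in range(steps):
        # Newton-Raphson step: two multiplies and one subtract,
        # roughly doubling the number of correct bits each time.
        x = x * (2.0 - m * x)
    return a * x * scale   # a * (1/b)

print(approx_divide(355.0, 113.0))   # ~3.14159...
print(355.0 / 113.0)                 # exact division, for comparison
```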
        They also assumed that compiler technology would become advanced enough for the compiler to get branch prediction and instruction scheduling right. They were probably right, but about 20 years off on when that would happen. So in the end, mispredicted branches from the compiler and the like meant that only hand-coded assembly performed well.
        So it died.
        There was also good CISC, like the PDP-11 or the 68k.
        Anyway, long story short: there is no real performance difference between RISC and CISC, provided your CISC is defined well. The CISC needs more CPU die space, and the RISC might need a tiny bit more RAM. If you compare 32-bit x86 Linux code with 32-bit ARM code, the RISC side actually needs *less* RAM. If you compare 64-bit x86 Linux code (better optimisation) with 32-bit ARM code, they are on par (just look at the .so files, or compare the bash executable on comparable platforms; a rough way to do that is sketched below). So currently ARM's RISC needs less code to do the same job as Intel's CISC, and compiler scheduling for ARM still isn't fully worked out, while for Intel it pretty much is.
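A quick way to run the comparison suggested here, assuming you have the same program built for both architectures on hand (for example a bash binary copied from an x86 machine and from an ARM machine; the file names below are placeholders). The sketch just calls the binutils `size` tool and compares the reported code (.text) sizes.

```python
import subprocess
import sys

def text_bytes(path):
    """Code (.text) size in bytes, as reported by the binutils `size` tool."""
    out = subprocess.run(["size", path], capture_output=True, text=True, check=True)
    # `size` prints a header line, then: text data bss dec hex filename
    return int(out.stdout.splitlines()[1].split()[0])

if __name__ == "__main__":
    # e.g. python compare_text.py bash.x86_64 bash.armhf   (placeholder names)
    a, b = sys.argv[1], sys.argv[2]
    ta, tb = text_bytes(a), text_bytes(b)
    print(f"{a}: {ta} bytes of code")
    print(f"{b}: {tb} bytes of code")
    print(f"ratio: {ta / tb:.2f}")
```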

        • #14
          Originally posted by Ardje View Post
          I have no desire to argue this, but you are definitely wrong on a few of your points. A compiler will never be capable of determining in-flight configurations. Never. It won't happen. Scheduling from the compiler was always a bad idea.

          • #15
            Originally posted by duby229 View Post

            EDIT: Just to add, it's the reason VLIW and EPIC-type architectures are poor at general-purpose processing. Because a compiler isn't capable of determining in-flight configurations, every time a branch comes along the compiler has to schedule every possible path. By choosing to let the compiler perform scheduling, the architecture requires considerably more execution hardware to run through branches, including every single untaken branch (see the sketch below). It's stupid.
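To make the point about running both sides of a branch concrete, here is a toy Python illustration of the shape of code a predicating VLIW/EPIC-style compiler emits: instead of branching, it computes both outcomes and selects one with the predicate, so the work on the untaken path is still done. This illustrates if-conversion in general, not any specific compiler's output.

```python
def clamp_with_branch(x, limit):
    # Ordinary branching code: only one side is ever executed.
    if x > limit:
        return limit
    return x

def clamp_if_converted(x, limit):
    # Predicated form: both candidate results are computed up front,
    # then the predicate selects one -- no branch, but also no saved work.
    taken = limit          # result if the condition holds
    not_taken = x          # result if it does not
    p = int(x > limit)     # predicate as 0 or 1
    return p * taken + (1 - p) * not_taken

for value in (3, 7):
    assert clamp_with_branch(value, 5) == clamp_if_converted(value, 5)
    print(value, "->", clamp_if_converted(value, 5))
```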

            • #16
              Originally posted by atomsymbol
              A traditional static compiler like gcc/g++ has limited ability to determine the outcome of branch conditions at compile time. But I am not so sure this limitation extends to a JIT compiler with on-the-fly analysis and adaptation to changes in branch-condition outcomes.
              You two make discussing CPUs interesting again (a pity that quotes of quotes are not quoted :-( ). Thanks for these wonderful insights.
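For what it's worth, here is a minimal sketch of the mechanism the quote above is hinting at, in generic Python rather than any real JIT: the runtime profiles a branch, specializes the code for the overwhelmingly common outcome, and keeps a guard that deoptimizes back to the generic path when the assumption stops holding. The class name, threshold, and example function are all made up for the illustration.

```python
class AdaptiveBranch:
    """Profile a branch at run time, then speculate on its common outcome."""

    def __init__(self, generic, threshold=1000):
        self.generic = generic       # code that handles both branch outcomes
        self.fast = None             # specialized code, built once profiling says so
        self.taken = 0
        self.total = 0
        self.threshold = threshold

    def call(self, x):
        if self.fast is not None:
            result = self.fast(x)
            if result is not None:   # guard held: stay on the specialized path
                return result
            # Guard failed: deoptimize and start profiling again.
            self.fast = None
            self.taken = self.total = 0
        # Generic path, with profiling of the branch condition.
        cond = x >= 0
        self.taken += cond
        self.total += 1
        if self.total >= self.threshold and self.taken / self.total > 0.99:
            # "Recompile" assuming the branch is (almost) always taken;
            # the x < 0 case becomes the guard failure.
            self.fast = lambda v: v * 2 if v >= 0 else None
        return self.generic(x)

def generic(x):
    # The original, unspecialized code: handles both outcomes of the branch.
    return x * 2 if x >= 0 else -x

branch = AdaptiveBranch(generic)
for i in range(2000):
    branch.call(i)            # almost always x >= 0, so the fast path gets installed
print(branch.call(-5))        # guard fails: falls back to the generic path (prints 5)
```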
