RISC-V Benchmarks Of SiFive's HiFive Unleashed Begin Appearing

  • RISC-V Benchmarks Of SiFive's HiFive Unleashed Begin Appearing

    Phoronix: RISC-V Benchmarks Of SiFive's HiFive Unleashed Begin Appearing

    Over the past week, benchmarks of this first RISC-V development board have begun appearing on OpenBenchmarking.org via the Phoronix Test Suite. Here are some of those initial benchmark numbers...


  • #2
    Only an order of magnitude behind some of the faster ARM cores. Not that bad, considering that the compiler probably generates sub-optimal code, and some rather obvious bottlenecks may be picked up once the actual hardware is profiled.

    Here's hoping!



    • #3
      Originally posted by grigi View Post
      Only an order of magnitude behind some of the faster ARM cores. Not that bad, considering that the compiler probably generates sub-optimal code, and some rather obvious bottlenecks may be picked up once the actual hardware is profiled.

      Here's hoping!
      I am sure we are in for a long road of hardware/software improvements (ARM has a 30-year head start at this point).



      • #4
        You can optimise software all you want, but at the end of the day a 1.5 GHz in-order single-issue core is not going to give you an awful lot more. It will stall on every DRAM access, its branch predictor is fairly straightforward, it doesn't seem to do any prefetching on data, and (adding to the drama, albeit less relevant) the divider appears to be of the simplest radix-2 type, superseded in most serious application cores many moons ago. By design, not dissimilar to, say, an ARM Cortex-A5, its instructions per cycle (IPC) will never exceed 1, as reflected in SiFive's own U54 technical reference manual.
        The Cortex-A57 we're comparing against has an IPC upper bound of 3, significantly more opportunities to sustain high IPC in practice (prefetching, out-of-order execution), and runs at a 33% higher clock speed. Not to mention that the TX2 "actually" has a Denver core rather than an A57, which is supposed to be even beefier. The only performance advantage the HiFive board has is the faster DRAM (DDR4-2400 vs. LPDDR-1866 for the TX2), but without out-of-order execution you'll be bound by latencies more than by throughput anyway. In light of these differences, I'm not shocked by the reported performance.
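        To put rough numbers on the single-issue vs. wide-core gap, here is a back-of-envelope peak-throughput calculation sketched in Python. The IPC bounds and clock figures are the ones quoted above; real workloads fall well short of both peaks.

```python
# Back-of-envelope peak instruction throughput: IPC upper bound times
# clock. Figures are from the comparison in this thread: the U54 is
# single-issue (IPC <= 1) at 1.5 GHz; the A57 has an IPC upper bound
# of 3 and runs at a 33% higher clock.

def peak_giga_instructions(ipc_bound: float, clock_ghz: float) -> float:
    """Theoretical peak throughput in giga-instructions per second."""
    return ipc_bound * clock_ghz

u54_peak = peak_giga_instructions(ipc_bound=1.0, clock_ghz=1.5)
a57_peak = peak_giga_instructions(ipc_bound=3.0, clock_ghz=1.5 * 1.33)

print(f"U54 peak: {u54_peak:.2f} Ginstr/s")
print(f"A57 peak: {a57_peak:.2f} Ginstr/s")
print(f"ratio:    {a57_peak / u54_peak:.1f}x")  # ~4x on paper
```

        That is roughly a 4x gap on paper alone, before memory latency, prefetching, and out-of-order execution widen it further.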
        Last edited by RSpliet; 14 May 2018, 08:58 AM.



        • #5
          Originally posted by RSpliet View Post
          You can optimise software all you want, but at the end of the day a 1.5 GHz in-order single-issue core is not going to give you an awful lot more. It will stall on every DRAM access, its branch predictor is fairly straightforward, it doesn't seem to do any prefetching on data, and (adding to the drama, albeit less relevant) the divider appears to be of the simplest radix-2 type, superseded in most serious application cores many moons ago. By design, not dissimilar to, say, an ARM Cortex-A5, its instructions per cycle (IPC) will never exceed 1, as reflected in SiFive's own U54 technical reference manual.
          The Cortex-A57 we're comparing against has an IPC upper bound of 3, significantly more opportunities to sustain high IPC in practice (prefetching, out-of-order execution), and runs at a 33% higher clock speed. Not to mention that the TX2 "actually" has a Denver core rather than an A57, which is supposed to be even beefier. The only performance advantage the HiFive board has is the faster DRAM (DDR4-2400 vs. LPDDR-1866 for the TX2), but without out-of-order execution you'll be bound by latencies more than by throughput anyway. In light of these differences, I'm not shocked by the reported performance.
          Here are some benchmarks I did on the SiFive HiFive1 regarding compiler performance

          The ARM board is a Teensy 3.6 (ARM Cortex-M4)

          CoreMark: https://i.imgur.com/pW53feD.png
          Dhrystone: https://i.imgur.com/Cb07Rcx.png

          EDIT: By the way, LLVM from git crashed on me when trying to compile Dhrystone; it's obviously still slow and unstable. I did these benchmarks in March last year.
          Last edited by johanb; 14 May 2018, 09:15 AM.



          • #6
            Originally posted by johanb View Post

            Here are some benchmarks I did on the SiFive HiFive1 regarding compiler performance

            The ARM board is a Teensy 3.6 (ARM Cortex-M4)

            CoreMark: https://i.imgur.com/pW53feD.png
            Dhrystone: https://i.imgur.com/Cb07Rcx.png

            EDIT: By the way, LLVM from git crashed on me when trying to compile Dhrystone; it's obviously still slow and unstable. I did these benchmarks in March last year.
            Thanks! I suspected there'd be Cortex-M processors at a similar performance point, so it's nice to see they're in the same ballpark. Of course, now I'm curious about the power consumption of the cores/SoC in isolation... but you can't have everything!

            Let me clarify my point a little, though, for the sake of completeness: of course compiler optimisations matter! Especially on these kinds of cores, fewer instructions means more performance. However, these optimisations already exist and are part of core GCC/LLVM; in fact, they were applied in the benchmarks presented in the news article. What I tried to say instead is that I have very little expectation of future development on GCC/LLVM leading to significant performance gains. There are only a few clever tricks left to apply on such a relatively simple processor architecture. Perhaps a little can be won with instruction scheduling, but even that often requires at least a level of speculative execution beyond branch prediction to be useful.
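            As a minimal illustration of the instruction-count optimisations already in mainstream compilers, here is a strength-reduction example sketched in Python; GCC and LLVM typically apply the equivalent rewrite to integer multiplies by small constants.

```python
# Strength reduction: replace an integer multiply by a small constant
# with a shift and an add. On a single-issue in-order core, each
# instruction (or each cycle of a slow multiply) saved is roughly a
# cycle saved.

def mul9_naive(x: int) -> int:
    return x * 9            # one multiply

def mul9_reduced(x: int) -> int:
    return (x << 3) + x     # x*8 + x == x*9, using only shift and add

# The two forms agree for all integers:
assert all(mul9_naive(x) == mul9_reduced(x) for x in range(-1000, 1000))
```

            The point is that such rewrites are table stakes in GCC/LLVM today, which is why little headroom remains on the compiler side.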



            • #7
              Originally posted by RSpliet View Post

              Thanks! I suspected there'd be Cortex-M processors at a similar performance point, so it's nice to see they're in the same ballpark. Of course, now I'm curious about the power consumption of the cores/SoC in isolation... but you can't have everything!
              I had plans to measure the power draw as well, but I realized that the microcontroller controlling the GPIO pins seemed to draw more power than the processor itself, so it made no sense to do so. If I remember correctly, the Teensy's GPIO controller was built into the ARM core, so its power draw was much lower.



              • #8
                 This is a silly comparison without more data and normalized numbers. The Jetson TX2 is a fan-cooled 16 nm chip. I guess the SiFive chip is passively cooled? It is a 28 nm chip, though, so comparing it directly against a 16 nm one is a bit weird. Do they have the same TDP? If not, that should be taken into account as well. The performance should also be normalized by die size or transistor count to judge the relative efficiency of the two designs.
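                 A sketch of the normalisation being asked for, with all input numbers as hypothetical placeholders rather than measured values for either chip:

```python
# Normalising a raw benchmark score by TDP and die area before
# comparing chips built on different process nodes. The numbers below
# are hypothetical placeholders, not measured values for the TX2 or
# the HiFive Unleashed.

def normalised(score: float, tdp_watts: float, die_mm2: float) -> dict:
    """Return efficiency metrics derived from a raw benchmark score."""
    return {
        "score_per_watt": score / tdp_watts,
        "score_per_mm2": score / die_mm2,
    }

# Hypothetical inputs, purely to show the calculation:
chip_a = normalised(score=100.0, tdp_watts=15.0, die_mm2=120.0)
chip_b = normalised(score=20.0, tdp_watts=2.5, die_mm2=30.0)
print(chip_a)
print(chip_b)
```

                 With real TDP and die-size figures plugged in, a chip that loses on raw score can still win on either efficiency metric.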



                • #9
                   RISC-V has a big advantage over the competition: being open, virtually anyone (even students) can contribute to improving the hardware design, development, and manufacturing. The same applies to the software. ARM has a 30-year head start and tens of thousands of direct or indirect contributors, but RISC-V can close the gap in good time.
                   From other Phoronix benchmarks, the NVIDIA Jetson TX2 is well ahead of the competition, while this RISC-V board has computing power similar to an Orange Pi One or a Pine64, i.e. an ARM Cortex-A7 or Cortex-A53. RISC-V is evolving on schedule, from FPGA implementations as a first step to low-power-consumption CPUs as the next.



                  • #10
                    Originally posted by RSpliet View Post
                    Let me clarify my point a little, though, for the sake of completeness: of course compiler optimisations matter! Especially on these kinds of cores, fewer instructions means more performance. However, these optimisations already exist and are part of core GCC/LLVM; in fact, they were applied in the benchmarks presented in the news article. What I tried to say instead is that I have very little expectation of future development on GCC/LLVM leading to significant performance gains. There are only a few clever tricks left to apply on such a relatively simple processor architecture. Perhaps a little can be won with instruction scheduling, but even that often requires at least a level of speculative execution beyond branch prediction to be useful.
                    You're right to say it will not be GCC/LLVM alone. There are still modifications that can be made to the Rocket RISC-V core to extract more performance, like working out which pairs of instructions should sit next to each other so the hardware can process both at exactly the same time. This is a combined in-order hardware and software optimisation.

                    Presentation by Christopher Celio at UC Berkeley on November 29, 2017, at the 7th RISC-V Workshop, hosted by Western Digital in Milpitas, California.


                    With the current BOOM2 core design it's possible to double performance over a Rocket core. Even so, a BOOM2 core uses far less silicon than a TX2 core; going by silicon area it's quite a tight match between the two. It will probably be whatever BOOM3 turns out to be that matches a TX2 core.
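                    As a toy sketch of that pairing idea, here is a greedy dual-issue model in Python. The instruction format is invented for illustration; a real dual-issue in-order core applies similar hazard rules in hardware.

```python
# Greedy dual-issue model: walk an instruction list and issue two
# instructions per cycle whenever the second one does not read the
# register written by the first (no read-after-write hazard).

def dual_issue_cycles(instrs):
    """instrs: list of (dest_reg, src_regs) tuples. Returns cycle count."""
    cycles = 0
    i = 0
    while i < len(instrs):
        cycles += 1
        if i + 1 < len(instrs):
            dest, _ = instrs[i]
            _, srcs = instrs[i + 1]
            if dest not in srcs:      # independent: both issue this cycle
                i += 2
                continue
        i += 1                        # dependent: issue alone
    return cycles

# Four independent instructions pair into 2 cycles; a dependent chain
# of four takes 4 cycles.
indep = [("r1", ()), ("r2", ()), ("r3", ()), ("r4", ())]
chain = [("r1", ()), ("r2", ("r1",)), ("r3", ("r2",)), ("r4", ("r3",))]
print(dual_issue_cycles(indep), dual_issue_cycles(chain))  # 2 4
```

                    Reordering independent instructions next to each other is exactly the kind of scheduling a compiler (or the hardware designer's layout rules) can do for an in-order dual-issue core.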



