Libre RISC-V Open-Source Effort Now Looking At POWER Instead Of RISC-V


  • #51
    Originally posted by nokipaike
    if you think about it, it would consume very little. it would be easy to design and build. the power would be all in parallelism.
    yeah, but sadly, its general-purpose performance would suck. we want to combine the two tasks, so that you don't *have* two L1 caches, two sets of RAM, two sets of everything-but-slightly-different.

    so we are doing a compromise: turns out that if you have every other pipeline latch being dynamically "transparent", you can turn a 5-stage pipeline into a 10-stage one at the flick of a switch.

    running on low power, at low speed, you open up the gates and each pair of pipeline combinatorial blocks is now connected back-to-back. if you want to run at desktop-level speeds, close the gates and you have a 10-stage pipe that can run at 1.6GHz.
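
A rough way to picture the transparent-latch idea is the toy timing model below. It is only a sketch under stated assumptions: the ten equal 0.625 ns block delays are made-up numbers chosen to reproduce the 1.6GHz figure above, latch overhead is ignored by default, and `merge_stages` and `max_clock_ghz` are hypothetical helper names, not anything from the actual design.

```python
def merge_stages(block_delays_ns, transparent):
    """Return per-stage delays for a chain of combinatorial blocks.

    With transparent=False every latch is active, so each block is its own
    stage. With transparent=True every other latch is opened, so adjacent
    blocks are chained back-to-back and their delays add up.
    """
    if not transparent:
        return list(block_delays_ns)
    return [sum(block_delays_ns[i:i + 2])
            for i in range(0, len(block_delays_ns), 2)]


def max_clock_ghz(stage_delays_ns, latch_overhead_ns=0.0):
    """Highest clock (GHz) at which the slowest stage still fits in one cycle."""
    return 1.0 / (max(stage_delays_ns) + latch_overhead_ns)


# Assumed numbers: ten blocks of 0.625 ns each (illustrative only).
blocks = [0.625] * 10

fast = merge_stages(blocks, transparent=False)  # all latches closed
slow = merge_stages(blocks, transparent=True)   # every other latch open

for name, stages in (("closed", fast), ("open", slow)):
    print(name, len(stages), "stages,", max_clock_ghz(stages), "GHz max")
```

In this model opening the gates halves the stage count and halves the maximum clock. It says nothing about the power saved, and a real low-power mode would presumably run well below that ceiling.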


    • #52
      Originally posted by lkcl

      this was the hope that inspired Larrabee. they created an absolutely fantastic "Parallel Compute Engine". unfortunately, the GPU-level performance was so bad that the team was *not allowed* to publish the numbers.

      Jeff Bush from Nyuzi had to research it, and i talked with him over the course of several months: we established that a software-only GPU - with no custom accelerated opcodes - would have only TWENTY FIVE percent of the performance of, say, MALI 400, for the same silicon die area. that means that if you wanted a comparably performing software-only GPU, it would require FOUR times the power (and die area).

      obviously, that's not going to fly.

      in speaking with Mitch Alsup on comp.arch i found out a little bit more about why this is. it turns out one of the reasons is that if you want a "fully accurate" IEEE754 FP unit, to get that extra 0.5 ULP (units in last place), you need THREE TIMES the silicon area.

      in a GPU you just don't care that much about accuracy, and that's why in the Vulkan Spec you are allowed a lot less accurate answers in SQRT, RSQRT, SIN, COS, LOG etc.

      basically there are areas where you are trading accuracy for speed, and these tend to conflict badly with the "accuracy" requirements of traditional "Compute" Engines. we are kinda... loonies for even trying. however, if you look at the MIPS 3D ASE (you can still find it online), running instructions twice to get better accuracy is a known technique, and if we plan the ALUs in advance, we can "reprocess" intermediary results using microcoding, and serve *both* markets - GPU (less accurate, less time, less power), and IEEE754 (fully accurate, longer, more power).
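
To illustrate the "run it twice for better accuracy" idea, here is a minimal sketch that assumes a Newton-Raphson refinement step applied to a deliberately crude reciprocal square root estimate. It does not model the MIPS 3D ASE instruction semantics or the ALUs being planned; `rough_rsqrt`, `refine_rsqrt` and `rel_err` are hypothetical names, and the 8-bit truncation is just a stand-in for a cheap first-pass result.

```python
import math


def rough_rsqrt(x):
    """Crude 1/sqrt(x): keep only about 8 mantissa bits of the true value.

    Stands in for a cheap, low-accuracy first pass (assumption).
    """
    mantissa, exponent = math.frexp(1.0 / math.sqrt(x))
    mantissa = math.floor(mantissa * 256) / 256  # throw away low-order bits
    return math.ldexp(mantissa, exponent)


def refine_rsqrt(x, y):
    """One Newton-Raphson step for y ~= 1/sqrt(x).

    Each step roughly squares the relative error, at the cost of a few
    extra multiplies.
    """
    return y * (1.5 - 0.5 * x * y * y)


def rel_err(x, y):
    """Relative error of y against the double-precision 1/sqrt(x)."""
    exact = 1.0 / math.sqrt(x)
    return abs(y - exact) / exact


for x in (2.0, 3.0, 10.0):
    y0 = rough_rsqrt(x)        # "GPU" answer: fast, less accurate
    y1 = refine_rsqrt(x, y0)   # second pass: slower, more accurate
    print(x, rel_err(x, y0), rel_err(x, y1))
```

The first pass is good enough where a few bits of error are tolerable, and feeding the same value through one more step tightens it, which is the shape of the trade-off described above: less time and power for the rough answer, more for the accurate one.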

      You all tend to overlook neural-network techniques for managing cores and threads, for both energy and performance, as a winning piece of the puzzle.

      There is little point in me trying to explain how the wheel is made: others have already built it and understand it much better than I do.
      This is an interesting video that tries to spell out where the big players are heading:

      The FUTURE of Computing Performance
      https://youtu.be/3PjNgRWmv90
      Last edited by nokipaike; 10-24-2019, 07:40 PM.
