More Details On The Proposed Simple-V Extension To RISC-V For GPU Workloads


    Phoronix: More Details On The Proposed Simple-V Extension To RISC-V For GPU Workloads

    With the proposed Libre RISC-V Vulkan accelerator aiming to effectively be an open-source GPU built atop the open-source RISC-V ISA there were recently some new details published on how the design is expected to work out...

    http://www.phoronix.com/scan.php?pag...ple-V-Detailed

  • #2
    Looking forward to a Raspberry Pi with RISC-V.

    Comment


    • #3
      I recognize the efforts this man is making to bring some cool concepts to the real world.
      I truly respect those efforts.

      But could it be that there is already a RISC-V implementation of this, with GAP8?
      https://greenwaves-technologies.com/en/gap8-product/

      "8 GOPS at a few tens of mW"
      Last edited by tuxd3v; 01-04-2019, 08:29 AM.

      Comment


      • #4
        Funny, this "ethical" view. When looking at the RISC-V mailing list, I get the impression that he is trying to force his "ethical" view onto others by trying to manipulate them emotionally.

        I'm curious what results his experiments with the vector extensions and out-of-order execution via Tomasulo's algorithm will bring. Personally, I would suggest using an in-order pipeline that is optionally superscalar, together with a 16- or 64-wide SIMD unit (more like a 4-wide SIMD unit taking four cycles to push the 16 calculations through). This might save some area, so that more cores could fit on the chip. But I have no data to back this up.

        Comment


        • #5
          Originally posted by tuxd3v View Post
          I recognize the efforts this man is making to bring some cool concepts to the real world.
          I truly respect those efforts.

          But could it be that there is already a RISC-V implementation of this, with GAP8?
          https://greenwaves-technologies.com/en/gap8-product/

          "8 GOPS at a few tens of mW"
          No, that is more akin to the PS3's Cell processor, where a main command processor dispatches work to slave cores.

          The Simple-V idea is to have a single bit you turn on that automatically extends normal operations into vector ops... which means you duplicate all the ALUs but still have only one CPU core.
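
          as a rough sketch of that description (this is illustrative pseudocode, not the actual Simple-V spec; the register-file layout and function names here are mine):

```python
# rough sketch, NOT the actual Simple-V spec: with vectorisation state
# active, an ordinary scalar instruction repeats over consecutive
# registers up to a vector length (vl). all names are illustrative.

def scalar_add(regs, rd, rs1, rs2):
    """Plain scalar ADD: one operation, one result."""
    regs[rd] = regs[rs1] + regs[rs2]

def simple_v_add(regs, rd, rs1, rs2, vl):
    """The same ADD with the vectorisation bit set: the hardware
    loops the scalar operation over vl consecutive registers."""
    for i in range(vl):
        regs[rd + i] = regs[rs1 + i] + regs[rs2 + i]

regs = list(range(32))           # toy register file: regs[i] = i
simple_v_add(regs, 0, 8, 16, 4)  # one "instruction", four element ops
```

          the point being that the instruction encoding stays scalar; only the per-register state makes it a vector operation.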

          Comment


          • #6
            Originally posted by cb88 View Post

            No, that is more akin to the PS3's Cell processor, where a main command processor dispatches work to slave cores.

            The Simple-V idea is to have a single bit you turn on that automatically extends normal operations into vector ops... which means you duplicate all the ALUs but still have only one CPU core.
            Thanks for your clarification!

            GAP8 and this concept share some things, but it seems to be a different situation, in fact.

            From what I understood from https://libre-riscv.org/simple_v_extension (and maybe I am wrong), it vectorizes using a FIFO-like queue and dispatches to several ALUs in parallel.
            So it's not SIMD; it's a different approach to vectorization, some sort of unroll to registers followed by dispatch to ALUs..?!

            But it has some things in common: in GAP8 you have a CPU that dispatches to 8 cores too.
            Here it seems to be only one core, which would need a number of registers to hold the "queue" so that operations could be dispatched to several ALUs.

            I may be mistaken here,
            but in this case you would need a hardware core with lots of registers (at least some, to create the queue) and some ALUs for parallelization.

            In theory, even with only one ALU, it could show some gains, because the queue organisation (done in hardware behind the scenes) has zero latency, but this needs hardware support.

            It's a nice concept.

            Comment


            • #7
              Originally posted by tuxd3v View Post

              Thanks for your clarification!

              GAP8 and this concept share some things, but it seems to be a different situation, in fact.

              From what I understood from https://libre-riscv.org/simple_v_extension (and maybe I am wrong), it vectorizes using a FIFO-like queue and dispatches to several ALUs in parallel.
              So it's not SIMD; it's a different approach to vectorization, some sort of unroll to registers followed by dispatch to ALUs..?!

              But it has some things in common: in GAP8 you have a CPU that dispatches to 8 cores too.
              Here it seems to be only one core, which would need a number of registers to hold the "queue" so that operations could be dispatched to several ALUs.

              I may be mistaken here,
              but in this case you would need a hardware core with lots of registers (at least some, to create the queue) and some ALUs for parallelization.

              In theory, even with only one ALU, it could show some gains, because the queue organisation (done in hardware behind the scenes) has zero latency, but this needs hardware support.

              It's a nice concept.
              Yes, it is SIMD... it's just that SIMD vs SISD is controlled by a bit. Most modern SIMD ISAs are similar (they automatically unroll to the supported number of units).

              The PS3 is MIMD, as is GAP8 or really any multi-core system. They both also happen to have one of the cores optimized for communicating quickly with the other cores to synchronize work.
              Last edited by cb88; 01-04-2019, 06:17 PM.

              Comment


              • #8
                Being personally interested in public choice theory, I'm actually quite curious to see what comes out of this social organization, given he's able to apply it.

                Comment


                • #9
                  Originally posted by cb88 View Post

                  Yes, it is SIMD... it's just that SIMD vs SISD is controlled by a bit. Most modern SIMD ISAs are similar (they automatically unroll to the supported number of units).

                  The PS3 is MIMD, as is GAP8 or really any multi-core system. They both also happen to have one of the cores optimized for communicating quickly with the other cores to synchronize work.
                  Thanks for the clarification.
                  My mistake: I thought SIMD meant only one ALU (where data was first unrolled into SIMD registers and then fed, one element at a time, into a single pipelined ALU).

                  Thanks!



                  Comment


                  • #10
                    Originally posted by cb88 View Post

                    No that is more akin to the PS3 cell processor, where you have a main command processor that dispatches work to slave cores.

                    The Simple-V idea is have a single bit you turn on that automatically extends normal operations into vector OPs.... which means you duplicate all the ALUs but only have one CPU core.
                    in effect, yes. the concept has more in common with software APIs than it does with hardware (the idea is natural to someone trained in software engineering, and alien to a purely hardware-trained engineer).

                    an extremely informative article on why SIMD is such a bad idea is here: https://www.sigarch.org/simd-instruc...dered-harmful/ it compares the code size of various SIMD options (all of them awful) against variable-length vectorisation (extremely short, very simple).
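
                    the structural point the article makes can be sketched in python (a DAXPY-style loop; the widths and names here are mine, not taken from the article):

```python
# rough sketch of why fixed-width SIMD bloats code while variable-length
# vectorisation stays short. widths and names are illustrative.

def daxpy_simd(a, x, y, width=4):
    """Fixed-width SIMD style: a main loop plus a separate scalar
    tail loop for leftover elements (the duplicated code the
    article complains about)."""
    n = len(x)
    i = 0
    while i + width <= n:            # main SIMD loop
        for lane in range(width):
            y[i + lane] += a * x[i + lane]
        i += width
    while i < n:                     # scalar tail: duplicated logic
        y[i] += a * x[i]
        i += 1

def daxpy_vl(a, x, y, maxvl=4):
    """Variable-length vector style: the hardware grants
    vl = min(remaining, MAXVL) each iteration, so one loop covers
    everything, tail included."""
    n = len(x)
    i = 0
    while i < n:
        vl = min(maxvl, n - i)       # the "set vector length" step
        for lane in range(vl):
            y[i + lane] += a * x[i + lane]
        i += vl
```

                    same arithmetic, but the second version has no tail case at all: that is roughly the code-size difference the article measures across real ISAs.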

                    however, the reason SIMD exists at all is that its simplicity *at the hardware level* makes it so seductive and compelling that hardware engineers can't help themselves.

                    here's the thing: a very simple abstraction layer can be placed *in front* of a SIMD engine (as long as it has predication), to give it a variable-length vectorisation front-end as far as the instruction set is concerned, and that's really what matters to compiler writers and software developers.
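
                    that abstraction layer can be sketched like so (illustrative only; the mask-generation scheme and names are mine):

```python
# sketch of a variable-length front-end over a fixed-width predicated
# SIMD unit: the front-end turns "do vl elements" into a lane mask, so
# the ISA-visible model is vector-length while the hardware underneath
# stays plain SIMD. names and width are illustrative.

SIMD_WIDTH = 4

def vl_to_mask(vl):
    """Turn a requested vector length into a per-lane predicate."""
    return [lane < vl for lane in range(SIMD_WIDTH)]

def predicated_simd_add(dst, src1, src2, mask):
    """Fixed-width SIMD add; masked-off lanes keep their old value."""
    return [s1 + s2 if m else d
            for d, s1, s2, m in zip(dst, src1, src2, mask)]

# a request for vl=3 on a 4-wide engine simply predicates off lane 3
out = predicated_simd_add([0, 0, 0, 99], [1, 2, 3, 4], [10, 20, 30, 40],
                          vl_to_mask(3))
```

                    the compiler only ever sees "set vl, issue op"; the predication is what lets the fixed-width engine pretend to be variable-length.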

                    so the next question is: why on earth are we doing our own vectorisation system when RVV is in development? there are two answers to that. the first is that the RISC-V Foundation is actually a closed cartel, and recent investigation shows that it's a cartel in violation of the U.S. Sherman Act (look it up).

                    the second reason is technical. RVV is designed to provide a specific set of instructions with their own opcodes. those opcodes use a good fraction of the available 32-bit RISC-V opcode space (there are about 50 or 60 instructions in RVV), and there is no room in the 32-bit space for further vectorised instructions.

                    so let's say that you need a vectorised bit-manipulation operation (xBitManip is in development, in parallel). vectorised bit-manipulation is reasonable to have, particularly if developing a hybrid GPU / VPU, as you want to do a *lot* of YUV-to-RGB conversion, and other such basic operations.

                    how can that be added to RVV? well, it can't. there's no remaining 32-bit opcode space, it's a completely separate extension, and (going back to answer 1), if you're not part of the RISC-V cartel, you have absolutely no say in the matter. you're shut out.

                    by contrast, Simple-V automatically extends xBitManip to the parallel domain, by way of the "prefixing / extending" API, *even though xBitManip hasn't even been finalised*! and the reason is because Simple-V takes *all* opcodes - past, present and future - and makes it possible to parallelise them.

                    it's an abstraction layer, in other words. it really does have more in common with a software API, since it is essentially a general-purpose for-loop wrapped around instructions, just at the hardware level.
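
                    the "for-loop around any opcode" idea can be sketched in a few lines (illustrative only; these are not Simple-V spec names):

```python
# sketch of the "wrap any opcode" point: write the vector loop once and
# it applies to every scalar operation, including opcodes that do not
# exist yet. function names are illustrative, not Simple-V spec names.

def vectorise(scalar_op):
    """Generic hardware-style for-loop wrapped around a scalar op."""
    def vector_op(regs, rd, rs1, rs2, vl):
        for i in range(vl):
            regs[rd + i] = scalar_op(regs[rs1 + i], regs[rs2 + i])
    return vector_op

# an existing opcode (ADD) and a hypothetical future xBitManip opcode
# (32-bit rotate-left) are both vectorised by the same unmodified loop
v_add = vectorise(lambda a, b: a + b)
v_rol = vectorise(lambda a, b: ((a << b) | (a >> (32 - b))) & 0xFFFFFFFF)

regs = [0, 0, 0, 0, 1, 2, 3, 4, 3, 3, 3, 3]
v_rol(regs, 0, 4, 8, 4)   # four rotate-lefts from one vectorised op
```

                    nothing in the loop knows or cares what the scalar operation is: that is why a not-yet-finalised extension gets parallelised "for free".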

                    Comment
