Libre RISC-V Open-Source Effort Now Looking At POWER Instead Of RISC-V


  • #11
    Honestly I’m not surprised. I always thought that the RISC-V project was a bit screwball.

    As for using a general purpose processor to handle GPU duties, didn’t Intel more or less try that already? There is enough variance in what GPUs and CPUs do that it really doesn’t work out well. By the way, that doesn’t mean GPUs can’t be improved for today’s workloads, just that you can’t ignore their core reason for being.



    • #12
      I've been following the "Open HW/CPU communities" for the better part of two decades,
      right from when the OpenRISC community was born and SPARC was forked off into different projects,
      through all of the MIPS clones etc.

      It always comes down to the same old issues.
      Building a CPU that can compete costs shitloads of engineering hours.
      And it requires resources and tools way beyond sitting at your desktop with just code.

      Nobody is willing to put down the resources required to feed a continuously thriving and "free" community.
      And if for whatever reason something starts gaining momentum, companies start to fork things off,
      building their own hardware to earn money.

      And so it is with hardware: after all this time with software, it's still easier to "sell" hardware than software.

      So here it is with RISC-V, as it was with OpenRISC, as it is with SPARC etc.
      The people/companies earning money have forked their own implementations from the ISA.
      Getting a real open-source hardware implementation that is worth a damn is still at square one.

      But here is to hoping. Cheers.



      • #13
        Originally posted by andyprough View Post

        Tale as old as time. It's all about the money.
        100%. Hardware situation != Software situation.



        • #14
          Originally posted by wizard69 View Post
          As for using a general purpose processor to handle GPU duties, didn’t Intel more or less try that already?
          A GPU should have a simple instruction set that maps close to the metal. x86 instructions need a complex decoder that can understand instructions anywhere between 1 and 15 bytes long and translate them into the more RISC-like operations that the underlying core actually executes (ever since the Pentium Pro, x86 cores have run a RISC-like engine inside).
          These x86 instruction decoders add extra complexity, additional pipeline stages (which also means extra transistors and a higher performance cost for mispredicted branches) and extra power consumption. The power consumption issue is bad enough that both Intel and AMD have added small caches for decoded instructions so that they can avoid running the decoder on some hot paths.
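
          To make the decoder point concrete, here is a minimal sketch (Python, with a made-up toy length rule rather than real x86 encodings): with fixed-width instructions every boundary in the stream is known up front, whereas with variable-length instructions the start of the next instruction is only known after the current one has been at least partially decoded, which is exactly where the extra pipeline stages come from.

          ```python
          # Minimal sketch contrasting fixed-width vs. variable-length instruction fetch.
          # The "length rule" below is invented for illustration, not real x86 encoding.

          def split_fixed_width(stream: bytes, width: int = 4):
              """RISC-style stream: every boundary is known up front, so all
              instructions can be handed to parallel decoders immediately."""
              return [stream[i:i + width] for i in range(0, len(stream), width)]

          def split_variable_length(stream: bytes, length_of):
              """x86-style stream: the start of instruction N+1 is only known after
              instruction N has been (at least partially) decoded, which serialises
              the boundary-finding step."""
              out, i = [], 0
              while i < len(stream):
                  n = length_of(stream[i:])      # must inspect the bytes first
                  out.append(stream[i:i + n])
                  i += n
              return out

          # Toy rule: a 0x0F "prefix" byte doubles the instruction length.
          toy_length = lambda tail: 4 if tail[0] == 0x0F else 2

          stream = bytes([0x01, 0x02, 0x0F, 0x03, 0x04, 0x05, 0x06, 0x07])
          print(split_fixed_width(stream))                   # boundaries known immediately
          print(split_variable_length(stream, toy_length))   # discovered one by one
          ```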

          None of that is necessary with a clean RISC architecture. While it is unclear what else GPUs need from the ISA (compared to general purpose CPUs), it is clear that you get far better bang for the buck from a RISC core than from an x86 one.

          Now, which RISC core is more appropriate I don't know. As far as I know, a simple in-order RISC-V core (e.g. at the level of an ARM Cortex-A7) can be built with approximately 50% fewer transistors than a matching ARM core while still reaching the same performance levels (those numbers were for the core only, caches excluded). I don't know of any comparison for the Power Architecture (it's been a while since Power CPUs were in-order cores anyway).

          The whole point is that if you have limited silicon real estate and a limited power budget, you're best off using an ISA that lets you keep things nice and simple. I'd assume RISC-V comes close there, ARM probably doesn't (it's too complex), whereas for Power I don't know.
          Last edited by pkese; 20 October 2019, 04:18 PM.



          • #15
            Originally posted by Qaridarium
            I also think POWER is better than RISC-V
            Interesting... What makes you think so? I'm not as familiar with the POWER ISA... Is it more appropriate for GPU usage than RISC-V?



            • #16
              I really like the idea of a POWER-based GPU. There are a few different subsets of the POWER ISA, and what's interesting is that we have powerful hardware (not quite 7nm Ryzen level, but for 14nm it's plenty fast) available to develop on. This would allow testing of concepts (e.g. using instruction-trap functionality for new instructions) with rapid iteration, versus full simulation or the slow RISC-V hard silicon available today.
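
              For anyone unfamiliar with the trap-and-emulate trick being referred to, here is a rough sketch (the opcode value, bit layout and handler are invented purely for illustration): an unimplemented instruction raises an illegal-instruction trap on existing silicon, and the handler decodes the word and emulates the proposed behaviour in software, so the semantics of a new instruction can be iterated on quickly before committing to hardware.

              ```python
              # Toy sketch of prototyping a new instruction via trap-and-emulate.
              # Opcode value, register model and bit layout are invented for illustration.

              NEW_MAJOR_OPCODE = 0x3F          # pretend this slot is unimplemented today

              class IllegalInstruction(Exception):
                  def __init__(self, word):
                      self.word = word

              class ToyCPU:
                  def __init__(self):
                      self.regs = [0] * 32

                  def execute(self, word: int):
                      major = (word >> 26) & 0x3F          # 6-bit major opcode field
                      if major == NEW_MAJOR_OPCODE:
                          raise IllegalInstruction(word)   # real hardware would trap here
                      # ... decode of already-implemented instructions would go here ...

              def trap_handler(cpu: ToyCPU, word: int):
                  """Software emulation of the proposed op: a made-up 'add then clamp
                  to 255' instruction with rd/ra/rb packed into the trapped word."""
                  rd = (word >> 21) & 0x1F
                  ra = (word >> 16) & 0x1F
                  rb = (word >> 11) & 0x1F
                  cpu.regs[rd] = min(cpu.regs[ra] + cpu.regs[rb], 255)

              cpu = ToyCPU()
              cpu.regs[1], cpu.regs[2] = 200, 100
              word = (NEW_MAJOR_OPCODE << 26) | (3 << 21) | (1 << 16) | (2 << 11)
              try:
                  cpu.execute(word)
              except IllegalInstruction as trap:
                  trap_handler(cpu, trap.word)             # iterate on semantics in software
              print(cpu.regs[3])                           # -> 255 (clamped)
              ```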

              I'm certain we could free up a few POWER machines for the development team here, though we'd like a bit more focus on potential 4K / PCIe support, as that would eliminate one of the last remaining binary blobs in a typical desktop / workstation POWER build (namely the GPU)...



              • #17
                Originally posted by Djhg2000 View Post
                The lesson here is that when you name your project after a critical component, you had better be sure you can actually use that component. Since producing an open source GPU still seems to be the main focus, the name isn't (and probably wasn't) appropriate for the task.
                sigh, i know. it's going to be a frickin nuisance - the crowdsupply page name is hard-coded as well.

                there's a *lot* to be evaluated here. we have to check that SVPrefix *and* VBLOCK can be adapted to fit the Power Architecture, then we have to see how to fit 16-bit Compressed instructions in, *and* work with the OpenPower Foundation to check that this is not just ok with them but ok "in a way that is non-disruptive". oh, and work out how to add FP16 as well. and over *eighty* new opcodes for 3D and Video Vectorisation.

                there is a way to do something like "escape-sequences" - similar to c++ namespaces, and also to how vi has two modes (edit and manipulate). the alphanumeric keys have different meanings depending on which "mode" you are in, and we will need to shuffle some Power opcodes around in order to clear out the 000-NNN major opcode space so that SVP and VBLOCK can use that entire range of 8 major opcodes. this is *not* something that you do lightly, so it will absolutely have to be behind a "mode".
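
                here is a rough sketch of the "mode" idea (the opcode numbers, mode names and table contents below are invented for illustration, not an actual proposal): the same major-opcode bits decode to different operations depending on which mode the core is currently in, exactly like vi reinterpreting the same keys in different modes.

                ```python
                # Toy sketch of mode-dependent major-opcode decoding.
                # Opcode values, names and table contents are invented for illustration only.

                STANDARD_MODE = "standard"   # ordinary Power-style decode
                PREFIX_MODE   = "sv-mode"    # hypothetical mode where the freed-up major
                                             # opcodes are reinterpreted for SVP / VBLOCK

                DECODE_TABLES = {
                    STANDARD_MODE: {0b000000: "legacy-op-A", 0b000001: "legacy-op-B"},
                    PREFIX_MODE:   {0b000000: "svp-setup",   0b000001: "vblock-start"},
                }

                def decode_major_opcode(mode: str, major_opcode: int) -> str:
                    """The same 6-bit major opcode means different things in different
                    modes -- the 'escape sequence' / vi-modes analogy from above."""
                    return DECODE_TABLES[mode].get(major_opcode, "illegal")

                # Identical bits, different meaning, depending on the current mode.
                print(decode_major_opcode(STANDARD_MODE, 0b000001))  # -> legacy-op-B
                print(decode_major_opcode(PREFIX_MODE,   0b000001))  # -> vblock-start
                ```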

                however, as this is not intended to be a "custom" system, we cannot just blithely assume that the OpenPower Foundation - and its members - will be happy with that, even though they have expressed strong interest in being truly open, because the compiler support needed will become public and a permanent part of the Power ISA landscape.

                now, it may be hugely _beneficial_ to the Power Architecture to do that - introduce 16-bit compressed instructions, FP16, full IEEE754 transcendental opcodes such as sin, cos, pow, log etc. - however it absolutely *has* to be discussed properly.

                the issue with the RISC-V Foundation is that they have ignored their obligations under Trademark Law to answer reasonable in-good-faith requests for well over 18 months. there *is* no discussion, and no effort made on their part to engage. we can't develop a Libre CPU / VPU / GPU under such hostile conditions.



                • #18
                  Originally posted by lkcl View Post
                  however, as this is not intended to be a "custom" system, we cannot just blithely assume that the OpenPower Foundation - and its members - will be happy with that, even though they have expressed strong interest in being truly open, because the compiler support needed will become public and a permanent part of the Power ISA landscape.
                  My suggestion would be to pitch it as a compute accelerator superset of the POWER ISA. That would kill a couple of birds with one stone, and also be amenable to -mtune type compiler flags, without explicitly locking it to GPUs (fundamentally, a GPU is "just" a vector compute accelerator that offers a lot of instructions that are useful for graphics manipulation).

                  Implement CAPI as an optional interface and you'll get even more attention. The designs and RTL for that are open and would move the chip from just a GPU to something even more interesting to certain audiences -- specifically, a trustworthy accelerator.



                  • #19
                    Originally posted by pkese View Post

                    Interesting... What makes you think so? I'm not as familiar with the POWER ISA... Is it more appropriate for GPU usage than RISC-V?
                    it's very very early days: that's why i started the thread, to work out what would be needed, and how it would work. so far it looks like it's been designed for supercomputing (and VLIW etc.) - a lot has happened in ISA development since Power was first designed.

                    one thing we had not realised (not properly, if you know what i mean) is the full significance of "swizzle" in 3D. Jacob repeated it but it didn't really sink in: you need a whopping *TWENTY* bits (4 for a dest predicate mask, 2x4 bits for the src1 swizzle, 2x4 bits for the src2 swizzle) to get the full set of permutations for swizzle - no wonder most GPU ISAs are 64 to 128 bits wide!
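
                    as a quick sanity check on that bit count (just the arithmetic from above written out, assuming 4-lane vectors with 2 selection bits per lane):

                    ```python
                    # Back-of-the-envelope check of the swizzle field width described above.
                    LANES = 4                     # x, y, z, w
                    BITS_PER_SELECT = 2           # 2 bits choose one of the 4 source components

                    dest_predicate_bits = LANES                 # one mask bit per destination lane
                    src_swizzle_bits = BITS_PER_SELECT * LANES  # 2x4 = 8 bits per source operand

                    total = dest_predicate_bits + 2 * src_swizzle_bits   # two source operands
                    print(total)   # 4 + 8 + 8 = 20 bits, before the opcode itself is encoded
                    ```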

                    so we have to do a major reshuffle there anyway - we might as well look at Power.



                    • #20
                      Originally posted by madscientist159 View Post

                      My suggestion would be to pitch it as a compute accelerator superset of the POWER ISA. That would kill a couple of birds with one stone, and also be amenable to -mtune type compiler flags, without explicitly locking it to GPUs (fundamentally, a GPU is "just" a vector compute accelerator that offers a lot of instructions that are useful for graphics manipulation).
                      yeah, it's a good idea. one thing we do have to be careful of: most GPUs (discrete GPUs) have their own L1 cache - shader programs are typically no more than 1k, so they don't care about L1 cache usage (because it's separate) or about instruction size: they're getting 4-16x the data per instruction anyway. as a Hybrid Processor, we *really* care, because the L1 cache is "shared" (i.e. there's only one).

                      also (because of the way that SimpleV turns scalar opcodes into vectorised ones, with "context"), if we add instructions that are useful for a GPU, they're also useful as *scalar* HPC opcodes. so, yes, we end up with something that's useful as a compute accelerator as a "byproduct", which is nice.
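
                      a rough sketch of that "context" idea (simplified illustration only, not the actual SimpleV spec - the register file, VL handling and names are all toy versions): a vector-length setting turns an ordinary scalar opcode into an implicit hardware loop over consecutive registers, so the scalar instruction and its vectorised form are literally the same opcode.

                      ```python
                      # Simplified illustration of a vector-length "context" turning a scalar
                      # op into a loop. NOT the SimpleV spec, just the general shape of the idea.

                      regfile = [0.0] * 32     # toy register file
                      VL = 1                   # vector-length context; 1 == plain scalar

                      def set_vl(n: int):
                          """Set the vector-length context that later scalar ops pick up."""
                          global VL
                          VL = n

                      def fadd(rd: int, ra: int, rb: int):
                          """One scalar opcode; with VL > 1 it is implicitly repeated over
                          consecutive registers, so no separate vector opcode is needed."""
                          for i in range(VL):
                              regfile[rd + i] = regfile[ra + i] + regfile[rb + i]

                      # scalar use
                      regfile[8], regfile[16] = 1.0, 2.0
                      fadd(0, 8, 16)                      # r0 = r8 + r16

                      # "vectorised" use of the very same opcode
                      regfile[8:12]  = [1.0, 2.0, 3.0, 4.0]
                      regfile[16:20] = [10.0, 20.0, 30.0, 40.0]
                      set_vl(4)
                      fadd(0, 8, 16)                      # r0..r3 = r8..r11 + r16..r19
                      print(regfile[0:4])                 # [11.0, 22.0, 33.0, 44.0]
                      ```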

                      Implement CAPI as an optional interface and you'll get even more attention. The designs and RTL for that are open and would move the chip from just a GPU to something even more interesting to certain audiences -- specifically, a trustworthy accelerator.

