Libre RISC-V GPU Aiming For 2.5 Watt Power Draw Continues Being Plotted


  • #11
    Originally posted by wizard69 View Post
    I have this idea that the Raspberry Pi would be the ideal platform upon which RISC-V could get some respectability and, more frankly, volume.
    The RPi already has a purpose as a low-cost education and hobbyist tool. Raising the cost and breaking compatibility by changing the core does not benefit those goals.



    • #12
      Originally posted by uid313 View Post
      25 FPS @ 720p for a handheld/mobile device is not enough.
      It needs at least 60 FPS @ 1080p. Anything less fails to meet even minimal expectations.

      Also, does it raytrace?
      The current plan calls for a quad-core running at 800MHz at 2.5W; it should probably be able to achieve >1GHz at a higher power draw. Each core will support 4x 32-bit float FMA per clock cycle and 0.5x 32-bit float div per clock cycle, so assuming we can keep the pipeline full, that works out to 27.2 Gflops at 800MHz and 40.8 Gflops at 1.2GHz.
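
      To sanity-check those figures, here's a minimal sketch of the arithmetic, assuming the 4 cores and per-cycle rates stated above and counting each FMA as 2 flops and each div as 1:

      #include <stdio.h>

      int main(void) {
          const int cores = 4;
          const double fma_per_cycle = 4.0;  /* per core, 32-bit float */
          const double div_per_cycle = 0.5;  /* per core, 32-bit float */
          /* FMA = 2 flops, div = 1 flop */
          double flops_per_cycle = cores * (fma_per_cycle * 2.0 + div_per_cycle);

          printf("peak @ 0.8 GHz: %.1f Gflops\n", flops_per_cycle * 0.8);  /* 27.2 */
          printf("peak @ 1.2 GHz: %.1f Gflops\n", flops_per_cycle * 1.2);  /* 40.8 */
          return 0;
      }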

      Yes, it can raytrace (it can run any RISC-V RV64GC program), though there most likely won't be any dedicated HW for raytracing. Raytracing is low priority, so I'm not planning on implementing it in the driver before we have a working product. Pull requests are welcome (though the shader compiler isn't currently far enough along for it to be practical to start implementing raytracing) at https://salsa.debian.org/Kazan-team/kazan

      Also, we're thinking of implementing OmniXtend (basically TileLink over Ethernet) support, with the intention that we can combine the SoCs into a larger, higher-performance NUMA system.



      • #13
        Originally posted by AndyChow View Post
        I don't know how chips are designed, but I doubt that you can aim at a power draw like that without first having a functional design. It sounds like putting the cart before the horse.
        Well, power is always a concern when engineering a chip. GPUs are a perfect example here, as manufacturers can vary the number of execution units (among other things) to hit a power specification. I'm pretty sure the primary goal Apple has when designing its A-series chips is to operate under a fixed power level. Frankly, there are many situations where power is a driving factor in chip design.

        It is a completely different question as to why this guy chose the power levels alluded to, especially when you consider the relatively low performance goals. Frankly, I suspect that by the time the chip is in silicon it will be woefully outdated.



        • #14
          Originally posted by bachchain View Post

          The RPi already has a purpose as a low-cost education and hobbyist tool.
          Yes, exactly why an open source processor would be so nice here. Not only would you have a platform for software engineering education, you would also have a platform for electrical engineering education.
          Raising the cost and breaking compatibility by changing the core does not benefit those goals.
          Costs are unknown at this time. As for compatibility, that break would eventually happen anyway. Even then, much of the Pi world runs on Python. Frankly, I don't see a big problem in this regard; the real issue would be the step up in performance.



          • #15
            What even qualifies this as a GPU? Does it have any ROPs or texture-mapping hardware?

            Originally posted by programmerjake View Post
            The current plan calls for a quad-core running at 800MHz at 2.5W; it should probably be able to achieve >1GHz at a higher power draw. Each core will support 4x 32-bit float FMA per clock cycle and 0.5x 32-bit float div per clock cycle, so assuming we can keep the pipeline full, that works out to 27.2 Gflops at 800MHz and 40.8 Gflops at 1.2GHz.
            Edit: for reference, here's what a Pi v3-class CPU can manage: http://www.roylongbottom.org.uk/andr...benchmarks.htm

            If you scroll down to the "T22 Android 5.1 64 Bit" section of "NEON MP MFLOPS Results", it appears to show a quad-core 1.3 GHz A53 reaching 10.8 GFLOPS, with all four cores cranking.

            Since the A53 is a dual-issue core, that presumably leaves some headroom for address arithmetic and control logic.
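
            Putting the two figures from this thread side by side (a claimed peak against a measured result, so the ratio flatters the new chip):

                27.2 Gflops (claimed peak, 4 cores @ 0.8 GHz) / 10.8 GFLOPS (measured, 4x A53 @ 1.3 GHz) ≈ 2.5x

            and sustained throughput typically lands well below peak.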

            Originally posted by programmerjake View Post
            Also, we're thinking of implementing OmniXtend (basically TileLink over Ethernet) support, with the intention that we can combine the SoCs into a larger, higher-performance NUMA system.
            You're nuts. Who'd want to scale up a system with such poor efficiency?

            Make the vectors wider. Much wider. Then, and only then, should you worry about adding cores.
            Last edited by coder; 18 February 2019, 10:43 PM.



            • #16
              Originally posted by wizard69 View Post
              I have this idea that the Raspberry Pi would be the ideal platform upon which RISC-V could get some respectability and, more frankly, volume.
              If it wouldn't be a performance downgrade, sure.

              Originally posted by wizard69 View Post
              This however doesn’t sound like the right implementation. In the ideal world the next iteration of PI would nearly double performance.
              The Pi family has used the same GPU since its launch, so it needs a lot more than a 2x improvement. By the time the Pi v4 launches in 2020, it's going to need a 5x-10x speedup in GPU performance, as well as Vulkan and OpenCL support.



              • #17
                Originally posted by wizard69 View Post
                Yes, exactly why an open source processor would be so nice here. Not only would you have a platform for software engineering education, you would also have a platform for electrical engineering education.
                Huh?

                For education, EEs would want a soft core and a big FPGA. You're not going to get that at such a low price, nor would it run the core as fast as most Pi users would like.



                • #18
                  Originally posted by coder View Post
                  Okay, so we're looking at lower performance than a purely software renderer running on a Pi v3?
                  One thing to keep in mind is that we are aiming for the whole SoC to be around 2-3mm² in 28nm, so it is much smaller than the RPi v3's SoC, and hopefully less expensive. We're aiming for around $4 per chip.

                  Originally posted by coder View Post
                  What even qualifies this as a GPU? Does it have any ROPs or texture-mapping hardware?
                  We are going to run benchmarks to determine which portions of the render pipeline need hardware acceleration and which parts are fast enough in software. We're most likely going to add HW acceleration for decoding compressed textures and for sRGB<->linear conversion. We may also add HW acceleration for triangle rasterization and more acceleration for texture decoding. One part that differs from a pure CPU is that we are supporting more FP-div performance than a CPU needs, in order to handle perspective projection.
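
                  For reference, the sRGB<->linear step mentioned above is the standard per-component transfer function from the sRGB spec (IEC 61966-2-1, nothing project-specific); a minimal C version:

                  #include <math.h>

                  /* decode: sRGB-encoded value in [0,1] -> linear light */
                  float srgb_to_linear(float c) {
                      return (c <= 0.04045f) ? c / 12.92f
                                             : powf((c + 0.055f) / 1.055f, 2.4f);
                  }

                  /* encode: linear light in [0,1] -> sRGB-encoded value */
                  float linear_to_srgb(float c) {
                      return (c <= 0.0031308f) ? c * 12.92f
                                               : 1.055f * powf(c, 1.0f / 2.4f) - 0.055f;
                  }

                  The branch and powf are trivial per component but add up in software across a whole framebuffer, which is why this is a natural acceleration candidate.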


                  Originally posted by coder View Post
                  Make the vectors wider. Much wider. Then, and only then, should you worry about adding cores.
                  We are adding more cores before making the ALUs wider because we also intend for the cores to act as a CPU, where 4 cores are definitely better than 1 core for non-FP work. We are supporting variable-length vectors in the ISA, up to 256 elements (in my SVprefix proposal), which will help improve ALU utilization and reduce power usage. We are also aiming for dual-issue OoO execution to help non-FP execution.
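
                  For readers unfamiliar with variable-length vectors, here is a scalar-C sketch of the strip-mining pattern they enable; set_vl is a hypothetical stand-in for whatever SVprefix actually provides, with only the 256-element cap taken from the post above:

                  #include <stddef.h>

                  #define MAX_VL 256  /* element cap mentioned above */

                  /* hypothetical: ask the hardware for this pass's vector length */
                  static size_t set_vl(size_t remaining) {
                      return remaining < MAX_VL ? remaining : MAX_VL;
                  }

                  void vec_add(float *dst, const float *a, const float *b, size_t n) {
                      for (size_t i = 0; i < n; ) {
                          size_t vl = set_vl(n - i);
                          /* conceptually a single vector instruction per pass */
                          for (size_t j = 0; j < vl; j++)
                              dst[i + j] = a[i + j] + b[i + j];
                          i += vl;
                      }
                  }

                  The utilization win is that the same loop runs at full width on large arrays and narrows automatically on the tail, with no scalar fix-up loop.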



                  • #19
                    Originally posted by programmerjake View Post
                    One thing to keep in mind is that we are aiming for the whole SoC to be around 2-3mm² in 28nm, so it is much smaller than the RPi v3's SoC, and hopefully less expensive. We're aiming for around $4 per chip.
                    Cool! What would an SBC with this roughly cost (with no guarantee that you'd hit that mark)?

                    Originally posted by programmerjake View Post
                    We are going to run benchmarks to determine which portions of the render pipeline need hardware acceleration and which parts are fast enough in software. We're most likely going to add HW acceleration for decoding compressed textures and for sRGB<->linear conversion. We may also add HW acceleration for triangle rasterization and more acceleration for texture decoding. One part that differs from a pure CPU is that we are supporting more FP-div performance than a CPU needs, in order to handle perspective projection.
                    As this is a first-gen chip, I am guessing you would build a v2 with lessons learnt (and then a v3, ..., vN)?


                    Originally posted by programmerjake View Post
                    We are adding more cores before making the ALUs wider because we also intend for the cores to act as a CPU, where 4 cores are definitely better than 1 core for non-FP work. We are supporting variable-length vectors in the ISA, up to 256 elements (in my SVprefix proposal), which will help improve ALU utilization and reduce power usage. We are also aiming for dual-issue OoO execution to help non-FP execution.
                    Are you leveraging any other open RISC-V designs like the recent Western Digital design?



                    • #20
                      Originally posted by boxie View Post
                      Cool! What would an SBC with this roughly cost (with no guarantee that you'd hit that mark)?
                      I haven't done much research, but $10-20 for the parts sounds achievable for the PCB, RAM, SoC, and power circuitry. It would cost more than that if you wanted Ethernet (you'd need a separate PHY chip) or more than 256MB or so of RAM. Note that the pricing is for small volumes, so it will go down at larger volumes. For more than a few boards you'd need to include the price of assembly and testing.

                      Originally posted by boxie View Post
                      As this is a first-gen chip, I am guessing you would build a v2 with lessons learnt (and then a v3, ..., vN)?
                      Probably.

                      Originally posted by boxie View Post
                      Are you leveraging any other open RISC-V designs like the recent Western Digital design?
                      Probably not; we have our own custom superscalar OoO speculative architecture inspired by the CDC 6600.

                      To mitigate Spectre-class bugs, we have a speculation fence instruction, and we are designing the core so that speculation isn't visible outside it: there are no speculative cache fills unless a mechanism ensures they aren't visible to other cores while still speculative.
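
                      For context, here is the usual pattern such a fence enables against Spectre-v1 style bounds-check bypass; spec_fence is a hypothetical placeholder for the instruction described above (its real mnemonic isn't given here), modeled as a no-op so the sketch compiles:

                      #include <stddef.h>
                      #include <stdint.h>

                      /* placeholder for the speculation fence instruction */
                      static inline void spec_fence(void) { /* no-op model */ }

                      static uint8_t table[256];

                      uint8_t safe_read(const uint8_t *arr, size_t len, size_t idx) {
                          if (idx < len) {
                              spec_fence();            /* idx is now non-speculative */
                              return table[arr[idx]];  /* the classic v1 gadget, fenced */
                          }
                          return 0;
                      }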

