Libre-SOC Still Persevering To Be A Hybrid CPU/GPU That's 100% Open-Source

  • #41
    I think the GPU core should be different from the CPU core. They can be very similar - they can both be based on the POWER ISA - but they should be different.

    Then you can cut down on the number of transistors in the GPU core, making it smaller, simpler and more efficient. Keep only what is necessary for a GPU. The CPU core can then stay bigger, faster and more power-hungry.



    • #42
      Originally posted by xfcemint View Post

      I forgot to answer this issue.

      Yes, the world doesn't need another CPU with vector extensions, but it does need an open-source CPU and an open-source GPU. And an open-source GPU can be built from many CPU cores with vector extensions. That's why a CPU with vector extensions is needed: to make a GPU, not to make a CPU.
      ah, parallella / adapteva - i was really hoping their approach would take off - showed that the "sea of little cores" approach is not necessarily going to fly. my feeling there is that it couldn't run standard applications (it wasn't an SMP NoC, because the little cores were "embedded" platform RV32, not "UNIX" platform compliant RV64GC, to keep them small enough to make a sea of them). they also didn't add any kind of SIMD or Vector processing. unfortunate.



      • #43
        Originally posted by xfcemint View Post

        You are likely wrong.

        The simplest way to design a GPU is probably the CPU+vector ISA extensions path. So, from the perspective of simplifying the project, ISA extensions are not such a bad idea at all.

        The difference is that, when you make a GPU in that manner (from CPUs with a vector ISA), you create it out of small, low-power cores - but lots and lots of them. It wouldn't be a super-high-performance GPU, but it would be able to do the job of an integrated GPU. That is already sufficient for this project to be successful. This is not comparable to any current design, so you are wrong when you say "a GPU requires that, that, and that", because no, this doesn't look to be that kind of GPU.

        In other words, the wide-warp multiprocessor design of current GPUs is overkill for this project. So I think you are looking at it from the wrong perspective.
        Reminds me of Larrabee. What makes you think this team can be more successful than Intel?



        • #44
          Originally posted by GruenSein View Post

          Reminds me of Larrabee. What makes you think this team can be more successful than Intel?
          Because it's not x86. x86 looks like a terrible idea for a GPU (complex decoders waste power).



          • #45
            Originally posted by xfcemint View Post

            Because it's not x86. x86 looks like a terrible idea for a GPU (complex decoders waste power).
            I guess we will see. Let's hope for the best. I will remain skeptical until I see some developer board comparable to an RPi.



            • #46
              Originally posted by xfcemint View Post

              The simplest way to design a GPU is probably the CPU+vector ISA extensions path. So, from the perspective of simplifying the project, ISA extensions are not such a bad idea at all.
              we do need to be very careful. the reason for the hybrid architecture is to be able to cut out the "CPU-userspace-kernelspace-serialisation-PCIe-deserialisation-GPU-execution *and back again*" insanity.

              the Khronos Group is currently working on adding ray tracing to Vulkan, and when the presenter doing the XDC2020 talk said, "now you can call this API recursively", everyone on the IRC channel went "oink, did he really say that?"

              and the reason is because everyone there knows the full implications for the driver stack in a traditional split CPU-GPU architecture: they're going to have to create an RPC mechanism across that inter-processor bridge - one that can only be safely done if protected by the linux kernel - that can now do recursion for god's sake!

              think about that for a minute. the insanity of doing full serialisation-deserialisation of function call parameters from CPU userspace, jumping to Linux kernelspace and sending serialised function calls over to a GPU which unpacks them at the other end - just went recursive?? they're going to have to "mirror" the state of a stack! no wonder NVidia charges so much damn money for their GPUs!

              whereas with the hybrid architecture, we just... make the function call. ray-tracing is recursive? so what. it stays entirely in userspace. it's a *userspace* recursive function call and a *userspace* stack: it doesn't even go into a linux kernel context-switch because the Kazan Vulkan driver (and the MESA one) are *entirely in userspace*. the 3D GPU opcodes we're adding are called... *from userspace*. they're called from a shader binary that was compiled by the SPIR-V compiler inside the Vulkan driver... *but they're called from userspace*.

              this cuts man-years off the development cycle, makes end-user application development simpler and easier to debug, and much more. and it is literally an order of magnitude simpler to implement.
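
              to make the contrast concrete, here's a toy python sketch - every name in it is hypothetical, it just shows the *shape* of the two call paths: the split path marshals every call (and every recursive bounce) across the kernel/PCIe boundary, the hybrid path is an ordinary userspace call on the ordinary userspace stack.

              import json

              def serialise(fn, *args):
                  # stand-in for marshalling call parameters across the PCIe boundary
                  return json.dumps({"fn": fn, "args": list(args)})

              def fake_gpu_execute(msg):
                  # stand-in for the remote GPU unpacking and executing the call
                  call = json.loads(msg)
                  return sum(call["args"])  # pretend result

              def trace_ray_split(ray_id, depth):
                  # split CPU/GPU: serialise, hand through the kernel driver,
                  # deserialise on the far side - once per call
                  msg = serialise("trace_ray", ray_id, depth)
                  return fake_gpu_execute(msg)

              def trace_ray_hybrid(ray_id, depth):
                  # hybrid CPU/GPU: the shader runs on the same cores, so
                  # recursion is just a plain function call - no context
                  # switch, no marshalling, no mirrored stack
                  if depth == 0:
                      return ray_id
                  return trace_ray_hybrid(ray_id + 1, depth - 1)

              print(trace_ray_split(1, 4), trace_ray_hybrid(1, 4))  # 5 5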

              however - all of the ISA extension additions are predicated on "approval" from the OpenPOWER Foundation, through the development of a "yes you can isolate these custom extensions behind an escape-sequence system" extension that *itself* has to be properly reviewed and ratified. absolutely nobody can simply drop a set of unauthorised custom modifications into the OpenPOWER ISA without expecting to have an army of IBM lawyers drop a legal ton of bricks on their head.

              and that's why we're also going to the trouble of making sure that there is a justification for *other* OpenPOWER Foundation Members to use (and therefore support) the ISA extensions. adding IEEE754 sin, cos and atan2 to the scalar PowerISA can be viewed as useful in HPC environments, for example. so it's a long road ahead.
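
              purely as an illustration of why scalar transcendentals matter in HPC (nothing from the spec, just the kind of inner loop involved): one atan2 per element, so a single-instruction IEEE754 implementation pays off directly.

              import math

              def to_polar(points):
                  # hot loop of the kind common in HPC and graphics workloads:
                  # one atan2 (plus one hypot) per element; with scalar PowerISA
                  # opcodes each of these becomes a single instruction
                  return [(math.hypot(x, y), math.atan2(y, x)) for x, y in points]

              print(to_polar([(1.0, 0.0), (0.0, 2.0)]))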



              • #47
                Originally posted by xfcemint View Post
                I think the GPU core should be different from the CPU core. They can be very similar - they can both be based on the POWER ISA - but they should be different.
                ahh... ahh... i like it! i don't think anyone's suggested that before. i don't know why. it's a natural extension of the big.little idea.

                Then you can cut down on the number of transistors in the GPU core, making it smaller, simpler and more efficient. Keep only what is necessary for a GPU. The CPU core can then stay bigger, faster and more power-hungry.
                yeah. no this is really exciting. i mean, originally (like, only 3 days ago) i was thinking, in big.little you could have the little cores with only say 8k Instruction-Cache and massively deep back-end SIMD ALUs (still with the Vector front-end though), but what hadn't occurred to me was to *drop* parts of the PowerISA on those cores which aren't strictly needed.

                i need to think about that. the reason is because there are currently only 4 "Platforms" in the OpenPOWER v3.1B Specification: AIX-compliant, UNIX-compliant, Embedded and Embedded-no-FPU. what you describe - which is a damn good idea - doesn't really fit any of those. i may have to raise this with the OpenPOWER Foundation, so thank you!
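
                to make the idea concrete, a hypothetical sketch - every number in it is illustrative, nothing from any spec - of how "big" and "little GPU" core profiles might differ:

                CORE_PROFILES = {
                    "big": {
                        "isa": "full PowerISA (UNIX platform)",
                        "icache_kib": 32,
                        "simd_lanes": 4,
                        "ooo_issue": True,
                    },
                    "little_gpu": {
                        # drops scalar/system facilities a shader never uses,
                        # keeps the Vector front-end, widens the back-end SIMD ALUs
                        "isa": "PowerISA subset + Vector front-end",
                        "icache_kib": 8,
                        "simd_lanes": 16,
                        "ooo_issue": False,
                    },
                }

                for name, profile in CORE_PROFILES.items():
                    print(name, profile)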



                • #48
                  Originally posted by GruenSein View Post

                  I guess we will see. Let's hope for the best. I will remain skeptical until I see some developer board comparable to an RPi.
                  I would rate it as a success as soon as they have it on an FPGA, even with just the CPU working, without any GPU extensions. So, when this FPGA can run something like DOSBox with Quake, that is a success.

                  Also, this CPU is designed with an OoO scheduler? Wow, that's already freaking amazing! If it additionally has some kind of GPU acceleration capabilities, by whatever means - that's super fantastic. If that existed and was open source, some company would just take it and etch it on silicon - if only to create an RPi competitor.



                  • #49
                    Originally posted by lkcl View Post

                    yeah. no this is really exciting. i mean, originally (like, only 3 days ago) i was thinking, in big.little you could have the little cores with only say 8k Instruction-Cache and massively deep back-end SIMD ALUs (still with the Vector front-end though), but what hadn't occurred to me was to *drop* parts of the PowerISA on those cores which aren't strictly needed.

                    i need to think about that. the reason is because there are currently only 4 "Platforms" in the OpenPOWER v3.1B Specification: AIX-compliant, UNIX-compliant, Embedded and Embedded-no-FPU. what you describe - which is a damn good idea - doesn't really fit any of those. i may have to raise this with the OpenPOWER Foundation, so thank you!
                    Glad to be of help.

                    Also, I can see one issue there, which is also a suggestion: I don't see any need for a complex OoO scheduler in the GPU cores. Even superscalar issue is probably too much. An OoO scheduler will just waste transistors and power. So it would probably be best to replace it with some simpler scheduler, which needs additional design work. For illustration, see the sketch below.
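
                    Here is a minimal in-order scoreboard sketch in Python (hypothetical, not the Libre-SOC scheduler), showing how "stall until the sources are ready" replaces the whole OoO wakeup/select machinery:

                    busy = set()  # destination registers with results in flight

                    def can_issue(srcs, dest):
                        # in-order rule: stall if any source is still being
                        # produced (RAW) or the destination is in flight (WAW)
                        return not ((set(srcs) | {dest}) & busy)

                    def issue(srcs, dest):
                        if not can_issue(srcs, dest):
                            return False  # stall - far cheaper than OoO logic
                        busy.add(dest)
                        return True

                    def writeback(dest):
                        busy.discard(dest)

                    print(issue(["r1", "r2"], "r3"))  # True
                    print(issue(["r3"], "r4"))        # False: r3 still in flight
                    writeback("r3")
                    print(issue(["r3"], "r4"))        # True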



                    • #50
                      Originally posted by xfcemint View Post

                      I would rate it as a success as soon as they have it on an FPGA, even with just the CPU working, without any GPU extensions. So, when this FPGA can run something like DOSBox with Quake, that is a success.
                      yeah i have the litex BIOS running in FPGA, including initialising the DDR3 DRAM. the only major thing left before running a linux OS is the MMU. at that point, it's Doom all the way


                      Also, this CPU is designed with an OoO scheduler? Wow, that's already freaking amazing! If it additionally has some kind of GPU acceleration capabilities, by whatever means - that's super fantastic. If that existed and was open source, some company would just take it and etch it on silicon - if only to create an RPi competitor.
                      ok so i have the _pieces_ in place - thanks to Mitch Alsup (6 months studying the 6600 architecture and his augmentations) - and i've planned the Computation Units around that. last year i had a prototype up and running, including shadowing, which is how you do precise exceptions and pull back anything issued after a branch-point. i'll need about 4-6 weeks clear, doing nothing else, to get that added in, and it's not time to do that just yet.

                      in the meantime it's running a very VERY basic FSM using the "pieces" that are already designed, prepared and tested *in advance* to have the 6600 Dependency Matrices dropped in and connected to them.
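
                      to give the flavour, a toy nmigen FSM - hypothetical, not the actual libre-soc source - of the bare-bones issue/execute kind that can stand in until the Dependency Matrices are connected:

                      from nmigen import Elaboratable, Module, Signal

                      class IssueFSM(Elaboratable):
                          # toy stand-in: one instruction in flight at a time,
                          # no dependency tracking at all
                          def __init__(self):
                              self.go = Signal()    # instruction ready to issue
                              self.done = Signal()  # execution unit finished
                              self.busy = Signal()  # FSM currently executing

                          def elaborate(self, platform):
                              m = Module()
                              with m.FSM():
                                  with m.State("IDLE"):
                                      with m.If(self.go):
                                          m.next = "EXEC"
                                  with m.State("EXEC"):
                                      m.d.comb += self.busy.eq(1)
                                      with m.If(self.done):
                                          m.next = "IDLE"
                              return m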

