Libre-SOC Still Persevering To Be A Hybrid CPU/GPU That's 100% Open-Source


  • #31
    Originally posted by lkcl View Post
    25 years ago i got such bad RSI (known as carpal tunnel in the U.S.) that i had to minimise typing. it got so bad that one day i couldn't get into my house because i couldn't turn the key in the lock.

    like the "pavlov dog", if it actually physically hurts to stretch your fingers just to reach a shift key, pretty soon you stop doing it. however when it comes to proper nouns, sometimes i find that the respect that i have for such words "over-rides" the physical pain that it causes me to type the word.
    At the end of last year I got a new keyboard and, since it has removable keys, started looking into alternate keyboard layouts. For practicality reasons rather than disability, I started wondering why the space bar is so big and why we don't have an extra layer of function/modifier keys below it, or a shorter space bar with extra keys added beside it: mine takes up the area of 5 control keys, 2.5 left shifts, or 7 letter keys. I think a product like that could help someone who has trouble with horizontal movement to the outside edges of the keyboard, by placing some of those keys where most thumbs naturally rest.

    In other words, a more vertically oriented keyboard layout. Such a form factor would also be useful on laptops, for keyboard-heavy gaming, and when using lots of keyboard shortcuts, since a thumb resting on or next to Ctrl or Alt is easier and faster than taking your hands off the home row. I have a hard time using the right-hand modifier keys.

    I've been doing a lot of jack hammering over the past few years and it's starting to have a negative effect on my wrists so this is a subject that I've been thinking about more and more lately.



    • #32
      Originally posted by OneTimeShot View Post

      Yeah - I think I found it in soc.git/src/soc/... I can confirm that there is approximately that quantity of Python that autogenerates stuff, a fair proportion of which is original code. Some of the file names match modules that would appear in a CPU.

      I assume that it will do something, but it is hard to tell what. I couldn't find anything related to DisplayPort or HDMI, or a memory management system, or parallel processing, or instruction queues, or pipeline timers, or task schedulers, or cache retrieval, or vertex processing, or texture compression, or fanout, or Z buffer, or depth processing, or anything else I'd expect to see if I went to a GPU project.

      Does it connect to a monitor and display a triangle?
      hang on, hang on - we're a loong way from these things - we've got several hoops to jump through before then, not least because this is a hybrid processor: we need the base ("scalar") part up and running first, and that's the primary focus right now. let me go through the things you're expecting:

      * displayport, hdmi, rgb/ttl - these are all "peripherals". they can be handled with e.g. litex and you wouldn't expect peripheral source code to be dropped into a "core" (processor) git repository. we'll use Richard Herveille's RGBTTL HDL for example, and there's no point copying that into a *processor* repo, is there? just link to it with git submodules.

      * MMU - you missed it - https://git.libre-soc.org/?p=soc.git...mmu.py;hb=HEAD

      * parallel processing - this is handled by the augmented 6600 scoreboard system (which i've developed but not hooked up yet, not enough time) and by allocating "QTY more than one" of a given Function Unit. here you can see that the defaults are "QTY 1" - it should be clear that the code's designed to increase those according to a python dictionary compile-time HDL config variable (there's a small illustrative sketch of the idea just after this list of points) https://git.libre-soc.org/?p=soc.git...254bfb273#l245

      * pipeline timers - no, we're going with something that i term "Phase-Aware Function Units", which in 6600 terminology, following Thornton's book "Design of a Computer" (a fascinating read), are called "Computation Units". they're self-monitoring ("aware of their phase", i.e. aware of both the start time and the end time). the python base class for ALUs is this: https://git.libre-soc.org/?p=soc.git...lti.py;hb=HEAD and the code for monitoring operands and making damn sure they're associated 100% absolutely guaranteed with their results is here https://git.libre-soc.org/?p=nmutil....7b3507606d#l86

      * task schedulers - err i'm not sure what you mean, or what you might be expecting. this is a hybrid CPU / VPU / GPU, so "task scheduling" is handled by the standard linux kernel.

      * cache retrieval - we haven't got to that yet: we're still porting over microwatt dcache.vhdl and icache.vhdl, which has had to be shelved for now because it's far too big a task to fit into the October deadline.

      * vertex processing etc. these we will get to *after* the October deadline, however we need to design the augmentation to PowerISA i.e. actually *design* Texture opcodes looong before we ever try to put them into actual HDL.
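
      (a quick illustration of the "QTY more than one" Function Unit point above, before getting to the order of work: this is a minimal sketch in nmigen-flavoured python with invented names - the real code lives in soc.git - showing only the idea that the quantity of each Function Unit is a plain python dict, resolved when the HDL is elaborated.)

      Code:
      # illustrative sketch only: the FU names, the class and the "factories"
      # argument are invented for this example.  the point: how many of each
      # Function Unit get built is just a python dict, fixed at HDL compile time.

      from nmigen import Elaboratable, Module

      DEFAULT_QTY = {"alu": 1, "logical": 1, "ldst": 1}   # the "QTY 1" defaults

      class AllFunctionUnits(Elaboratable):
          def __init__(self, factories, qty=DEFAULT_QTY):
              # factories: dict mapping an FU name to a callable building one unit
              self.fus = {f"{name}{i}": factories[name]()
                          for name, n in qty.items() for i in range(n)}

          def elaborate(self, platform):
              m = Module()
              for fu in self.fus.values():
                  m.submodules += fu      # each instance becomes a submodule
              return m

      # scaling parallelism up is then a config change, not a redesign, e.g.
      #   AllFunctionUnits(factories, qty={"alu": 4, "logical": 2, "ldst": 2})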

      the actual order is:

      * design opcode (e.g. texture opcode, or RGB2YUV opcode, or Z-buffer opcode)
      * write simulator implementing opcode
      * hand-code binary instructions
      * test on simulator
      * modify binutils to support new opcode
      * write assembler instead of using hand-coded binary instructions
      * test on simulator again
      * write some test programs (mostly in assembler)
      * test on simulator again and analyse performance metrics
      * repeat until pixels / clock rate is high enough...

      ... oh and *then* actually put it into HDL (and run the tests agaaaaain)
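
      (to make the "write simulator / hand-code binary instructions / test on simulator" steps concrete, here's a deliberately tiny python sketch. the opcode number, field layout and behaviour are all invented for illustration - they are NOT the actual PowerISA augmentation being designed.)

      Code:
      def encode(major, rt, ra, rb):
          """hand-code one 32-bit instruction word: top 6 bits major opcode,
          then three 5-bit register fields (layout simplified/invented)."""
          return (major << 26) | (rt << 21) | (ra << 16) | (rb << 11)

      def execute(insn, regs):
          """regs: a plain dict of integer registers - the whole 'simulator'."""
          major = (insn >> 26) & 0x3f
          rt, ra, rb = (insn >> 21) & 0x1f, (insn >> 16) & 0x1f, (insn >> 11) & 0x1f
          if major == 60:                  # hypothetical new "GPU-ish" opcode
              regs[rt] = (regs[ra] + regs[rb]) & 0xffffffff   # stand-in behaviour
          else:
              raise ValueError(f"unimplemented major opcode {major}")

      # "test on simulator": run the hand-assembled instruction, check the result
      regs = {1: 100, 2: 28, 3: 0}
      execute(encode(60, 3, 1, 2), regs)
      assert regs[3] == 128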

      in parallel with that, collaborating with the MESA and Kazan3D driver developers, we use feedback from them (and anyone else willing to pitch in and help us avoid silly mistakes) to drive that and get the best ISA with the simplest driver design - all at the same time.

      this iterative development cycle is what we put in the funding applications to NLnet, and they were happy with it.

      bottom line is, a *hybrid* CPU-GPU-VPU is basically "A CPU wot got some GPU-like opcodes". you just don't have the kinds of things that you'd get in a SIMT-based, specialised, dedicated GPU. so we'll add SIN, COS, ATAN2, we'll add a SIMD YUV2RGB opcode, and so on, plus "tagged" predication and vectorisation on top of the scalar PowerISA, through SimpleV. that's covered in the latter parts of the XDC2020 talk, which starts... let me find it... here https://www.youtube.com/watch?v=FxFPFsT1wDw&t=11935s
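
      (for the YUV2RGB one, by the way, the maths itself is bog-standard. here's a scalar reference in python, using full-range BT.601 coefficients purely as an example - the actual opcode's fixed-point format and coefficient choice are part of the design work, not decided by this sketch. a SIMD opcode would simply do this for several pixels per instruction.)

      Code:
      def yuv2rgb(y, u, v):
          """scalar reference for YUV -> RGB (BT.601, full range)."""
          d, e = u - 128, v - 128
          r = y + 1.402 * e
          g = y - 0.344136 * d - 0.714136 * e
          b = y + 1.772 * d
          clamp = lambda x: max(0, min(255, int(round(x))))
          return clamp(r), clamp(g), clamp(b)

      # a mid-grey pixel stays grey:
      assert yuv2rgb(128, 128, 128) == (128, 128, 128)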

      hope that helps.



      • #33
        Originally posted by TJSER View Post
        This again demonstrates
        i'm really sorry, tjser, there are too many assumptions and incorrect assertions in there; if i were to begin correcting them all, it would make both of us look foolish and belligerent. do you have any positive questions that you would like to ask, ones that would be informative and enjoyable for people wishing to learn about processor development?



        • #34
          Originally posted by skeevy420 View Post
          I've been doing a lot of jack hammering over the past few years and it's starting to have a negative effect on my wrists so this is a subject that I've been thinking about more and more lately.
          yeah, i had the privilege of meeting julia longtin at FOSDEM 2020, and she explained that the whole reason she started developing 3D aluminium casting a few years ago was that she was forced to take 6 months of medical leave due to RSI. she used the time to research keyboards, found this weird curved "dish" keyboard, and set it up with a dvorak layout. i apologise that i can't find a post about what she actually bought, but if things get really *really* bad for you there are options out there, and you can at least keep your career and continue doing what you enjoy.

          due to a resurgence of RSI, i got a cheap split keyboard with 2 separate spacebars (one left, one right) just 2 days ago, and i'm still getting used to it. mainly, y'know, i get the feeling that we get stressed and therefore tense, and as a profession we just focus too much on the screen, y'know? ignore the niggling pain, can't be that bad, right? and of course it cuts off the bloodflow, you get cold, you cramp up, and... yeah. so i think there's much more to it than just "getting a new keyboard": you have to look at how the *use* of the computer is affecting your thoughts (causing stress and therefore physical tension), and so on. that's just my take on it.



          • #35
            Originally posted by lkcl View Post

            hang on, hang on - we're a loong way from these things - we've got several hoops to jump through before then, not least because this is a hybrid processor: we need the base ("scalar") part up and running first, and that's the primary focus right now. let me go through the things you're expecting:

            bottom line is, a *hybrid* CPU-GPU-VPU is basically "A CPU wot got some GPU-like opcodes". you just don't have the kinds of things that you'd get in a SIMT-based, specialised, dedicated GPU. so we'll add SIN, COS, ATAN2, we'll add a SIMD YUV2RGB opcode, and so on, plus "tagged" predication and vectorisation on top of the scalar PowerISA, through SimpleV. that's covered in the latter parts of the XDC2020 talk, which starts... let me find it... here https://www.youtube.com/watch?v=FxFPFsT1wDw&t=11935s

            hope that helps.
            Yes, I can see all that. All of the things you are not doing are critical to building a GPU. What you are doing is building a CPU with a custom vector extension because you don't like AVX-512. We know in advance that software-emulated GPU performance and power usage is going to be terrible. A general-purpose CPU core has too many transistors to replace a specialized GPU core.

            At the end of the day, the world doesn't need another CPU with vector extensions. Those already exist, and we already know what performance SIMD provides for CPU graphics work when running software Mesa. If you want to build a GPU, here is literally the first thing that came up when searching for GPU designs on OpenCores: https://opencores.org/projects/flexgripplus

            It looks like it comes from the University of Massachusetts (sorry it's written in "hardcoded non-OO" Verilog) and it has all the bits you'd expect to need in a GPU (it looks like it's more compute than graphics oriented):
            - SMP Controllers
            - Pipeline execution
            - Customised maths libraries
            - Execution Schedulers
            - RAM management

            At the end of the day, have fun with whatever you're doing I guess. Just don't promise anyone anything you can't deliver, and don't bother real hardware developers too much because until you have built the things listed above, or you have extensive game engine knowledge, you can't really offer much experience.



            • #36
              Originally posted by lkcl View Post

              gruensein, we're doing "incremental development", going step by step, and one of the first steps, clearly, has to be to actually get standard scalar PowerISA under our belt first. that's been done, successfully and "on par" with microwatt, and i had it running on a Versa ECP5 FPGA a couple of weeks ago:

              https://www.youtube.com/channel/UCNV...s0_Sg/featured

              however that's only a 45k-LUT FPGA, which is going to be nowhere near big enough to fit an SMP system into, nor a dual SIMD IEEE754 FP32 pipeline. realistically we'll need at least a 200k-LUT FPGA, and even then it's going to be tight.

              we'll get there - and there'll be announcements about each milestone that we reach, when we do. there will be plenty more articles i'm sure
              Well, great. I'd love to see a libre and available SoC with decent performance and software support. Given the history of libre GPUs, open hardware in general, and the enormous complexities of SoCs and their manufacturing, I remain quite skeptical.



              • #37
                Originally posted by OneTimeShot View Post

                Yes, I can see all that. All of the things you are not doing are critical to building a GPU. What you are doing is building a CPU with a custom vector extension because you don't like AVX-512.
                well, nobody does. we're doing PowerISA, and yes, i don't like VSX. ok, that's not true: i don't like SIMD https://www.sigarch.org/simd-instruc...dered-harmful/

                basically, like i said in the (two) talks, we're doing a hybrid architecture - shared CPU and GPU (and VPU) workloads. that means that if the Vector portion of the ISA is horribly inefficient (a VSX strncpy is a whopping *250* hand-coded assembly instructions, where in RVV/SV it is *14*), context-switching in shader applications thrashes the L1 cache. (there's a rough sketch of the code-density issue a couple of paragraphs below.)

                what should be an 8k GPU shader binary ends up occupying the entire 32k L1 cache - a 4x code-size expansion - and that's not in the slightest bit acceptable *for a hybrid* processor, though it's perfectly ok for a *dedicated* GPU.

                so we have these very unusual combined requirements, which neither a dedicated CPU nor a dedicated GPU would face.
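
                (to make the code-density point concrete, here's a rough conceptual model in plain python - invented names, not PowerISA or SV syntax - of why a vector-length-based loop stays tiny while fixed-width SIMD needs extra head/tail/NUL-search paths, which is where the hand-coded instruction count explodes.)

                Code:
                MAXVL = 64  # hypothetical maximum vector length, in elements

                def strncpy_vl(dst, src, n):
                    """vector-length style: one loop; the hardware clamps the
                    active length each iteration, so there is no separate
                    head/tail code."""
                    i = 0
                    while i < n:
                        vl = min(MAXVL, n - i)       # "setvl" folds tail handling in
                        chunk = src[i:i + vl]
                        if 0 in chunk:               # stop at the NUL terminator
                            cut = chunk.index(0) + 1
                            dst[i:i + cut] = chunk[:cut]
                            return dst
                        dst[i:i + vl] = chunk
                        i += vl
                    return dst

                def strncpy_simd(dst, src, n, width=16):
                    """fixed-width SIMD style: the main loop only handles full
                    'width' blocks, so the final partial block and the NUL
                    search inside a block each need their own code paths."""
                    i = 0
                    while i + width <= n:            # full-width blocks only
                        block = src[i:i + width]
                        if 0 in block:               # NUL inside the block:
                            break                    # fall back to the byte loop
                        dst[i:i + width] = block
                        i += width
                    while i < n:                     # scalar tail / NUL handling
                        dst[i] = src[i]
                        if src[i] == 0:
                            break
                        i += 1
                    return dst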

                We know in advance that software-emulated GPU performance and power usage is going to be terrible.
                well, Larrabee (and Nyuzi) showed that if you don't put the effort into providing custom hardware at the bottleneck points, you end up with a great Vector Compute Engine that misses modern GPU performance/watt by a whopping 4:1 margin. yes, of course a *CPU* would miss by a huge margin (20:1, 100:1), but it often surprises people to find that even a fantastic Vector Engine still only reaches about 25% of modern commercial GPU performance/watt.

                Jeff Bush clearly demonstrated this with ChiselGPU, a hardware triangle-render experiment. he wrote it not for "actual" use but as a comparative micro-benchmark for his research, and although it's been a while since i read his nyuzipass2015 paper, i think he got SEVEN times better performance out of ChiselGPU's hardware tile-rendering of triangles than out of Nyuzi. https://github.com/jbush001/ChiselGPU



                A general purpose CPU core has too many transistors to replace a specialized GPU core.

                At the end of the day, the world doesn't need another CPU with vector extensions. Those already exist, and we already know what performance SIMD provides for CPU graphics work when running software Mesa. If you want to build a GPU, here is literally the first thing that came up when searching for GPU designs on OpenCores: https://opencores.org/projects/flexgripplus
                that's fantastic! i've been so busy i missed this one - it looks like they dropped it onto opencores back in april of this year. 28 instructions... ah yes, that's definitely a GPU, not a hybrid CPU-VPU-GPU aimed at running both general-purpose UNIX workloads and GPU shader programs simultaneously, context-switching regularly between the two simply by using standard linux kernel scheduling.


                It looks like it comes from the University of Massachusetts (sorry it's written in "hardcoded non-OO" Verilog) and it has all the bits you'd expect to need in a GPU (it looks like it's more compute than graphics oriented):
                yeahhh that's what MIAOW did, as well. they created a great vector compute engine based on a subset of AMD GPU opcodes. it works really well... except you can't possibly hope to tape it out and expect it to meet modern commercial needs because MIAOW is a *part* of a GPU, not a full GPU.

                - SMP Controllers
                - Pipeline execution
                - Customised maths libraries
                - Execution Schedulers
                - RAM management

                At the end of the day, have fun with whatever you're doing I guess. Just don't promise anyone anything you can't deliver, and don't bother real hardware developers too much because until you have built the things listed above, or you have extensive game engine knowledge, you can't really offer much experience.
                it's a collaborative opportunity. when i start these ambitious projects, i don't expect - at all - to be the one that "does it all". i present the *opportunity* for others to come forward and say, "hey, if you want this to succeed you're going to need Feature X", and far from going "yeah, we don't need your help", i'm delighted that they took the time to do that.

                and like, this actually happened on the XDC2020 IRC channel only 2 days ago! one of the talks was about etnaviv perf counters, and the people on the channel went, "you don't have perf counters?", i asked, "err, do i need them?", and 3 people simultaneously replied "yeah, you need perf counters". so: bug report raised, it goes on the list, and i'm really grateful to them for letting me know.

                *that's* why we're doing this project in an open fashion with no NDAs.

                so i mention this because it turns out that in the right environment and at the right time, people do very much want to help out. thank you OneTimeShot.



                • #38
                  Originally posted by xfcemint View Post

                  Forgot to answer this issue.

                  Yes, the world doesn't need another CPU with vector extensions, but it does need an open-source CPU and an open-source GPU. And an open-source GPU can be built from many CPU cores with vector extensions. Consequently, that's why a CPU with vector extensions is needed - to make a GPU, not to make a CPU.
                  ah - parallella / adapteva (and i was really hoping their approach would take off) showed that the "sea of little cores" approach is not necessarily going to fly. my feeling there is that it couldn't run standard applications: it wasn't an SMP NOC, because the little cores were "embedded"-platform RV32 rather than "UNIX"-platform-compliant RV64GC, to keep them small enough to make a sea of them. they also didn't add any kind of SIMD or Vector processing. unfortunate.



                  • #39
                    Originally posted by xfcemint View Post

                    You are likely wrong.

                    The simplest way to design a GPU is probably the CPU+vector ISA extensions path. So, from the perspective of simplifying the project, ISA extensions are not such a bad idea at all.

                    The difference is that, when you make a GPU in that manner (from CPUs with a vector ISA), you create it out of small, low-power cores, but lots and lots of them. It wouldn't be a super-high-performance GPU, but it would be able to do the job of an integrated GPU. That is already sufficient for this project to be successful. This is not comparable to any current design, so you are wrong when you say "a GPU requires that, that, and that", because no, this doesn't look to be that kind of a GPU.

                    In other words, the wide-warp-multiprocessor design of current GPUs is overkill for this project. So, I think you are looking at it from the wrong perspective.
                    Reminds me of Larrabee. What makes you think this team can be more successful than Intel?



                    • #40
                      Originally posted by xfcemint View Post

                      Because it's not x86. x86 looks like a terrible idea for a GPU (complex decoders waste power).
                      I guess we will see. Let's hope for the best. I will remain skeptical until I see a developer board comparable to an RPi.

