There's A New Libre GPU Effort Building On RISC-V, Rust, LLVM & Vulkan


  • juanrga
    replied
    Originally posted by brent View Post
    I'm not quite sure what to think of this, after the big failure that EOMA68 has been. The basic idea doesn't sound that great either: GPUs are efficient because of the substantial amount of fixed-function, special-purpose hardware used to accelerate common tasks like rasterization, texture sampling, geometry processing and now even raytracing. An array of general-purpose CPUs won't cut it.
    Some people disagree about the efficiency of that fixed-function hardware. The problem is that not all GPU workloads require the same amount of each resource (e.g. some games lean more on texture units, others more on shaders), so engineers have to over-provision the fixed-function parts to avoid bottlenecking specific workloads, and most of the time part of the fixed-function hardware sits unused.



  • fuzz
    replied
    Originally posted by lkcl View Post
    Innovation is taking place... behind closed, secretive doors. an example from another industry: I have a friend who used to work for Seagate; he said that the state of knowledge of electro-magnetism within each of the HDD companies is TWENTY YEARS ahead of the outside world. twenty years! *all* of the HDD companies systematically and routinely reverse-engineer each others' products, performing deep atomic-level scanning. he considered it so unethical that he quit and went to work in academia.
    It's sad to say that I believe that story easily. Thanks for the insight.



  • lkcl
    replied
    Originally posted by fuzz View Post

    I don't know much about this stuff, but what you're saying reminds me of some of the things Alan Kay claimed about the underestimated performance of SPARC. It makes it feel like the industry has stagnated, if not gone backwards, over the last 20-30 years.
    Number9 Graphics Card: gone. Matrox: gone. ATI: bought by AMD. the innovation is now only done by a handful of extremely large companies, with massive patent portfolios. only a libre project stands a chance as it would be political and corporate suicide for the incumbents to try anything.

    Innovation is taking place... behind closed, secretive doors. an example from another industry: I have a friend who used to work for Seagate; he said that the state of knowledge of electro-magnetism within each of the HDD companies is TWENTY YEARS ahead of the outside world. twenty years! *all* of the HDD companies systematically and routinely reverse-engineer each others' products, performing deep atomic-level scanning. he considered it so unethical that he quit and went to work in academia.



  • fuzz
    replied
    Originally posted by lkcl View Post

    larrabee specifically avoided adding in the kinds of custom accelerated instructions that would make it a GPU, as the *specific* goal of that research effort was not to make yet another hardware-based GPU: it was to see *if* a software-based GPU would be successful. as a scientific experiment, it produced a result (which intel had to censor). as a parallel processing compute engine, however, its performance was ground-breaking.

    however the team that worked on larrabee weren't allowed to tell anyone how bad the performance was: it was not until jeff bush replicated that work in nyuzi and made the full design source code public that it was possible to determine *EXACTLY* where the performance was lacking.

    i've spent a lot of time talking with jeff (he's really an amazing guy), and he pointed out things to me such as, if nyuzi / larrabee had a single instruction for converting 4-wide F.P. vectors of ARGB into 4 32-bit pixels, for example, that would knock something like... i can't remember exactly... let's say it would knock 20% off the time spent per pixel on rendering. then, the next highest priority to target would be... X (whatever).

    basically his paper lays out the groundwork on how to go about profiling a software-rendered design (which is a LOT easier than profiling a hybrid hardware-software design), giving you the statistics needed to decide where to focus time and effort.

    and, as the design is based on RISC-V and there are software emulators for that (qemu and spike), the process of doing iterative development to add *in* the kinds of experimental custom instructions, to see what would and would not work, can be much more rapid than would otherwise be expected.

    bottom line: we're aware of larrabee, and nyuzi, and have a strategy in place *thanks* to that work.
    I don't know much about this stuff, but what you're saying reminds me of some of the things Alan Kay claimed about the underestimated performance of SPARC. It makes it feel like the industry has stagnated, if not gone backwards, over the last 20-30 years.



  • lkcl
    replied
    Originally posted by davidbepo View Post
    huh? a cpu based gpu... larrabee comes to mind and that was an utter failure
    larrabee specifically avoided adding in the kinds of custom accelerated instructions that would make it a GPU, as the *specific* goal of that research effort was not to make yet another hardware-based GPU: it was to see *if* a software-based GPU would be successful. as a scientific experiment, it produced a result (which intel had to censor). as a parallel processing compute engine, however, its performance was ground-breaking.

    however the team that worked on larrabee weren't allowed to tell anyone how bad the performance was: it was not until jeff bush replicated that work in nyuzi and made the full design source code public that it was possible to determine *EXACTLY* where the performance was lacking.

    i've spent a lot of time talking with jeff (he's really an amazing guy), and he pointed out things to me such as, if nyuzi / larrabee had a single instruction for converting 4-wide F.P. vectors of ARGB into 4 32-bit pixels, for example, that would knock something like... i can't remember exactly... let's say it would knock 20% off the time spent per pixel on rendering. then, the next highest priority to target would be... X (whatever).
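    To make that concrete, here is a minimal Rust sketch (my own illustration, not the project's code; the function name and rounding behaviour are assumptions) of what such a conversion costs when done purely in software. A fused custom instruction of the kind described would collapse the whole routine into a single operation per pixel.

    // Hypothetical sketch: packing a 4-wide floating-point ARGB vector
    // into one 32-bit pixel, entirely in software.
    fn pack_argb(argb: [f32; 4]) -> u32 {
        // Clamp each channel to [0.0, 1.0], scale to 0..=255, then pack as 0xAARRGGBB.
        let to_u8 = |c: f32| (c.clamp(0.0, 1.0) * 255.0 + 0.5) as u32;
        (to_u8(argb[0]) << 24) | (to_u8(argb[1]) << 16) | (to_u8(argb[2]) << 8) | to_u8(argb[3])
    }

    fn main() {
        // Four clamps, four multiplies, four converts, three shifts, three ORs,
        // per pixel, per frame: the overhead a single custom instruction would remove.
        assert_eq!(pack_argb([1.0, 0.5, 0.25, 0.0]), 0xFF80_4000);
    }

    That per-pixel saving is exactly the kind of thing the profiling described below is meant to identify.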

    basically his paper lays out the groundwork on how to go about profiling a software-rendered design (which is a LOT easier than profiling a hybrid hardware-software design), giving you the statistics needed to decide where to focus time and effort.

    and, as the design is based on RISC-V and there are software emulators for that (qemu and spike), the process of doing iterative development to add *in* the kinds of experimental custom instructions, to see what would and would not work, can be much more rapid than would otherwise be expected.

    bottom line: we're aware of larrabee, and nyuzi, and have a strategy in place *thanks* to that work.



  • lkcl
    replied
    Originally posted by -MacNuke- View Post
    I thought of this a while ago. I don't think that we will see any "widely-used" open-source GPU in the near future. There is just no manpower to create and maintain such things. Most projects start somewhere but fail somewhere along the way.
    i think one of the main reasons for that is they go for the "open claim" without actually being truly open. and, they bite off far more than they can chew. both the GPLGPU and the Open 3D targeted *graphics cards*, not a 3D engine.

    in addition, the typical 3D design actually involves inter-process communication (remote procedure calls!) between CPU and GPU. that's a HELL of a lot of work. EVERY single data structure, every single function call, of which there are dozens if not hundreds, has to be packed up on the CPU, the GPU notified where the packed data is, the GPU unpacks it, does work, packs the response (if any) back up, notifies the CPU, and the CPU unpacks the answer and FINALLY the function is done.
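    To make that packing/unpacking concrete, here is a minimal sketch in Rust of the split-driver pattern being described (the names and command set are hypothetical, invented purely for illustration; no real driver API is implied): every call is recorded into a command buffer on one side and unpacked and replayed on the other.

    // Hypothetical sketch of CPU->GPU marshalling: calls are packed into
    // a command buffer, then unpacked and executed on the other side.
    enum Command {
        BindTexture { slot: u32, texture_id: u64 },
        Draw { first_vertex: u32, vertex_count: u32 },
    }

    struct CommandBuffer {
        commands: Vec<Command>,
    }

    impl CommandBuffer {
        fn new() -> Self {
            Self { commands: Vec::new() }
        }

        // "CPU" side: every function call becomes a packed record.
        fn record(&mut self, cmd: Command) {
            self.commands.push(cmd);
        }

        // "GPU" side: the packed records are walked and unpacked again.
        fn replay(&self) {
            for cmd in &self.commands {
                match cmd {
                    Command::BindTexture { slot, texture_id } => {
                        println!("bind texture {texture_id} to slot {slot}");
                    }
                    Command::Draw { first_vertex, vertex_count } => {
                        println!("draw {vertex_count} vertices starting at {first_vertex}");
                    }
                }
            }
        }
    }

    fn main() {
        let mut cb = CommandBuffer::new();
        cb.record(Command::BindTexture { slot: 0, texture_id: 42 });
        cb.record(Command::Draw { first_vertex: 0, vertex_count: 3 });
        cb.replay(); // in a split CPU/GPU design this step runs on the other processor
    }

    In the hybrid CPU-as-GPU approach described next, that whole round trip disappears: the "replay" step is just an ordinary function call on the same cores.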

    i understand *why* it's done that way - it's just completely insane. so we're doing a hybrid CPU-GPU approach: the CPU *is* the GPU, and that's why we chose to do a software-based renderer (and then write custom RISC-V instructions and associated assembly routines, where needed)

    also, with kazan being a software renderer, it's effectively a Reference Implementation just like Mesa3D is a Reference Implementation of OpenGL. so, i've reached out to the Khronos Group to see if they'd like to sponsor the project.

    the general idea, therefore, is to provide *useful* milestones along the way, tackling things piece-by-piece, and inviting people to help out, which they can do because it's *genuinely* an actual open project, not a "we'll release it when it's finished" project. the code's available for review *right now*.

    Originally posted by -MacNuke- View Post
    I think there are just 2 options:
    1. like this project but not as a separate chip: just make a display output (e.g. DisplayPort) and run everything else on the CPU via LLVMpipe and vulkan-cpu. Sure, the CPU needs many cores that way.
    just an aside: LLVMpipe has design limitations: shader execution is not accelerated and there is a critical single-threaded bottleneck in the design. this is why jacob started kazan, a Vulkan 3D driver.

    yes, the idea is to beef up *standard* SMP-based RISC-V cores to the point where there's effectively absolutely no difference between the CPU and the GPU. for general-purpose execution, a program would simply not use the custom 3D instructions. with the increased FP performance and the increased memory bandwidth required to support 3D, i would expect significant performance increases for general-purpose computing as well.
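    A minimal sketch of that idea in Rust (placeholder shading function and band count of my own choosing, not the kazan design): the framebuffer is split into bands shaded on ordinary SMP threads, and any custom 3D instructions would simply speed up the inner loop without changing this structure.

    use std::thread;

    const WIDTH: usize = 256;
    const HEIGHT: usize = 256;

    // Placeholder "fragment shader": a simple colour gradient.
    fn shade(x: usize, y: usize) -> u32 {
        (((x * 255 / WIDTH) << 16) | ((y * 255 / HEIGHT) << 8)) as u32
    }

    fn main() {
        let mut framebuffer = vec![0u32; WIDTH * HEIGHT];
        let bands = 4; // e.g. one band per core
        let rows_per_band = HEIGHT / bands;

        // Each band of the framebuffer is shaded by an ordinary CPU thread.
        thread::scope(|s| {
            for (i, band) in framebuffer.chunks_mut(rows_per_band * WIDTH).enumerate() {
                s.spawn(move || {
                    for (j, px) in band.iter_mut().enumerate() {
                        let x = j % WIDTH;
                        let y = i * rows_per_band + j / WIDTH;
                        *px = shade(x, y);
                    }
                });
            }
        });

        println!("rendered {} pixels on {} threads", framebuffer.len(), bands);
    }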

    and all of this done publicly so that people can help contribute if they want to.



  • oiaohm
    replied
    Originally posted by Weasel View Post
    oiaohm I'm saying that just because it was prototyped in the lab and projected for 20 years into the future doesn't mean it will become fact. I'm not sure how much simpler I can make it. Everything you said could become true, or not, like a lot of long-term projections in the past (see above, don't want to repeat them again...). Your problem is that you believe that everything in the lab or "theoretically possible" will become fact when clearly history isn't on your side at all.
    The problem here is that the 100 MB/s limit of hard drives is, without question, going to be exceeded.

    Looking into the future for long-term planning requires looking at what is theoretically possible.

    In business you only have to look at what happened to IBM to realize that it takes a large company 10 to 15 years to reposition itself in a market. So business planning by large companies has to look at 20-year forecasts, even if they are a little questionable.

    Current MAMR work says that these drives will need micro-controllers as fast as or faster than current-day CPUs to extract the best performance.

    Originally posted by Weasel View Post
    Worse is when they just do projections/extrapolations and you just eat it like it's fact and it is guaranteed to succeed.
    Really, what I am describing is what WD management will be using to forecast what they will need over the next 20 years. Correct, not all R&D is guaranteed to be a success.

    Within the next 20 years WD will need micro-controllers as fast as current x86 chips while staying cost-effective. That is not going to come from x86 and is unlikely to come from ARM. To allow for some R&D failures, WD has to start now. Of course, the question is how WD is going to recover the cost of these developments.

    Please note that the long-term projection you pointed to had a flaw: it presumed green lasers would become simple to produce and reliable. MAMR is already more reliable than prior drive technologies.

    20-year forecasts based on technology that already works fully in the lab are different from other projections. Adding more arms to drives has to happen in stages to get field exposure. 4-arm drives already work in the lab; what is required is field testing to confirm they are robust enough for real-world production handling. WD is starting with 2-arm drives, which will be followed by 3-arm drives and finally 4-arm drives; the 2-arm drives alone see HDDs exceeding 100 MB/s.



  • Weasel
    replied
    oiaohm I'm saying that just because it was prototyped in the lab and projected for 20 years into the future doesn't mean it will become fact. I'm not sure how much simpler I can make it. Everything you said could become true, or not, like a lot of long-term projections in the past (see above, don't want to repeat them again...). Your problem is that you believe that everything in the lab or "theoretically possible" will become fact when clearly history isn't on your side at all.

    Worse is when they just do projections/extrapolations and you just eat it like it's fact and it is guaranteed to succeed.



  • oiaohm
    replied
    Originally posted by Weasel View Post
    Remember the Holographic Versatile Disc? That was a prototype too.
    I remember them; they could not be produced in a dependable way in R&D either.
    https://www.wsj.com/articles/SB120285999714463727
    The problem is the green laser. Yes, Holographic Versatile Discs that work can be made dependably in the lab, if you can afford to consume three 5 1/4-inch drive bays.

    MAMR, by contrast, is technology that can currently be made in the lab, fits in standard 3 1/2-inch hard-drive bays with standard-size platters, and uses standard platter chemistry and head control systems.

    The biggest barrier is the controller silicon. 2.3 GHz switching (the read/write speed) means you need roughly a 4 GHz controller for the drive. But there are other routes as well, such as reading from multiple heads and multiplexing them to increase speed. Without the competition from SSDs, the HDD makers could have avoided putting 2 to 4 independent head-controlling arms inside the HDD and kept on using a single arm.

    So there are two reasons why HDD speed is going to increase: one is MAMR, the other is multiple arms inside the drive. Multiple arms with current drive technology give up to a 4x faster drive without changing much else. It is a funny one: everything in the computer world is going serial, and here are hard drives internally going parallel.

    This is the point you have missed: for the next generation of hard drives, WD needs high-performance silicon. If they are making high-performance silicon for HDDs anyway, making accelerators and other things as well is just good use of their silicon development investment.

    The reality here is that the 100 MB/s limit of current hard drives is going to be smashed. Both MAMR and multiple arms smash that limit; if both can be used together effectively, we will see 800 MB/s drives. We are at least going to see 400 MB/s HDDs at some point, even if not all of the technology currently developed turns out to be deployable.
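    A back-of-the-envelope check of those figures, under the assumption (mine, not WD's statement) that 2.3 GHz switching corresponds to roughly 2.3 Gbit/s of data per head and that each independent arm streams concurrently:

    fn main() {
        // 2.3 Gbit/s per head is ~287 MB/s raw; call it ~200 MB/s sustained
        // after encoding and servo overhead (an assumed figure, chosen to
        // match the 200 MB/s per head quoted in this thread).
        let raw_per_head_mb_s = 2.3e9 / 8.0 / 1e6;
        let sustained_per_head_mb_s = 200.0;
        for arms in [1, 2, 4] {
            println!(
                "{} arm(s): ~{:.0} MB/s sustained (raw {:.0} MB/s)",
                arms,
                sustained_per_head_mb_s * arms as f64,
                raw_per_head_mb_s * arms as f64
            );
        }
        // 2 arms gives ~400 MB/s, 4 arms ~800 MB/s: the numbers quoted above.
    }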

    By the way, Weasel, WD is expecting to ship the first dual-arm drives to customers within the next few years. Quad-arm is just dual-arm technology double-stacked.



  • Weasel
    replied
    Originally posted by oiaohm View Post
    https://ieeexplore.ieee.org/document/7471502
    2.3 GHz switching in next-generation MAMR drives gives you 200 MB/s per head. The introduction of multiple read and write heads doubles to quadruples that: 800 MB/s per drive without RAID or anything else. That is current prototype technology. Our current-day drives are slow. These speed increases mean you could RAID ten current-day 10 TB HDDs into one 100 TB array and it would still be slower than a future 100 TB MAMR drive. So this is not just an increase in storage density, it is an increase in speed, and we have not seen HDDs increase in speed by much. Of course a single MAMR drive is not going to be as fast as an SSD, but RAIDed HDDs become a different matter.
    Remember the Holographic Versatile Disc? That was a prototype too.

    I get the feeling you really haven't been through much of this before since your arguments are always something like "but in the lab..."

    Of course even a broken clock is right twice a day, but you get my drift...

