RADV Exploring "A Driver On The GPU" In Moving More Vulkan Tasks To The GPU

  • #11
    Originally posted by Venemo

    Most current applications generate a command buffer on the CPU, and then ask the GPU to execute that. A command buffer is basically a list of draws, copies, dispatches etc. which is executed by the GPU's command processor.

    This new feature Bas is working on makes it possible for applications to generate a command buffer on the GPU and then execute it on the GPU without CPU intervention.

    The main difficulty here is that these command buffers are more difficult to debug and analyze.

    I see, thanks for clarifying.
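To make Venemo's description concrete, here is a toy Python model of the two paths. Everything in it (the `Command` type, the function names, the visibility list standing in for a GPU culling pass) is hypothetical and only illustrates who decides what ends up in the command buffer; it is not RADV code or any Vulkan API.

```python
from dataclasses import dataclass
from typing import List


# Hypothetical command type; real hardware packets look nothing like this.
@dataclass
class Command:
    kind: str  # "draw", "dispatch" or "copy"
    arg: int   # e.g. vertex count or workgroup count


def command_processor(cmdbuf: List[Command]) -> List[str]:
    """Stand-in for the GPU command processor: executes commands in order."""
    return [f"{c.kind}({c.arg})" for c in cmdbuf]


def record_on_cpu(object_sizes: List[int]) -> List[Command]:
    # Classic path: the CPU decides what to draw and records every command.
    return [Command("draw", n) for n in object_sizes]


def generate_on_gpu(object_sizes: List[int], visible: List[bool]) -> List[Command]:
    # Modelled "driver on the GPU" path: a pass that already runs on the GPU
    # (here, hypothetical culling results in `visible`) writes the draw
    # commands itself, so those results never travel back to the CPU.
    return [Command("draw", n) for n, v in zip(object_sizes, visible) if v]
```

For example, `command_processor(generate_on_gpu([3, 6, 9], [True, False, True]))` yields `["draw(3)", "draw(9)"]`: the GPU-side step dropped the invisible object without the CPU ever seeing the visibility data, which is also why such buffers are harder to inspect from the CPU side.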

    Originally posted by Venemo
    It's very different.
    Yeah, I don't mean GPUs are generic processors; they are specialized for a reason. But I mean, maybe some part of compilation could fit that kind of vectorized architecture? I'm just guessing; I don't expect a whole compiler to run there, even if it's Turing complete.
    Last edited by shmerl; 26 April 2022, 02:28 AM.



    • #12
      Originally posted by shmerl
      But I mean, maybe some part of compilation could fit that kind of vectorized architecture? I'm just guessing; I don't expect a whole compiler to run there, even if it's Turing complete.
      No, not really. Compilation is largely serial with a huge number of branches. Yes, modern GPUs are Turing complete, so it's theoretically possible to do it; it's just the worst possible workload for a GPU, so it would be very slow.
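A rough way to see why branchy code like a compiler suits GPUs so badly is a toy cost model of one wavefront executing an if/else in lockstep. The model is an assumption for illustration only: it ignores memory latency, reconvergence details and everything else real hardware does, and `simt_cost` is a made-up name.

```python
from typing import Callable, List


def simt_cost(lanes: List[int], is_taken: Callable[[int], bool],
              then_cost: int, else_cost: int) -> int:
    """Cycles for one wavefront executing an if/else in lockstep.

    All lanes share one instruction stream, so if any lane takes a path,
    the whole wavefront steps through it with the other lanes masked off.
    """
    taken = [is_taken(x) for x in lanes]
    cost = 0
    if any(taken):
        cost += then_cost  # at least one lane needs the "then" path
    if not all(taken):
        cost += else_cost  # at least one lane needs the "else" path
    return cost
```

With 32 lanes, a uniform branch (`lambda x: True`, costs 10 and 7) costs 10, while a split one (`lambda x: x % 2 == 0`) costs 17 because both paths run. A compiler branches on nearly every token and IR node, so lanes would rarely agree, and nested branches compound the waste.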



      • #13
        Originally posted by jorgepl

        I would love to know more about this topic, and also the pros and cons of the implicit vs explicit sync model. AFAIK it's not a Linux sync model but a Wayland sync model, but that's as far as I could read on the internet. I can't find more info. Do you know where I could read more about this?
        https://lwn.net/Articles/814587/

        I don't think it's really a Wayland issue, per se.

        It's more that this is how the Linux kernel DRM graphics stack works, because that's how OpenGL and the entire existing open source graphics stack work. So Wayland got built around that too, because it was designed around OpenGL and was already a large enough change to get everyone to adopt (see how that still hasn't happened yet).

        Meanwhile Vulkan has come along with the new explicit model, and now everyone wants to switch over to that, but it's a big task to get everything switched over. If you have centralized development, like, say, Microsoft or Apple, you can have everyone work to get everything ported over all at once. But in OSS it's like herding cats to get something like that accomplished. Again, see Wayland adoption.



        • #14
          Originally posted by jorgepl

          I would love to know more about this topic, and also the pros and cons of the implicit vs explicit sync model. AFAIK it's not a Linux sync model but a Wayland sync model, but that's as far as I could read on the internet. I can't find more info. Do you know where I could read more about this?
          As already mentioned, this is not really about implicit vs explicit sync, but anyway, since you asked (with the disclaimer that I'm not at all a low-level graphics hacker, and while I have some OpenGL experience, I'm not a professional in that area either):

          So if you think about how stuff works in modern OpenGL, you create buffers containing things like vertex coordinates, vertex indices, normals, textures and whatnot that make up a scene. Then you have shaders, small programs that run per vertex and per fragment (~pixel), which the driver compiles. You upload all these buffers and compiled shaders to the GPU and tell the GPU to start rendering.

          For this kind of simple usage, implicit sync works fine. Implicit here means that events happen in whatever order the host CPU submitted them to the GPU. This is easier for the programmer, as there is no need for, well, explicit synchronization, and it works fine as long as you have a single CPU core interacting with the GPU and the whole rendering pipeline flows FIFO (that is, the host prepares work and uploads it to the GPU, which renders it and outputs it to the screen, with no need for any kind of back-and-forth).
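The implicit model can be sketched as the driver inferring dependencies purely from submission order. This is a simplified assumption: each job waits on whichever job last touched any buffer it uses, ignoring the reader/writer distinction real implementations make; the job and buffer names are invented.

```python
from typing import Dict, List, Set, Tuple

# A job is (name, buffers it touches). Names are purely illustrative.
Job = Tuple[str, Set[str]]


def implicit_waits(submissions: List[Job]) -> Dict[str, Set[str]]:
    """Derive each job's waits from submission order alone.

    The application declares nothing: for every buffer a job touches, it
    implicitly waits on the job that touched that buffer most recently.
    """
    last_user: Dict[str, str] = {}
    waits: Dict[str, Set[str]] = {}
    for name, buffers in submissions:
        waits[name] = {last_user[b] for b in buffers if b in last_user}
        for b in buffers:
            last_user[b] = name
    return waits
```

Submitting `upload` (touches `vbo`), then `render` (`vbo`, `fb`), then `present` (`fb`) gives `render` an implicit wait on `upload` and `present` one on `render`, with no synchronization code written by the application.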

          But what if you have, say, multiple CPUs that you want to use for creating these buffers and sending commands to the GPU (this is a huge pain or outright impossible in OpenGL, but it's exactly the modern world Vulkan was designed for)? Or you have buffers that you want the GPU to do some work on, then you want to download some result of this work to the host (while doing other GPU interaction while waiting for this work to complete), do some work with it and, based on that, submit more work to the GPU? Or you have multiple GPUs? This is where the implicit sync method starts to break down, as the point of having many CPU cores interacting with the GPU is that you want them to run independently as much as possible and not be tightly synced with each other.

          So this is where explicit sync comes in: a model where everything runs independently by default, and where you want to enforce some ordering (say, telling the GPU not to start rendering a frame before all the buffers and other needed inputs are ready), you have to use explicit synchronization constructs.
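By contrast, explicit sync can be sketched as jobs that carry their own wait and signal fences, with nothing else ordering them. `Fence`, `Job` and `run_explicit` are hypothetical stand-ins (loosely in the spirit of Vulkan semaphores and fences, not their API), and running ready jobs in "waves" is just a simple way to show what is allowed to overlap.

```python
from typing import List


class Fence:
    """A one-shot signal; a hypothetical stand-in for a GPU sync object."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.signaled = False


class Job:
    def __init__(self, name: str, wait: List[Fence], signal: List[Fence]) -> None:
        self.name = name
        self.wait = wait      # fences that must be signaled before this job starts
        self.signal = signal  # fences this job signals when it completes


def run_explicit(jobs: List[Job]) -> List[List[str]]:
    """Run jobs in waves: every job whose wait fences are all signaled runs
    in the current wave, regardless of submission order. Jobs that no fence
    orders are free to run concurrently."""
    pending = list(jobs)
    waves: List[List[str]] = []
    while pending:
        ready = [j for j in pending if all(f.signaled for f in j.wait)]
        if not ready:
            raise RuntimeError("deadlock: waiting on a fence nobody signals")
        for j in ready:
            for f in j.signal:
                f.signaled = True
            pending.remove(j)
        waves.append([j.name for j in ready])
    return waves
```

Submitting `render` (waits on `buffers_ready`), `physics` (no fences) and `upload` (signals `buffers_ready`) in that order yields `[["physics", "upload"], ["render"]]`: `render` was submitted first but only runs once its fence is signaled, while `physics` overlaps with `upload` because nothing orders it.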

          In Mesa-land there seems to be a consensus that explicit sync is the future (most other platforms have already moved to such a model); the debate is about:
          • How to move to an explicit sync model without breaking all the existing code that assumes implicit sync.
          • What can be done in userspace, and what must be done in kernel space.
          Some starting posts of threads on dri-devel about this (not exhaustive):



          • #15
            How does this "Driver on the GPU" relate to Hardware Accelerated GPU Scheduling on Windows? There has also been discussion on Windows that many newer AMD GPUs have a hardware scheduler, which Nvidia removed from its newer GPUs, and that this is why Nvidia has higher CPU overhead than AMD in DX12. Hardware Unboxed made a video about it, and I have always wondered whether this hardware scheduler in AMD GPUs has been used at all.



            • #16
              Originally posted by JacekJagosz
              How does this "Driver on the GPU" relate to Hardware Accelerated GPU Scheduling on Windows? There has also been discussion on Windows that many newer AMD GPUs have a hardware scheduler, which Nvidia removed from its newer GPUs, and that this is why Nvidia has higher CPU overhead than AMD in DX12. Hardware Unboxed made a video about it, and I have always wondered whether this hardware scheduler in AMD GPUs has been used at all.
              They are not related. The hardware scheduler on the GPU is a replacement for the GPU scheduler in the kernel driver (i.e., the software that chooses which jobs get sent to the hardware). The scheduler is mainly there to enable user mode queues, which more or less require explicit sync because there is no central place (such as the kernel driver) where synchronization can happen. ROCm, for example, uses the hardware scheduler on the GPU to expose user mode queues. Applications can allocate user mode queues and submit work to them directly without going through the kernel driver. The hardware scheduler then decides which queues execute on the hardware if there are more queues than hardware slots. The hardware scheduler is not concerned with who (CPU or GPU) actually builds the commands that are submitted to the queues.
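When there are more user mode queues than hardware slots, something has to decide which ones are mapped at any moment. The sketch below uses plain round-robin time slicing; that policy, the slot count and the names are assumptions for illustration, since the actual firmware's policy (priorities, preemption rules) isn't described in this thread.

```python
from collections import deque
from typing import Deque, List


def schedule(user_queues: List[str], hw_slots: int, timeslices: int) -> List[List[str]]:
    """Map more user mode queues than hardware slots onto the slots by
    rotating them round-robin, one rotation per timeslice."""
    if hw_slots <= 0:
        raise ValueError("need at least one hardware slot")
    waiting: Deque[str] = deque(user_queues)
    history: List[List[str]] = []
    for _ in range(timeslices):
        # Map as many waiting queues as there are free slots.
        mapped = [waiting.popleft() for _ in range(min(hw_slots, len(waiting)))]
        history.append(mapped)
        # Preempted queues go to the back of the line for the next timeslice.
        waiting.extend(mapped)
    return history
```

With three queues and two slots, `schedule(["q0", "q1", "q2"], 2, 3)` maps `["q0", "q1"]`, then `["q2", "q0"]`, then `["q1", "q2"]`. Note that the scheduler never looks at what is inside a queue, matching the point that it doesn't care whether the CPU or the GPU built the commands.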



              • #17
                agd5f, thank you for such a detailed answer! So I guess when the hardware scheduler is missing (like in APUs, which cut it to save die space), it falls back to the scheduler in the kernel driver?



                • #18
                  Originally posted by JacekJagosz
                  agd5f, thank you for such a detailed answer! So I guess when the hardware scheduler is missing (like in APUs, which cut it to save die space), it falls back to the scheduler in the kernel driver?
                  I'm not that familiar with AMD GPUs, so I'm just speculating here, but I'd say it's not a "hardware scheduler" in the sense that the scheduler would be implemented in hardcoded ASIC logic. There is probably a tiny management CPU (say, an ARM core) on the GPU that handles all kinds of management tasks, including running the "hardware scheduler". The scheduler code is thus part of the GPU firmware.

                  Further, I'd speculate that lower-end GPUs cut out execution units, not the tiny management core, since that one is needed anyway for all kinds of stuff. That management core is probably so tiny that it's not worth the effort to have several different management cores depending on how beefy the GPU is. So whether the hardware scheduler is missing or not is probably more a matter of GPU generation than of the GPU's beefiness.



                  • #19
                    Originally posted by jabl

                    I'm not that familiar with AMD GPUs, so I'm just speculating here, but I'd say it's not a "hardware scheduler" in the sense that the scheduler would be implemented in hardcoded ASIC logic. There is probably a tiny management CPU (say, an ARM core) on the GPU that handles all kinds of management tasks, including running the "hardware scheduler". The scheduler code is thus part of the GPU firmware.

                    Further, I'd speculate that lower-end GPUs cut out execution units, not the tiny management core, since that one is needed anyway for all kinds of stuff. That management core is probably so tiny that it's not worth the effort to have several different management cores depending on how beefy the GPU is. So whether the hardware scheduler is missing or not is probably more a matter of GPU generation than of the GPU's beefiness.
                    You are thinking correctly: all the GPUs from the same family either have or don't have the scheduler. So all Vega desktop cards have the scheduler, all APUs with a Vega GPU in them don't, and all desktop cards that succeeded Vega have it. So I am wondering what part of the code takes on the role of that hardware scheduler when it is missing from the cards, e.g. on Vega APUs or on Polaris? The microcode?



                    • #20
                      Originally posted by JacekJagosz

                      You are thinking correctly: all the GPUs from the same family either have or don't have the scheduler. So all Vega desktop cards have the scheduler, all APUs with a Vega GPU in them don't, and all desktop cards that succeeded Vega have it. So I am wondering what part of the code takes on the role of that hardware scheduler when it is missing from the cards, e.g. on Vega APUs or on Polaris? The microcode?
                      The hardware scheduler is a microcontroller on the GPU that runs firmware. It's available on all dGPUs and APUs since Sea Islands (GFX7) for compute and SDMA, and since Navi (GFX10) for gfx. It's only used by ROCm at the moment, mainly due to the complications around implicit synchronization when using user mode queues. There is not much reason to use it without user mode queues.

