Experimental Zero-Copy Support For Nouveau With GNOME Mutter

  • Experimental Zero-Copy Support For Nouveau With GNOME Mutter

    Phoronix: Experimental Zero-Copy Support For Nouveau With GNOME Mutter

    Ubuntu desktop developer Daniel Van Vugt has been working on enabling zero-copy support for discrete GPUs within GNOME's Mutter compositor to deliver faster performance. This appears to be working so far with the Nouveau open-source NVIDIA driver...

  • #2
    The existence of this article is going to lead to unnecessary drama when this proposal gets shot down for technical reasons.

    • #3
      I still don't understand how zero-copy isn't possible.

      Is it due to a hardware or software limitation? I fail to see how modern hardware like this can't do it.

      What about AMD hardware and this?
      agd5f
      bridgman

      • #4
        It's ironic that Nouveau is the target for such a performance feature. As long as Nouveau isn't really usable or desirable to use, developer focus should be on the hardware people actually use, where it would make a meaningful impact.

        • #5
          Originally posted by timofonic
          I still don't understand how zero-copy isn't possible.

          Is it due to a hardware or software limitation? I fail to see how modern hardware like this can't do it.

          What about AMD hardware and this?
          agd5f
          bridgman
          It's possible on dGPUs, but the shared buffer needs to end up in VRAM. The display hardware is a real-time client so it's very sensitive to latency. For dGPUs to access system memory, the display hardware would have to go across the PCIe bus and walk GPU and potentially IOMMU page tables. This would cause too much latency for memory access and likely lead to underflow in the hardware. On APUs, the iGPU has a fast path to memory ("VRAM" is just stolen system memory on APUs) and additional translation caches, so they can support display buffers in system memory. I'm surprised this works on Nouveau. Perhaps this is a bug in nouveau that allows this by accident, but you may end up seeing problems (flickering, blackouts, etc.) when there is PCIe bandwidth contention, limited link speeds, etc.
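To put a rough number on the sustained load that scanout from system memory would place on the PCIe link, here is a back-of-the-envelope sketch. It assumes 4 bytes per pixel, ignores blanking intervals, and uses the theoretical per-direction PCIe 3.0 rate (8 GT/s per lane, 128b/130b encoding); the function names and the x4 link are illustrative choices, not taken from any driver. It only covers bandwidth. The latency problem described above (crossing the bus and walking page tables per access) is not captured by this arithmetic.

```python
def scanout_bytes_per_second(width, height, refresh_hz, bytes_per_pixel=4):
    """Sustained read rate needed to scan out one display (blanking ignored)."""
    return width * height * bytes_per_pixel * refresh_hz


def pcie_peak_bytes_per_second(gt_per_s, lanes, encoding=128 / 130):
    """Theoretical per-direction link throughput; real links deliver less."""
    return gt_per_s * 1e9 * encoding * lanes / 8


# Example display modes: (width, height, refresh rate in Hz).
modes = {
    "1080p60": (1920, 1080, 60),
    "4K60": (3840, 2160, 60),
    "4K144": (3840, 2160, 144),
}

# Assumption: a narrow PCIe 3.0 x4 link, as a stand-in for "limited link speeds".
link = pcie_peak_bytes_per_second(gt_per_s=8, lanes=4)

for name, (w, h, hz) in modes.items():
    need = scanout_bytes_per_second(w, h, hz)
    print(f"{name}: {need / 1e9:.2f} GB/s, {100 * need / link:.0f}% of link peak")
```

Any other traffic on the link (rendering, uploads, other devices) competes with this constant read stream, which is where the contention scenarios mentioned above come from.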

          To support zero copy on dGPUs, you would need to allocate the display buffer on the display dGPU (in VRAM) and then import it to the rendering GPU. The rendering GPU could then render directly to the display dGPU buffer over the PCIe bus. That said, I'm not sure zero copy really makes sense. GPUs are only fast when they are rendering to local RAM (VRAM on dGPUs, system memory on APUs) where they have a lot of memory bandwidth. If you have 2 dGPUs, the best performance would be to have the render dGPU render to its local VRAM and then copy the frame directly to a shared VRAM buffer on the other dGPU for display. For APUs, you generally want to render to system memory with the APU and then copy the frame directly to the VRAM buffer on the display dGPU, or vice versa (the dGPU renders to VRAM and then copies the frame to a shared buffer in system memory for the APU to display). There is a copy, but the copy overhead is minor compared to the actual rendering operation.
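As a rough way to size that copy, the sketch below estimates how long one frame takes to move across the bus and compares it with the frame period. The 10 GB/s effective copy bandwidth is a made-up placeholder, not a measured value, and the helper names are hypothetical; plug in numbers for your own link and resolution.

```python
def frame_bytes(width, height, bytes_per_pixel=4):
    """Size of one frame, assuming a packed 32-bit pixel format."""
    return width * height * bytes_per_pixel


def copy_time_ms(nbytes, bandwidth_bytes_per_s):
    """Time to move nbytes at a given effective bandwidth, in milliseconds."""
    return 1000 * nbytes / bandwidth_bytes_per_s


# Assumption: hypothetical effective cross-GPU copy bandwidth of 10 GB/s.
ASSUMED_COPY_BANDWIDTH = 10e9

size = frame_bytes(3840, 2160)
budget_ms = 1000 / 60  # frame period at 60 Hz
cost_ms = copy_time_ms(size, ASSUMED_COPY_BANDWIDTH)

print(f"frame size: {size / 1e6:.1f} MB")
print(f"copy time:  {cost_ms:.2f} ms of a {budget_ms:.2f} ms frame period")
```

Whether the result counts as minor depends on the workload: the estimate only covers the transfer itself and says nothing about how much of it overlaps with rendering the next frame.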

          Thinking about it more, for the compositor, the actual rendering operation is pretty light-weight (basically just a copy) so it may make sense to do a zero copy in that case, but for graphically intense games, you'd want the extra copy.
          Last edited by agd5f; 26 October 2023, 10:32 AM.

          • #6
            Originally posted by agd5f

            I'm surprised this works on Nouveau.
            That makes two of us.

            Perhaps this is a bug in nouveau that allows this by accident, but you may end up seeing problems (flickering, blackouts, etc.) when there is PCIe bandwidth contention, limited link speeds, etc.
            Somebody on IRC pointed out it might be intentional, for Nvidia dGPUs in AMD/Intel notebooks. In which case they might have optimized the memory fetching to minimize the bad side effects. Seems plausible at least.

            Thinking about it more, for the compositor, the actual rendering operation is pretty light-weight (basically just a copy) so it may make sense to do a zero copy in that case, but for graphically intense games, you'd want the extra copy.
            Compositor drawing isn't always that light either, e.g. the GNOME overview. And as I pointed out in https://gitlab.gnome.org/GNOME/mutte...2#note_1877992 , AFAICT this can only work because the primary GPU draw buffer is linear, which means drawing to it is slower than with optimal tiling. So it's not obvious that this is an overall win compared to the optimal method with copies (mutter doesn't have an optimal implementation of that yet), at least not in all cases.
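For readers wondering why a linear buffer is slower to draw into than a tiled one, the toy model below counts how many 64-byte cache lines an 8x8 pixel block touches in each layout. The 4x4-pixel tile and the 64-byte line are hypothetical values picked for illustration; real GPU tiling formats are vendor-specific and more elaborate, and this is not Nouveau's or Mutter's actual layout.

```python
def linear_offset(x, y, pitch, bpp=4):
    """Byte offset of pixel (x, y) in a row-major (linear) buffer."""
    return y * pitch + x * bpp


def tiled_offset(x, y, width, tile_w=4, tile_h=4, bpp=4):
    """Byte offset of pixel (x, y) when pixels are stored tile by tile."""
    tiles_per_row = width // tile_w
    tile_index = (y // tile_h) * tiles_per_row + (x // tile_w)
    within_tile = (y % tile_h) * tile_w + (x % tile_w)
    return (tile_index * tile_w * tile_h + within_tile) * bpp


def cache_lines_touched(offsets, line_size=64):
    """Number of distinct cache lines covered by a set of byte offsets."""
    return len({off // line_size for off in offsets})


WIDTH = 1920  # pixels per row; assumed divisible by the tile width
block = [(x, y) for y in range(8) for x in range(8)]  # an 8x8 pixel block

linear = cache_lines_touched(linear_offset(x, y, WIDTH * 4) for x, y in block)
tiled = cache_lines_touched(tiled_offset(x, y, WIDTH) for x, y in block)

print(f"linear layout: {linear} cache lines, tiled layout: {tiled} cache lines")
```

In the linear layout every row of the block lands in a different part of memory, while the tiled layout keeps neighbouring pixels together, which is the general locality argument behind the linear-versus-tiled tradeoff.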
