
Linux 5.19 Advances In Quest To Improve Explicit Synchronization For Graphics


  • Linux 5.19 Advances In Quest To Improve Explicit Synchronization For Graphics

    Phoronix: Linux 5.19 Advances In Quest To Improve Explicit Synchronization For Graphics

    Adding to the list of many exciting features in Linux 5.19 is a new DMA-BUF fence import/export API for improving the usage of explicit synchronization on the Linux desktop to help with Vulkan and allowing more of the Linux desktop to move in the future to a more explicit synchronization model...


  • #2
    The write-up misses why we would be stuck with the hacks forever. The Android compositor might be explicit sync, but there are hacks to support OpenGL applications, because lots of those require implicit sync. Yes, Android has buffers with implicit sync enabled for OpenGL application support. Mixing up whether a buffer is implicit sync or explicit sync leads to some serious problems.

    Mesa would be stuck with hacks for as long as we need anything OpenGL on an explicit-sync-only route, because the OpenGL standard mandates implicit sync in places. Yes, that applies to both OpenGL, the desktop version, and OpenGL ES, the embedded/phone version.

    This path is way better, as you don't really need to worry whether a buffer is implicit or explicit sync, since a buffer can be used either way.

    DMABUF implicit sync is based on dma-fences, which are explicit sync. In the Android model, explicit sync state is transferred from one application to another with special sync files, while the buffer itself travels as a DMABUF file handle. That means moving around two file handles that you have to keep in alignment with each other. Not a tidy solution. Also, for OpenGL support Android had to track which buffers have implicit sync enabled and special-case them.

    What the Android sync files carry for graphics (yes, this feature was added to generic dmabuf) is dma-fence information. This is the same information that would be stored in the implicit sync structure of the DMABUF so that implicit sync calls can work.

    So why send two file handles when you can put everything under one file handle and just send that? By having all the information accessible from the one file handle's metadata, you now have a file handle that covers both implicit and explicit sync. This change makes absolute sense.

    There is really no need to split the explicit sync and implicit sync functionality absolutely, particularly when implicit sync under dmabuf is in fact implemented on top of explicit sync. All that was missing was an import/export for the dmabuf implicit sync structures, and that is solved in 5.19.

    The reality here is that doing both implicit sync and explicit sync this way does not result in any extra over-synchronisation. If the compositor is using explicit sync, then when it gets the dmabuf it extracts the sync data so it can perform explicit sync, and it would not be performing an implicit sync poll. The dma-fences are the same either way.

    The extra sync comes about when you have a double set of dma-fences, as in one set for explicit sync usage and one set for implicit sync usage. In this model, where you use the implicit sync dma-fence storage for transferring explicit sync dma-fence lists between applications, you only have a single set of fences. No double-ups, no over-synchronisation.
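
    To make the single-set idea concrete, here is a toy Python model. It is only a sketch under loud assumptions: Fence, ToyDmaBuf, import_sync_file, export_sync_file, implicit_ready and explicit_ready are made-up names for illustration, and none of this is the kernel's real interface (which is ioctl-based on file descriptors and not shown here).

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # compare fences by identity, not by field values
class Fence:
    """Toy stand-in for a dma-fence: a unit of GPU work that completes later."""
    name: str
    signaled: bool = False

    def signal(self) -> None:
        self.signaled = True


@dataclass
class ToyDmaBuf:
    """Toy buffer keeping ONE fence list shared by implicit and explicit users."""
    fences: list = field(default_factory=list)

    def import_sync_file(self, fence: Fence) -> None:
        # An explicit-sync producer attaches its fence to the buffer.
        if fence not in self.fences:
            self.fences.append(fence)

    def export_sync_file(self) -> tuple:
        # An explicit-sync consumer takes a snapshot of the buffer's fences.
        return tuple(self.fences)

    def implicit_ready(self) -> bool:
        # An implicit-sync consumer asks the buffer itself.
        return all(f.signaled for f in self.fences)


def explicit_ready(snapshot) -> bool:
    # An explicit-sync consumer checks the fences it extracted earlier.
    return all(f.signaled for f in snapshot)


buf = ToyDmaBuf()
render_done = Fence("render")
buf.import_sync_file(render_done)   # producer hands over its fence
snapshot = buf.export_sync_file()   # consumer extracts the very same fence
before = (buf.implicit_ready(), explicit_ready(snapshot))
render_done.signal()
after = (buf.implicit_ready(), explicit_ready(snapshot))
print(before, after)  # (False, False) (True, True)
```

    Both consumers look at the same fence object, so in this toy there is nothing to keep in alignment.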

    One of the usual problems of implementing implicit sync on top of explicit sync is doubling your explicit sync fences, and this design solves that problem. The developer asked himself the correct question: do we really need two lists of dma-fences transferred between applications, one set for implicit sync and one set for explicit sync, for both to work? The answer is no, we don't. Transferring two sets of dma-fences is the cause of the problem: it leads to more fences than should be assigned, and it makes handling those buffers more complex, as the two lists can get out of alignment with each other. It is also way more processing, because a fence could be listed in both the explicit sync list and the implicit sync list or in only one of them, so you get lots of sorting and wasteful processing.
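
    A rough sketch of that extra processing, again as a toy: fences are plain strings here, and naive_wait_list and merged_wait_list are hypothetical helpers invented for this example, not anything from the kernel.

```python
def naive_wait_list(explicit, implicit):
    """Wait on every entry of both lists, duplicates included."""
    return list(explicit) + list(implicit)


def merged_wait_list(explicit, implicit):
    """Extra pass that drops duplicates while keeping first-seen order."""
    return list(dict.fromkeys(list(explicit) + list(implicit)))


explicit_fences = ["render", "blit"]
implicit_fences = ["blit", "scanout"]  # "blit" is listed in both sets

naive = naive_wait_list(explicit_fences, implicit_fences)
merged = merged_wait_list(explicit_fences, implicit_fences)
print(len(naive), merged)  # 4 ['render', 'blit', 'scanout']
```

    With two lists you either wait on "blit" twice or pay for the merge pass. With one list neither cost exists.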

    All the examples saying that implicit sync mixed with explicit sync causes over-synchronisation have a common flaw: independent sets of fences for implicit and explicit sync. Yes, the Linux kernel dmabuf had this flaw as well before this change, as there was no clean way to align the implicit sync dma-fences with the explicit sync dma-fences.

    It is really simple to miss that Linux kernel dmabuf explicit and implicit sync are based on the same explicit sync part, namely dma-fences. It is also really simple to miss that the problem is not really about explicit sync vs implicit sync, but about how to implement these without duplicating dma-fences or their equivalent, and without making any more lists than you really need. This change moves to a single list of fences for implicit sync and explicit sync.

    Exactly what reason did we ever have for needing two lists of fences so we could have implicit sync and explicit sync? I don't think there is a valid reason for the old design choice.



    • #3
      Could you please add some detail to the claim that OpenGL has an implicit sync model? Certainly when using a buffer in different threads, the application developer is responsible for explicit synchronization, and this has been true since OpenGL 1.0 when the only reliable means of synchronization was glFinish().

      > Also for opengl support Android had to be tracking what buffers have implicit sync enabled and special handle them.

      Having worked on both GL and Android, I have absolutely no idea what you're talking about. Android's solution to the "untidy" requirement of pairing a buffer with a sync is to include them both in queueBuffer() and dequeueBuffer() APIs, which are used by eglSwapBuffers to communicate with SurfaceFlinger.



      • #4
        This is a great write-up, oiaohm, except I've no effing clue what implicit or explicit syncs are.

        Would be great if someone explained the whole situation (graphics pipeline) in layman's terms.



        • #5
          Originally posted by birdie
          This is a great write-up, oiaohm, except I've no effing clue what implicit or explicit syncs are.

          Would be great if someone explained the whole situation (graphics pipeline) in layman's terms.
          This is possibly the best writeup on this topic there is: https://lwn.net/Articles/814587/



          • #6
            Originally posted by birdie
            This is a great write-up, oiaohm, except I've no effing clue what implicit or explicit syncs are.

            Would be great if someone explained the whole situation (graphics pipeline) in layman's terms.
            It is a bit like a caller vs callee situation. With implicit sync, synchronization is indirect and implied, and it is the responsibility of the OpenGL driver or the DirectX kernel driver. Most of the time you don't hold the responsibility for synchronizing things.

            Explicit sync is explicit: the client needs to place fences and semaphores and decide what to do. With explicit sync the driver's situation is much easier, since it simply synchronizes only when it is told to. Vulkan is an example.

            Generally there are two big issues:

            - First, most of the open source driver stack is implicit while Nvidia is explicit. This causes some issues that are quite hard to solve without allowing some parts to become explicit.

            - Second, Vulkan happened, and now we have explicit sync clients that don't work well on the implicit Mesa driver. We rely on hacks and run into over-synchronization issues.

            Generally implicit sync is not an issue for X or OpenGL client applications, as they are single-threaded, single-command-queue systems, so things are ordered correctly. This is why OpenGL client applications can work on the explicit-only Nvidia driver.

            The problem is that since then we got multithreaded systems and GPUs with many command queues. Vulkan solves that issue because the client in that case knows when to synchronize things, and voilà.

            In the case of Linux the issue is bigger because KMS and GBM are implicit sync only. EGLStreams (explicit) solves some of those issues (but also introduces its own).
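
            The caller vs callee split described above can be sketched as a toy in Python. Everything here is an assumption for illustration: ImplicitQueue and ExplicitQueue are made-up classes, jobs and buffers are plain strings, and no real driver or API works exactly like this.

```python
class ImplicitQueue:
    """Driver-side tracking: each submit waits on the last writer automatically."""

    def __init__(self):
        self.last_writer = {}  # buffer name -> job that last wrote it
        self.log = []          # (job, jobs it waited on)

    def submit(self, job, reads=(), writes=()):
        # The "driver" works out the dependencies; the caller never names them.
        # (This toy only tracks the last writer, not read-before-write hazards.)
        touched = (*reads, *writes)
        waits = sorted({self.last_writer[b] for b in touched if b in self.last_writer})
        self.log.append((job, waits))
        for b in writes:
            self.last_writer[b] = job


class ExplicitQueue:
    """Client-side tracking: the queue waits only on what the caller names."""

    def __init__(self):
        self.log = []

    def submit(self, job, wait_on=()):
        self.log.append((job, sorted(wait_on)))
        return job  # the job name doubles as its fence in this toy


implicit = ImplicitQueue()
implicit.submit("draw", writes=["fb"])
implicit.submit("post", reads=["fb"], writes=["fb"])
implicit.submit("present", reads=["fb"])

explicit = ExplicitQueue()
draw = explicit.submit("draw")
post = explicit.submit("post", wait_on=[draw])
explicit.submit("present", wait_on=[post])

print(implicit.log == explicit.log)  # True
```

            Same ordering either way; the only difference is who had to know about it.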



            • #7
              Originally posted by birdie
              This is a great write-up, oiaohm, except I've no effing clue what implicit or explicit syncs are.

              Would be great if someone explained the whole situation (graphics pipeline) in layman's terms.


              Turns out the definitions of explicit sync and implicit sync are about as clear as mud. What implicit and explicit sync actually include, and what requirements they place on an implementation, depends on the standard you are reading, and that can alter how they have to be implemented. Take glFinish: OpenGL counts it as explicit sync, yet you need dmabuf implicit sync to implement it. In technical theory you should not be able to implement explicit sync on top of implicit sync, but in the real world, because of how the standards define things, this theory does not stand up.

              glFinish has a requirement that the application stops getting CPU time until glFinish reaches the synced state, and that requires a kernel implementation. So this might be called explicit sync in the OpenGL standard, but in reality it is closer to implicit sync.

              Nvidia defines explicit sync as something the application can do without context switching away, and implicit sync as something done by the library, also in userspace and also without context switching away. This happens not to line up with the OpenGL standard's definition of glFinish. Yes, the OpenGL standard defines glFinish as explicit sync, but by the Nvidia definition glFinish as specified is neither implicit nor explicit, because it has to be done kernel side so that the application stops getting CPU slices while waiting for the sync, and Nvidia's definitions of implicit and explicit sync don't cover that horrible event.

              By Linux kernel definitions, explicit sync is something the application does without needing kernel involvement and without context switching away; it could be a library or the application itself performing the sync check. So by Linux kernel definitions Nvidia has no implicit sync support, because even what Nvidia calls implicit sync is only explicit sync. Implicit sync by Linux kernel definitions means you have called some function (syscall/ioctl) that hands control to the kernel to perform the sync, so the userspace application stops getting CPU cycles while the sync/lock is not in a state the application can use.

              Great fun, right? Whether something is implicit sync or explicit sync depends on whose definition of implicit/explicit sync you are using. Explicit sync generally means the sync can basically be spinlocked in userspace code until it resolves. Implicit sync could be a library-based spinlock on the sync, or it could be a kernel-performed sync check; this changes from definition to definition.

              The reality is that we really want some way for the kernel to wait on a sync, so that the application is not getting time slices when it is not able to do anything. This is the reason why futex works so well. Remember that a futex, if the lock is not contended, is basically explicit sync by Linux kernel definitions. And yes, a futex is implicit sync by Linux kernel definitions when you have to wait because the lock is contended. So it's possible for an item to be both implicit and explicit sync.
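
              The "both at once" behaviour described for futex can be sketched as a toy in Python. This is not a real futex: threading.Lock is just a stand-in, and HybridLock, fast_acquires, slow_acquires and about_to_block are names invented for this example. The fast path tries to take the lock without blocking; only when that fails does the caller fall back to a blocking wait.

```python
import threading


class HybridLock:
    """Toy lock with an uncontended fast path and a blocking slow path."""

    def __init__(self):
        self._lock = threading.Lock()
        self.fast_acquires = 0
        self.slow_acquires = 0
        self.about_to_block = threading.Event()  # only here to make the demo deterministic

    def acquire(self):
        # Fast path: succeeds immediately when nobody holds the lock.
        if self._lock.acquire(blocking=False):
            self.fast_acquires += 1
            return
        # Slow path: the lock is contended, so sleep until it is released.
        self.about_to_block.set()
        self._lock.acquire()
        self.slow_acquires += 1

    def release(self):
        self._lock.release()


def worker_body(lock):
    lock.acquire()  # the main thread still holds the lock: slow path
    lock.release()


lock = HybridLock()
lock.acquire()  # nobody else holds it: fast path

worker = threading.Thread(target=worker_body, args=(lock,))
worker.start()
lock.about_to_block.wait()  # wait until the worker has hit contention
lock.release()
worker.join()

print(lock.fast_acquires, lock.slow_acquires)  # 1 1
```

              The same acquire call ends up on either path depending only on contention, which is the point being made about futex.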

              Of course explicit sync could also have a kernel-side wait feature, like what is found in a few standards such as OpenGL.

              Birdie, I really wish the definitions were not as clear as mud. It makes debates about this harder.

              The reality that it is possible to implement items that are both implicit and explicit sync alters the debate a lot, once you consider that those can exist and how effective they can be.



              • #8
                Originally posted by piotrj3
                It is a bit like a caller vs callee situation. With implicit sync, synchronization is indirect and implied, and it is the responsibility of the OpenGL driver or the DirectX kernel driver. Most of the time you don't hold the responsibility for synchronizing things.
                Nvidia also defines implicit sync as sync performed by a library that does not return to the core application. This causes some serious problems, because CPU time slices end up being eaten by a userspace spinlock. This is absolutely not good for performance.

                Originally posted by piotrj3
                - First, most of the open source driver stack is implicit while Nvidia is explicit. This causes some issues that are quite hard to solve without allowing some parts to become explicit.
                This is not 100 percent true. The dma-fence that all open source graphics implicit sync is based on is explicit sync. The majority of the core of the open source kernel-mode graphics drivers is in fact explicit sync.

                Originally posted by piotrj3
                - Second, Vulkan happened, and now we have explicit sync clients that don't work well on the implicit Mesa driver. We rely on hacks and run into over-synchronization issues.
                This 5.19 change fixes most of the issue. The open source graphics drivers themselves are not implicit at the core for the majority of hardware. The 5.19 change basically allows access to the explicit sync data, i.e. the dma-fences that the Linux kernel dmabuf implicit sync uses to function.

                Originally posted by piotrj3
                Generally implicit sync is not an issue for X or OpenGL client applications, as they are single-threaded, single-command-queue systems, so things are ordered correctly. This is why OpenGL client applications can work on the explicit-only Nvidia driver.
                No, this is why Nvidia needs a lot of workarounds for different OpenGL applications that are not needed with the Mesa stack, because not all OpenGL applications are single-threaded. The means to take CPU cycles away from the application is kind of important for avoiding artefacts from multithreading. Not having kernel support has at times led to poor performance due to wasted CPU cycles.

                Originally posted by piotrj3
                The problem is that since then we got multithreaded systems and GPUs with many command queues. Vulkan solves that issue because the client in that case knows when to synchronize things, and voilà.
                Vulkan has its problems.
                Setup Vulkan is supposed to ease the load on CPU and work more on the GPU to gain more FPS. But on Doom 2016 I'm getting low GPU usage and really hi...


                This is you ignoring the higher CPU usage that you hit with Vulkan. Not having a kernel-level sync is bad.


                Originally posted by piotrj3
                In the case of Linux the issue is bigger because KMS and GBM are implicit sync only. EGLStreams (explicit) solves some of those issues (but also introduces its own).
                GBM is neither implicit sync nor explicit sync. Sync operations are outside the domain of GBM; the closest you get is GBM telling you that a particular buffer is implicit sync only because of the hardware design. Other than that, GBM has nothing to do with sync. The DMABUF side is where you find all the sync handling for buffers that GBM is tracking in the application. Remember that the buffer information GBM is tracking only got to the application through a DMABUF operation. This is why I commonly write GBM/DMABUF, because on its own GBM is basically useless.

                KMS is truly a mix of implicit and explicit sync.
                https://www.kernel.org/doc/html/v4.1...ing-properties and it has been that way for quite some time.

                KMS had a lot of explicit sync support added from Android development. Yes, the ways to get to KMS explicit sync are mostly via sync files (https://01.org/linuxgraphics/gfx-doc...sync_file.html). Yes, that is a feature from Android development.

                The KMS problem is very much like the DMABUF problem: you have KMS implicit sync functions and no way to align what those functions are using back to explicit sync. This turns out to be another list of dma-fence structures. So it is the same problem dmabuf had, of having two lists of dma-fence structures when there really should be only one.

                The problem, I would say, is that Linux kernel explicit sync has not been correctly integrated. The 5.19 patch basically integrates explicit sync with dmabuf correctly. This still leaves KMS. With dmabuf done, GBM is basically done too.



                • #9
                  Originally posted by oiaohm

                  Vulkan has its problems.
                  Setup Vulkan is supposed to ease the load on CPU and work more on the GPU to gain more FPS. But on Doom 2016 I'm getting low GPU usage and really hi...


                  This is you ignoring the higher CPU usage that you hit with Vulkan. Not having a kernel-level sync is bad.




                  In the case of Doom 2016, almost every single online benchmark gives a small edge to Vulkan on Nvidia hardware and a huge edge to Vulkan on AMD hardware. It is also not true that Vulkan introduces higher CPU usage; in fact Vulkan lets you draw more thanks to its smaller CPU overhead. Most people (if you search forums) much prefer Vulkan over OpenGL in Doom 2016.



                  Just started playing this game last night and I'm just wondering what the better graphical option is. Vulkan or opengl 4.5? Is there a graphical difference between the two or are they basically the same?


                  Vulkan allows you to implement waits on the GPU side without making the CPU wait.



                  • #10
                    Glad that Phoronix made an article on this and that there is progress on this front, one of the major pain points of the Linux desktop/graphics stack, and many thanks to Jason Ekstrand for pushing this through.
                    Last edited by mdedetrich; 10 June 2022, 07:24 AM.

