KDE KWin's Move Away From GBM Surfaces


  • #51
    Originally posted by Berniyh View Post
    Introducing Vulkan rendering does not mean that OpenGL will be dropped. Why does it have to be one or the other?
    afaik, Vulkan support is planned with Plasma 6, but I doubt it's high priority. OpenGL works, after all.
    What about the whole explicit synchronization approach that compositors need to switch to with Vulkan? Will KWin tackle that? Classic Wayland compositors used implicit synchronization, and it wasn't easy to change the paradigm.

    There was a post about it here: https://www.collabora.com/news-and-b...-gap-on-linux/
    Last edited by shmerl; 14 March 2023, 11:04 PM.



    • #52
      KDE is never going to become stable, and thus will never become dominant or truly usable. And I absolutely can't stand the ergonomics (or rather lack thereof) of Gnome Shell and its apps.

      Free desktop is a mess.



      • #53
        Originally posted by piotrj3 View Post
        Quoting AMD engineer:
        The synchronization works because the Mesa driver waits for idle (drains the GFX pipeline) at the end of command buffers and there is only graphics queue, so everything is ordered.

        And that wait for idle sucks. Implicit wasn't that bad in old APIs and the older X server. However, the moment we started doing multithreading, multiple command queues, etc., it started sucking.
        Legacy support is not the most efficient thing. What the AMD engineer says they do to make Mesa work is the same thing you find Microsoft doing to make implicit sync work for DirectX 11 and earlier, in the Microsoft kernel-mode DirectX overlay driver.

        The X11 2D rendering that glamor is an implementation of is mostly used by really old applications, from a time when multi-threading the UI was a pipe dream and implicit sync was normal.

        Being strongly inefficient is not grounds to not support a feature. Spinlocks in userspace are strongly inefficient as well. The Linux kernel developers at one point looked at banning userspace spinlocks with the same kind of arguments you are making, and even considered making spinlocks perform badly. They had to backtrack on this plan as well.



        • #54
          Suddenly it looks like Kwin is making more progress than the odd fork named "Kwin Fast Track"...



          • #55
            Originally posted by oiaohm View Post

            Do not mix up EGL with EGLStreams. EGLStreams has something equally as stupid as the GBM surface. A change in Mesa was submitted to the standards body to allow EGL to create GBM buffers directly, without using the GBM surface as a middle structure.

            Weasel, do remember that Nvidia funded a developer to make KDE work with EGLStreams and was forced to admit defeat; this is why Nvidia worked out how to get GBM support into its drivers. I would say that in this case both Nvidia and Mesa were wrong with their first solution. The Mesa solution was closer to right.

            Nvidia made a mistake with no implicit sync. Yes, Microsoft asked for no implicit sync in Windows graphics drivers, because they did their own kernel-space implicit sync for graphics on top of the explicit sync interfaces Microsoft asked for. Nvidia also made a mistake with the EGLStreams memory model: if you closed an EGLStreams application, all buffers that application had allocated would cease to exist, even if they had been shared with another application. So no compositor restart is possible with EGLStreams.

            Then another mistake: no "turtles all the way down" by design. A buffer shared from a Wayland compositor to Xwayland could not, by EGLStreams design, then be shared on to an X11 application.

            Nvidia pushed EGLStreams hard, but it was totally non-functional for a Wayland compositor or X11 server.
            I thought that explicit sync provided better performance and was better and more modern (I have no idea of this stuff though, so I don't really know). Am I wrong?



            • #56
              Originally posted by jorgepl View Post

              I thought that explicit sync provided better performance and was better and more modern (I have no idea of this stuff though, so I don't really know). Am I wrong?
              It seems it's indeed worse, less scalable and less performant. How is this related to Vulkan? Would a switch to Vulkan mean abandoning these two and running the compositor in an explicit sync fashion?



              • #57
                Originally posted by jorgepl View Post

                I thought that explicit sync provided better performance and was better and more modern (I have no idea of this stuff though, so I don't really know). Am I wrong?
                No, you are right: explicit sync will always provide the best possible performance. This is because implicit sync has to deal with concurrency "implicitly" (hence the name). Simplifying a bit, this means it's done by heuristics, extra processing (which adds overhead), reliance on manual ordering of function calls, or other techniques.

                With explicit sync, the user has to provide the exact concurrency primitives (e.g. locks, semaphores, mutexes, etc.), which gives precise control over how to synchronise resources.

                Newer APIs like Vulkan or DX12 are built from the ground up with explicit sync in mind (there are minor exceptions, but these are largely due to interoperating with other implicit sync systems).
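
                To make the difference concrete, here is a rough sketch. It uses plain Python threads as a stand-in for GPU queues, so it is only an analogy and not real Vulkan or Mesa code; `run_implicit`, `run_explicit` and the "fence" events are hypothetical names made up for illustration. In the implicit version the ordering falls out of everything running on one queue. In the explicit version the two independent jobs run concurrently, and the caller has to state the dependency itself by waiting on the fences.

```python
import threading


def run_implicit(render_a, render_b, composite):
    """Implicit style: a single queue, so ordering is a side effect of
    submission order and the caller never names a dependency."""
    a = render_a()
    b = render_b()
    return composite(a, b)


def run_explicit(render_a, render_b, composite):
    """Explicit style: independent jobs run concurrently and the caller
    spells out what the composite step depends on."""
    results = {}
    fence_a = threading.Event()  # hypothetical stand-in for a GPU fence
    fence_b = threading.Event()

    def job(key, fn, fence):
        results[key] = fn()
        fence.set()  # signal the fence once the result is available

    workers = [
        threading.Thread(target=job, args=("a", render_a, fence_a)),
        threading.Thread(target=job, args=("b", render_b, fence_b)),
    ]
    for w in workers:
        w.start()

    # The dependency is stated by the caller, not inferred by a driver.
    fence_a.wait()
    fence_b.wait()
    out = composite(results["a"], results["b"])

    for w in workers:
        w.join()
    return out
```

                Forget one of the two waits in `run_explicit` and the composite step can read a result that does not exist yet, which is the price of the extra control.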



                Originally posted by oiaohm View Post

                Except it was not true for the whole of Vulkan's existence that you could do everything under Vulkan that OpenGL could do. Zink needed Vulkan extensions because, at the start, Vulkan and OpenGL did not overlap 100 percent.
                Yeah, you could do it, but badly (in terms of performance) and, most importantly, inconsistently across different OpenGL vendors, which is honestly one of the biggest issues with OpenGL. The design and API fundamentals are so old and outdated, and it is so high level, that it's practically not possible to provide both a fully performant and a correct, consistent experience across different OpenGL vendors.

                It's so bad that both NVidia and AMD have to hot patch game executables on the fly at the driver level to squeeze out extra performance. With Vulkan/DX12 no such thing is necessary.



                Originally posted by oiaohm View Post

                Being strongly inefficient is not grounds to not support a feature. Spinlocks in userspace are strongly inefficient as well. The Linux kernel developers at one point looked at banning userspace spinlocks with the same kind of arguments you are making, and even considered making spinlocks perform badly. They had to backtrack on this plan as well.
                This is barely related; I don't know why you are bringing it up. Linux wanted to deliberately make spinlocks perform badly because of deadlocking issues, which is an unrelated problem. That isn't what's happening here with implicit vs explicit sync for the graphics stack.

                Last edited by mdedetrich; 15 March 2023, 04:17 AM.



                • #58
                  Originally posted by shmerl View Post

                  What about the whole explicit synchronization approach that compositors need to switch to with Vulkan? Will KWin tackle that? Classic Wayland compositors used implicit synchronization, and it wasn't easy to change the paradigm.
                  It's not yet clear what exactly they will do, or at least I couldn't find the info on that. Other than the basic plan to provide Vulkan support at some point in the lifetime of kwin 6.
                  There is a Vulkan branch from 2017, but it has been abandoned and we don't know if they would reuse that or reimplement it (that's 5.5 years after all since then).

                  I think the only safe assumption is that the OpenGL backend won't disappear. Such a big change will surely be scheduled for Plasma 7 at the earliest, everything else would be a big surprise in my opinion.
                  If anything, they might drop support for XRender with Plasma 6, but I'd doubt even that.
                  Actually, this shows that they are willing to implement and maintain multiple backends for the rendering instead of removing the less used ones.
                  How they will handle the implicit/explicit topic for these multiple backends we'll only see when they start on it.



                  • #59
                    Originally posted by jorgepl View Post
                    I thought that explicit sync provided better performance and was better and more modern (I have no idea of this stuff though, so I don't really know). Am I wrong?
                    Problem is you are not wrong, but you missed some key sides.

                    First problem: "more modern". Glide, early OpenGL and early DirectX (yes, sections of DirectX 11 and everything before it, right back to DirectX 1) are implicit sync. So these things are not modern. Part of the more modern with explicit sync

                    Yes, Nvidia paper, slide 2. Notice anything here that is a problem? Most likely not. Now look at slides 8 to 16, where implicit sync is implemented first and then explicit sync. There is a problem: the code is different.

                    Applications have to be altered to suit explicit sync. Yes, explicit sync is faster, but faster is not always better. An application designed for implicit sync expects to be stalled by implicit sync. So if you brute-force convert an implicit sync application to explicit sync, you can magically make it no longer thread safe, because you have removed the implicit sync lock.

                    Yes, removing the implicit sync lock is why explicit sync is faster. This is a double-edged sword: explicit sync can make an application faster, and it can also make an application not function at all.

                    Now this also gets worse. Pure userspace explicit sync has a problem.


                    Notice here that you can push fences from userspace to kernel space. The Nvidia explanation of implicit sync skips over something very important: when something is stalled in implicit sync by the kernel, the CPU slice can be given to another process.

                    Now, how in the Nvidia diagram do you get "kernel enforces a wait" while in usermode explicit sync? The answer is you don't. With the sync_file feature in the Linux kernel you can push your explicit sync fences to the kernel, then perform an implicit sync operation and intentionally go into an implicit sync kernel-side wait.

                    Remember, while in a kernel-enforced wait the kernel scheduler can give the time slices that would go to that thread under explicit sync to other threads and processes on the system.
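
                    A minimal sketch of what a kernel-side wait on a fence looks like from userspace. As far as I know a sync_file is handed to userspace as a file descriptor that can be polled, but the code below does not touch a real sync_file: an ordinary pipe is used as a stand-in for the fence fd, and `wait_on_fence_fd` and `demo` are hypothetical names. The point is only that the waiting thread sleeps inside the kernel until the fd is signalled, so the scheduler can hand its time slices to something else.

```python
import os
import select
import threading


def wait_on_fence_fd(fd, timeout_s):
    """Sleep in the kernel until fd becomes readable (our stand-in for
    'fence signalled') or the timeout expires. Returns True if signalled."""
    readable, _, _ = select.select([fd], [], [], timeout_s)
    return bool(readable)


def demo():
    # Assumption: a pipe stands in for the fence fd. A real sync_file fd
    # would come from the graphics driver, not from os.pipe().
    read_fd, write_fd = os.pipe()
    try:
        # Zero timeout: just ask whether the fence has signalled already.
        signalled_before = wait_on_fence_fd(read_fd, 0)

        # Another thread "signals the fence" a little later.
        signaller = threading.Timer(0.05, os.write, args=(write_fd, b"\x01"))
        signaller.start()

        # This call blocks in the kernel rather than spinning in userspace.
        signalled_after = wait_on_fence_fd(read_fd, 2.0)
        signaller.join()
        return signalled_before, signalled_after
    finally:
        os.close(read_fd)
        os.close(write_fd)
```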

                    Explicit sync is not the be-all and end-all. Yes, explicit sync is better than a pure implicit sync system. A hybrid of explicit sync and implicit sync can be better than either alone.

                    Yes, channel 3 in the explicit sync example in that documentation is the problem child: it is the one you want to function as implicit sync. So there are two channels you want to function as explicit sync and one you want to function as implicit sync.

                    In a highly effective system, waiting is a job you want the kernel doing, so that the kernel can allocate the CPU time to other processes while you wait and the CPU doesn't sit around doing nothing. In a highly effective system you also want to reduce the amount of waiting.

                    So a highly effective system has two objectives to meet at the same time. Explicit sync reduces the amount of waiting. Implicit sync puts the waiting in the kernel. Each is half the solution.

                    You could say the argument between explicit sync and implicit sync is basically a rehash of the argument over whether the userspace spinlock (explicit sync) or the kernel-side mutex (implicit sync) was more effective. Remember, the correct answer for spinlock vs mutex is the futex, which is a bit of both. The correct answer to explicit sync vs implicit sync is most likely also a bit of both, for the same reasons. Yes, kernel mutexes and kernel implicit sync both get the waiting performed by the kernel. Spinlocks and explicit sync both reduce the amount of waiting.

                    Explicit sync is very much a fancy spinlock, with all the same problems as a spinlock. Implicit sync is basically a fancy kernel mutex, with all the problems of a kernel mutex. In this case no one has proposed a futex equivalent to replace both explicit sync and implicit sync, which leads to the sync_file workaround.
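
                    The "bit of both" idea can be sketched like this. It is not a real futex (a futex is a Linux kernel facility with a userspace fast path), just the same spin-briefly-then-block pattern written with Python's `threading.Event`; `hybrid_wait` and its parameters are hypothetical names for illustration.

```python
import threading


def hybrid_wait(event, spin_iters=100, timeout_s=None):
    """Spin for a bounded number of polls (cheap if the wait is short),
    then fall back to a blocking wait so the CPU can go elsewhere."""
    # Fast path: poll in userspace, as a spinlock would.
    for _ in range(spin_iters):
        if event.is_set():
            return "spun"
    # Slow path: give up the CPU and let the OS wake us when signalled.
    if event.wait(timeout_s):
        return "blocked"
    return "timeout"
```

                    If the event is already set, the call returns "spun" without ever blocking. If it is set a little later by another thread, the caller ends up in the blocking path and returns "blocked" once woken.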

                    Remember, with implicit sync the AMD developer still has the waiting done in the kernel, giving up CPU slices to the scheduler to be allocated to other tasks. Being stuck waiting, be it on a spinlock or on explicit sync, is wasteful of CPU time.



                    • #60
                      Originally posted by mdedetrich View Post
                      No, you are right: explicit sync will always provide the best possible performance. This is because implicit sync has to deal with concurrency "implicitly" (hence the name). Simplifying a bit, this means it's done by heuristics, extra processing (which adds overhead), reliance on manual ordering of function calls, or other techniques.

                      With explicit sync, the user has to provide the exact concurrency primitives (e.g. locks, semaphores, mutexes, etc.), which gives precise control over how to synchronise resources.

                      Newer APIs like Vulkan or DX12 are built from the ground up with explicit sync in mind (there are minor exceptions, but these are largely due to interoperating with other implicit sync systems).

                      There is a problem here. Let's say you intentionally code a perfect 1 to 1, and I mean a perfect 1 to 1, something people doing explicit sync examples don't do. A perfect 1 to 1 means that implicit sync and explicit sync have exactly the same waits, because they are stopping on exactly the same conditions. This is when something goes horribly wrong with explicit sync: CPU usage is higher than with implicit sync. What has happened? Implicit sync has waited kernel side, so the scheduler has been able to either not allocate the time slices or allocate them to other processes while waiting.

                      Notice the little red boxes on page 10 here. Those are the implicit sync CPU time usage before going into a wait where the CPU slices could be reallocated.

                      There is no red box in the explicit sync example: there is no kernel usage, so there is no kernel wait. No kernel wait means no reallocation of time slices, hence excessive CPU usage.
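
                      A rough way to see the CPU cost being argued about here. This sketch assumes, as the argument above does, that the userspace wait is a busy-wait; whether a given explicit sync implementation actually waits that way is a separate question. It compares the CPU time a thread burns while waiting the same wall-clock time by spinning versus by blocking. The helper names are hypothetical and the exact numbers depend on the machine.

```python
import threading
import time


def spin(event):
    """Userspace busy-wait: keeps the CPU occupied until signalled."""
    while not event.is_set():
        pass


def block(event):
    """Blocking wait: the thread sleeps until the OS wakes it."""
    event.wait()


def cpu_cost_of_wait(wait_fn, delay_s=0.1):
    """Return the CPU seconds the calling thread spends inside wait_fn
    while another thread signals the event after delay_s."""
    event = threading.Event()
    signaller = threading.Timer(delay_s, event.set)
    start = time.thread_time()  # CPU time of this thread only
    signaller.start()
    wait_fn(event)
    cost = time.thread_time() - start
    signaller.join()
    return cost
```

                      Calling `cpu_cost_of_wait(spin)` and `cpu_cost_of_wait(block)` back to back should show the spinning wait costing close to the full delay in CPU time, while the blocking wait costs almost nothing.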

                      Originally posted by mdedetrich View Post
                      Yeah, you could do it, but badly (in terms of performance) and, most importantly, inconsistently across different OpenGL vendors, which is honestly one of the biggest issues with OpenGL. The design and API fundamentals are so old and outdated, and it is so high level, that it's practically not possible to provide both a fully performant and a correct, consistent experience across different OpenGL vendors.

                      It's so bad that both NVidia and AMD have to hot patch game executables on the fly at the driver level to squeeze out extra performance. With Vulkan/DX12 no such thing is necessary.
                      Squeezing extra performance out of Vulkan and DX12 is going to take harder hot patching. You have a spinlock problem caused by explicit sync that needs fixing. No game's usage of any graphics API is going to be perfect. With OpenGL we have had a lot longer to learn where the design faults are.

                      mdedetrich, the big issue here is that no one checks the worst case of explicit sync. Those pushing the idea of explicit sync, even in that Nvidia paper, compared close to the best case for explicit sync against the worst case for implicit sync. The worst case for explicit sync is when the fencing of implicit sync was 100 percent perfect, so you had to implement exactly the same fencing in explicit sync. If you test worst-case explicit sync vs implicit sync, you find that explicit sync is a spinlock, with spinlock behaviour resulting in higher CPU usage. This is not what you should find: we know better, since we have futex and other equivalent items to get the OS kernel to perform the waiting.

                      Yes, the GPU developers working on explicit sync were not looking at the CPU usage closely enough.
