NVIDIA's List Of Known Wayland Issues From SLI To VDPAU, VR & More


  • Originally posted by oiaohm

    You need to look closer at the wait. Nvidia's explicit sync is not integrated with the scheduler. Yes, you can wait, but the kernel scheduler does not get the information from Nvidia's explicit sync that would tell it not to wake a process up right now because the sync is not yet in the right state for the application to continue.

    The reason implicit sync is demanded so much in the Linux kernel graphics stack is that it suits the CPU use case, not the GPU use case. What Nvidia is offering does not suit the CPU use case.

    Classic mutexes don't solve the problem. Remember that more than one application can be waiting on a DMABUF, so an application-internal mutex is not going to help you. There needs to be a sync-aware graphical futex/mutex; the implicit sync in DMABUF and KMS is a horrible implementation of a poll-based mutex.

    https://www.kernel.org/doc/html/late...equeue-pi.html

    It would pay to know what you were referring to. Yes, the above is how pthread_cond_wait is in fact implemented in the Linux kernel. Do note that pthread_cond_broadcast/pthread_cond_signal and pthread_cond_wait are a pair. Every time the condition changes state, pthread_cond_broadcast or pthread_cond_signal has to be called on it so that pthread_cond_wait can actually work. So this does not work with Nvidia explicit sync.

    So how are you going to know that the explicit sync value has changed, so that you can call pthread_cond_broadcast/pthread_cond_signal and make pthread_cond_wait work? The same problem happens with a normal mutex.
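
To make the wait/signal pairing concrete, here is a minimal sketch using Python's threading.Condition as a stand-in for pthread_cond_wait and pthread_cond_broadcast. The names demo, change_value and the "value" field are hypothetical and only illustrate the point; nothing here talks to a GPU driver. A waiter that is asleep does not notice a changed value until somebody signals the condition.

```python
import threading

def demo():
    cond = threading.Condition()
    state = {"value": 0}           # hypothetical stand-in for a sync point value
    waiting = threading.Event()    # set while the waiter still holds the lock
    woke = threading.Event()       # set once the waiter gets past its wait

    def waiter():
        with cond:
            waiting.set()
            # Checks the predicate, then sleeps until the condition is notified.
            cond.wait_for(lambda: state["value"] >= 1)
        woke.set()

    def change_value(value, notify):
        with cond:
            state["value"] = value
            if notify:
                cond.notify_all()  # counterpart of pthread_cond_broadcast

    thread = threading.Thread(target=waiter, daemon=True)
    thread.start()
    waiting.wait()
    # The lock is only released inside wait_for, so by the time change_value
    # acquires it the waiter is guaranteed to be asleep.
    change_value(1, notify=False)
    woke_without_signal = woke.wait(timeout=0.2)
    change_value(1, notify=True)
    woke_with_signal = woke.wait(timeout=2.0)
    thread.join(timeout=2.0)
    return woke_without_signal, woke_with_signal

if __name__ == "__main__":
    print(demo())
```

The first result shows the waiter still asleep although the value has already changed; only the explicit notify lets it continue.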

    eventfd can get you close.


    The POLLIN and POLLOUT handling that DMABUF/KMS implements follows the eventfd specification so that poll works correctly, and that is where implicit sync comes from. So, oh dear, implicit sync behaviour is defined by eventfd and epoll.

    This is a case where the kernel side needs to be able to fill in the POLLIN and POLLOUT data somehow so that eventfd works right. The signal has to come from somewhere.

    There is also a reason why this is wanted to come from the kernel: so that if you give a process very high priority you don't kill the signal source and end up with a guaranteed deadlock in the graphics stack.

    Remember how people said implicit sync comes from everything being a file? They are absolutely correct. Eventfd is a file event mechanism, so it has implicit sync, like it or not, if you implement it correctly. This is also why, when Nvidia says "we have explicit sync only", everyone else reacts with "What the ....": are you not using epoll for this, are you ignoring how it is defined to be implemented??
    Epoll doesn't work through the CPU scheduler. It is not CPU-scheduler driven but event driven. This is why the complexity of epoll is O(number of events), not O(number of things to monitor).
    • An epoll file descriptor has a private struct eventpoll that keeps track of which fds are attached to this fd. struct eventpoll also has a wait queue that keeps track of all processes that are currently epoll_waiting on this fd. struct eventpoll also has a list of all file descriptors that are currently available for reading or writing.
    • When you add a file descriptor to an epoll fd using epoll_ctl, epoll adds the struct eventpoll to that fd's wait queue. It also checks if the fd is currently ready for processing and adds it to the ready list, if so.
    • When you wait on an epoll fd using epoll_wait, the kernel first checks the ready list, and returns immediately if any file descriptors are already ready. If not, it adds itself to the single wait queue inside struct eventpoll, and goes to sleep.
    • When an event occurs on a socket that is being epoll()ed, it calls the epoll callback, which adds the file descriptor to the ready list, and also wakes up any waiters that are currently waiting on that struct eventpoll.
    There is no CPU scheduler action involved unless we are talking about the waits/sleeps done by processes. In how it works, epoll strongly resembles explicit sync; poll() is more implicit, since it iterates over everything in the list to see whether anything has changed. If you used poll(), that would be more like polling and implicit sync, but epoll is so much faster that poll() is simply not used unless we are talking about Unix compatibility. (epoll does not exist on the BSDs, which have kqueue; Solaris has its own quirks, etc.)
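
The ready-list behaviour described above can be seen from userspace with a small sketch. It assumes Linux (select.epoll) and uses pipes as stand-ins for sockets; the function name ready_indices and its parameters are hypothetical. Many descriptors are registered, only two become readable, and the wait returns just those two rather than a scan of all of them.

```python
import os
import select

def ready_indices(total=64, active=(3, 17)):
    pipes = [os.pipe() for _ in range(total)]
    index_of = {rfd: i for i, (rfd, _) in enumerate(pipes)}
    ep = select.epoll()
    try:
        # epoll_ctl(EPOLL_CTL_ADD): attach every read end to the epoll fd.
        for rfd, _ in pipes:
            ep.register(rfd, select.EPOLLIN)
        # Make only a few of the monitored descriptors readable.
        for i in active:
            os.write(pipes[i][1], b"x")
        # epoll_wait with zero timeout: returns what is on the ready list.
        events = ep.poll(timeout=0)
        return sorted(index_of[fd] for fd, mask in events if mask & select.EPOLLIN)
    finally:
        ep.close()
        for rfd, wfd in pipes:
            os.close(rfd)
            os.close(wfd)

if __name__ == "__main__":
    print(ready_indices())
```

The returned list contains one entry per event, not one per monitored descriptor, which is where the O(number of events) behaviour comes from.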
    Last edited by piotrj3; 30 May 2022, 04:22 PM.



    • Originally posted by piotrj3

      Epoll doesn't work through the CPU scheduler. It is not CPU-scheduler driven but event driven. This is why the complexity of epoll is O(number of events), not O(number of things to monitor).
      • An epoll file descriptor has a private struct eventpoll that keeps track of which fds are attached to this fd. struct eventpoll also has a wait queue that keeps track of all processes that are currently epoll_waiting on this fd. struct eventpoll also has a list of all file descriptors that are currently available for reading or writing.
      • When you add a file descriptor to an epoll fd using epoll_ctl, epoll adds the struct eventpoll to that fd's wait queue. It also checks if the fd is currently ready for processing and adds it to the ready list, if so.
      • When you wait on an epoll fd using epoll_wait, the kernel first checks the ready list, and returns immediately if any file descriptors are already ready. If not, it adds itself to the single wait queue inside struct eventpoll, and goes to sleep.
      • When an event occurs on a socket that is being epoll()ed, it calls the epoll callback, which adds the file descriptor to the ready list, and also wakes up any waiters that are currently waiting on that struct eventpoll.
      There is no CPU scheduler action involved unless we are talking about the waits/sleeps done by processes. In how it works, epoll strongly resembles explicit sync; poll() is more implicit, since it iterates over everything in the list to see whether anything has changed. If you used poll(), that would be more like polling and implicit sync, but epoll is so much faster that poll() is simply not used unless we are talking about Unix compatibility. (epoll does not exist on the BSDs, which have kqueue; Solaris has its own quirks, etc.)
      With DMABUF implicit sync you can use epoll; in reality you have just described how 90% of DMABUF implicit sync is implemented. The important thing here is that the kernel checks that everything is ready to go before it gives a time slice to a process, so that the process does not fully wake up and force a context switch to the application. If you context switch to the application, you have just used up a lot of the time slice.

      Sleeps and waits are important.
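
A userspace-only sketch of why the kind of wait matters: the same pipe is waited on once with a blocking poll, where the process sleeps in the kernel until the fd is ready, and once by spinning with a zero timeout. The helper names wait_for_signal and run, and the 50 ms delay, are assumptions made up for the example; this is not how DMABUF or the Nvidia driver is implemented.

```python
import os
import select
import threading

def wait_for_signal(rfd, timeout_ms):
    """Wait until rfd is readable; return how many times poll() returned."""
    poller = select.poll()
    poller.register(rfd, select.POLLIN)
    returns = 0
    while True:
        # timeout None: sleep in the kernel until ready. timeout 0: return at once.
        ready = poller.poll(timeout_ms)
        returns += 1
        if ready:
            return returns

def run(timeout_ms, delay=0.05):
    rfd, wfd = os.pipe()
    # The "signal source": writes to the pipe after a short delay.
    timer = threading.Timer(delay, os.write, args=(wfd, b"x"))
    timer.start()
    try:
        return wait_for_signal(rfd, timeout_ms)
    finally:
        timer.join()
        os.close(rfd)
        os.close(wfd)

if __name__ == "__main__":
    print("blocking wait, poll() returns:", run(None))
    print("spinning wait, poll() returns:", run(0))
```

The blocking waiter comes back from poll() once, when the data is there. The spinning waiter comes back as many times as it manages to loop during the delay, and every one of those round trips is CPU time spent finding out that nothing is ready yet.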



      Each EGLStream is also associated with a consumer that retrieves EGL image frames from the EGLStream. The consumer is responsible for determining when an image frame is available, and displaying it or consuming it in some other way. The consumer is also responsible for indicating the latency when that is possible. The latency is the interval from the time when an image frame is retrieved from the EGLStream to the time it is delivered to the user.
      The reality is that Nvidia's implementation of explicit sync does not use poll or anything related for sync at all. It uses their own functions, which are meant to be called by the consumer, that is, the application running in userspace. Of course this leads to a performance problem when you compare Nvidia with DMABUF: a performance problem on the Nvidia side.

      Yes, epoll is used by a lot of applications, to the point that people developing on FreeBSD have made a shim library for it.

      Nvidia gives the fact that their stuff is explicit sync as a reason why everyone else has to change, but then does not talk about the better CPU utilisation you get by not wasting context switches and CPU scheduler time slice allocations, because in reality Nvidia's explicit sync is not properly supported on the kernel side, which results in very bad CPU utilisation.

      Yes, one of Nvidia's historic arguments against epoll and the like is that it would force them to implement platform-specific code. The problem is that platform-specific code is exactly what makes you work correctly with the platform's kernel so that you do not waste CPU time slices. This is the NIH problem: Nvidia developers look at things from the GPU point of view while massively missing things on the CPU side that are critical for performance and power usage. Yes, handling the CPU side would mean more platform-specific code and less generic code in the Nvidia driver. Both Intel and AMD developers attempted a generic driver shared between multiple platforms before realising that it really does not work right. Nvidia is very slow on the uptake on this point.



      • Why is explicit sync a limitation on Wayland but not on X11 with the Nvidia drivers?



        • Originally posted by MorrisS.
          Why is explicit sync a limitation on Wayland but not on X11 with the Nvidia drivers?
          I am working on a 2D graphic application with OpenGL(like QGIS). Recently when I was testing some benchmarks, there was a weird performance difference between my 2 Graphic Cards. So I made a simple...


          The limitation is there with Nvidia on bare-metal X11 as well. Under Wayland, Xwayland is used, which forces glamor, and that brings the problem to the fore. The reality is that explicit sync results in poor performance when you have X11 2D applications.

          So the problem has been there for a long time. A correct solution is overdue.



          • Originally posted by oiaohm

            I am working on a 2D graphic application with OpenGL(like QGIS). Recently when I was testing some benchmarks, there was a weird performance difference between my 2 Graphic Cards. So I made a simple...


            The limitation is there with Nvidia on bare-metal X11 as well. Under Wayland, Xwayland is used, which forces glamor, and that brings the problem to the fore. The reality is that explicit sync results in poor performance when you have X11 2D applications.

            So the problem has been there for a long time. A correct solution is overdue.
            Which solution should be applied to solve this general problem?



            • Originally posted by MorrisS.
              Which solution should be applied to solve this general problem?
              A lot of this problem is that X11 2D was designed around using things that behave like implicit sync: kernel-supported sync, so that when a process is waiting for a sync it is not being given CPU slices. Intel and AMD both have a form of implicit sync from DMABUF. Yes, this is kernel support using the poll system, so that processes waiting for sync are not using time slices. The problem is that at the moment Nvidia is still really stubborn, saying explicit sync or nothing, and their implicit sync is not designed to work well with the kernel scheduler.



              • Originally posted by oiaohm

                A lot of this problem is that X11 2D was designed around using things that behave like implicit sync: kernel-supported sync, so that when a process is waiting for a sync it is not being given CPU slices. Intel and AMD both have a form of implicit sync from DMABUF. Yes, this is kernel support using the poll system, so that processes waiting for sync are not using time slices. The problem is that at the moment Nvidia is still really stubborn, saying explicit sync or nothing, and their implicit sync is not designed to work well with the kernel scheduler.
                OK, but in any case Nvidia lets you use an X11 environment, unlike Wayland, despite implicit sync. The implicit sync is a consequence of the OpenGL stack being based on implicit sync. How long until the migration from OpenGL to Vulkan integration?



                • Originally posted by MorrisS.
                  OK, but in any case Nvidia lets you use an X11 environment, unlike Wayland, despite implicit sync. The implicit sync is a consequence of the OpenGL stack being based on implicit sync. How long until the migration from OpenGL to Vulkan integration?

                  https://themaister.net/blog/2019/08/...nchronization/
                  Sorry, no. The argument that Vulkan is explicit sync, so you don't need implicit sync, is wrong. Vulkan has implicit sync as well. Swapping to Vulkan fixes nothing; without kernel integration the implicit sync problem remains. Making a wait on an implicit sync not wake applications up and put them back to sleep over and over again requires kernel integration. Context switches are expensive on the CPU. The Nvidia driver is incompatible with every standard graphics stack. Yes, even DX12 has implicit sync functions. OK, Nvidia on Windows does not have to implement those because Microsoft does it in kernel space. There is a driver extension for DX12 that basically goes from Windows kernel space to Windows kernel space.

                  Basically Nvidia is trying to fit a round peg in a square hole here.
                  Last edited by oiaohm; 01 June 2022, 09:09 PM.



                  • Originally posted by oiaohm


                    https://themaister.net/blog/2019/08/...nchronization/
                    Sorry, no. The argument that Vulkan is explicit sync, so you don't need implicit sync, is wrong. Vulkan has implicit sync as well. Swapping to Vulkan fixes nothing; without kernel integration the implicit sync problem remains. Making a wait on an implicit sync not wake applications up and put them back to sleep over and over again requires kernel integration. Context switches are expensive on the CPU. The Nvidia driver is incompatible with every standard graphics stack. Yes, even DX12 has implicit sync functions. OK, Nvidia on Windows does not have to implement those because Microsoft does it in kernel space. There is a driver extension for DX12 that basically goes from Windows kernel space to Windows kernel space.

                    Basically Nvidia is trying to fit a round peg in a square hole here.
                    Thanks for the clarification. I had the misunderstanding that Vulkan was based on explicit sync.



                    • Originally posted by oiaohm


                      https://themaister.net/blog/2019/08/...nchronization/
                      Sorry, no. The argument that Vulkan is explicit sync, so you don't need implicit sync, is wrong. Vulkan has implicit sync as well. Swapping to Vulkan fixes nothing; without kernel integration the implicit sync problem remains. Making a wait on an implicit sync not wake applications up and put them back to sleep over and over again requires kernel integration. Context switches are expensive on the CPU. The Nvidia driver is incompatible with every standard graphics stack. Yes, even DX12 has implicit sync functions. OK, Nvidia on Windows does not have to implement those because Microsoft does it in kernel space. There is a driver extension for DX12 that basically goes from Windows kernel space to Windows kernel space.

                      Basically Nvidia is trying to fit a round peg in a square hole here.
                      Here it is stated that Vulkan is based on explicit synchronization.
