NVIDIA's List Of Known Wayland Issues From SLI To VDPAU, VR & More


  • #51
    Originally posted by zexelon View Post

    I was more referring to absolute performance... currently nothing touches the 3090. Now, when it comes to performance per $$$, yeeeaaahhh, in no way am I going to try to defend Nvidia. Also, yes, I have a requirement for the absolute performance of the 3090 (and it's not gaming), so it's a price to be paid. That said, I was an AMD user back when they were the dominant GPU maker (back in the ATI days) and I look forward to the day they can get back into the top-tier ring; also, if they can finish what they started with ROCm they will have a strong chance at returning to the top.
    ... So the 6900 XT and 6950 XT are a non-existent mirage then? https://www.youtube.com/watch?v=KugxMJFhx7k

    Granted, the 3090 is the better pro card with 24 GB of video RAM, but the 6900 XT is no slouch, and you could often afford two 6900 XTs for one 3090, at least at the start of this year. Today, it's more like three 6900 XTs for two 3090s. If your application requires large amounts of video RAM, then the 3090 is the better product. But then again, the 6700 XT will beat everything up to the 3080 Ti in those same applications, too.

    Now, Nvidia has been alone at the top for a few generations, but it looks like RDNA2 made a decent attempt and RDNA3 is bound to shake things up further.
    Last edited by wertigon; 24 May 2022, 03:57 AM.



    • #52
      Originally posted by oiaohm View Post

      A lot of this is Nvidia's NIH problem: stuff not invented at Nvidia is not taken into account, nor how it limits what can be implemented, so Nvidia proposes over and over again things that simply cannot be done in Linux kernel space. Also, what Nvidia claims to be the problem in a lot of cases is not. Take the claim that dma-buf does not have explicit sync: when you look closely it does have explicit sync, behind a horrible interface to userspace that could be improved, and the improvement can be done without breaking the existing users.
      So evidently you didn't read NVidia's post about why it's so difficult for them to do implicit sync, let me link it again

      https://gitlab.freedesktop.org/xorg/...7#note_1271324, specifically with my bold emphasis

      Thanks Erik. I'd started typing this before your reply came through, and I went into a little more detail so I'll just post it as-is:

      Yes, it certainly sounds like a sync issue. Our driver has no way to implement implicit sync, so it doesn't. For the most part, our kernel driver is blissfully unaware of what work is in flight and which buffers that work uses. It doesn't care beyond ensuring clients don't interfere with buffers they haven't allocated themselves or been granted access to, which the HW itself takes care of for the most part. We've been evaluating various ways to work around this issue for Xwayland specifically without tanking perf, but none of them have panned out so far. They either break X protocol guarantees, or don't work. Regardless, these mechanisms would only work for GL/Vulkan-based applications. Native X rendering in glamor itself still wouldn't sync properly unless using the EGLStream backend, where EGLStream handles synchronization internally from my understanding.

      The problem is currently avoided for native Wayland clients that rely on implicit sync in a way that similarly breaks Wayland protocol guarantees (basically, buffer attach commits are deferred beyond SwapBuffers). This will of course need to be resolved eventually as well.

      Solutions considered:
      1. Explicit sync everywhere. Of course, it would help if our driver supported sync FD first. Working on that one. Then, X devs would need to relent and let the present extension support sync FD or similar. I'm not clear why there has been so much pushback there. Present was always designed to support explicit sync, it just unfortunately predated sync FD by a few months. glamor would also need to use explicit sync for internal rendering. I believe it has some code for this, but it uses shmfence IIRC, which in turn relies on implicit sync.
      2. Ensure all work is finished before submitting frames from GL/Vulkan/etc. to Xwayland or wayland. Without hacky/protocol-breaking changes (Or the shmfence thing Erik mentions, though it's specific to Xwayland) to defer sending the updates, this means doing a hard CPU stall until the GPU has idled, which is what I mean by tanking perf. We've measured ~30% perf drops for one game using this solution, but impact could vary from 0-50% depending on the workload. Also, this solution alone doesn't fix glamor rendering in X, nor any composition rendering the Wayland compositor does.
      3. Implement implicit sync in the NV kernel driver. This would also have unacceptable perf impact, though we haven't measured it explicitly in a long time. Regardless, it's essentially at odds with our software architecture, and I don't view it as a forward-looking solution.

      From the wording above, you can probably tell (1) is my preferred solution, but it's not arriving in the near term. I don't view (2) or (3) as viable or complete solutions.
      NVidia's driver has no concept of implicit sync; there is no realistic/practically feasible way for them to implement it without completely killing performance. This is because, specifically when dealing with modern graphics hardware, explicit sync IS the right choice. While you are correct that in some contexts implicit sync can work, and because of its simpler API it can be the overall wiser decision, over the past decade GPUs have moved away from the simple "input and output framebuffers" concept.

      This is the hilarious thing about Linux people demanding NVidia implement implicit sync in their driver (which is the main underlying reason why there are integration issues between NVidia's drivers and the Linux stack on Wayland): they have already said multiple times that this is not an option. Their entire driver is not built on implicit sync, because if you actually care about performance (which NVidia does, more than placating Linux devs) it is simply the wrong model to use. On the other hand, the Linux graphics stack + kernel slowly migrating away from implicit sync in the problematic areas is possible (also without breaking userspace) with focused planning over time; it was just always more convenient to miss the forest for the trees and "blame NVidia" rather than maturely solving the core problem.

      The fact that the new protocol meant to solve all of these problems (Wayland) was released without even proper support for explicit sync is indicative of this attitude.

      The conclusion is clear: move forward and start fixing the parts of the Linux graphics stack that need to be explicit sync. Intel and especially AMD will also be happy, since they have already complained about the performance issues of implicit sync multiple times.
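
      To make the model concrete, here is a rough sketch of how explicit sync pulls a fence out of GL as a plain fd that can travel with the frame, assuming a driver that exposes the EGL_ANDROID_native_fence_sync extension (which, per the quote above, NVidia's driver doesn't support yet):

      Code:
      /* Sketch: extract an explicit fence fd for the GL work queued so far.
       * Requires a current EGL context and the EGL_ANDROID_native_fence_sync
       * extension. Error handling trimmed for brevity. */
      #include <EGL/egl.h>
      #include <EGL/eglext.h>

      int create_render_done_fence(EGLDisplay dpy)
      {
          PFNEGLCREATESYNCKHRPROC create_sync =
              (PFNEGLCREATESYNCKHRPROC)eglGetProcAddress("eglCreateSyncKHR");
          PFNEGLDUPNATIVEFENCEFDANDROIDPROC dup_fence_fd =
              (PFNEGLDUPNATIVEFENCEFDANDROIDPROC)eglGetProcAddress("eglDupNativeFenceFDANDROID");
          PFNEGLDESTROYSYNCKHRPROC destroy_sync =
              (PFNEGLDESTROYSYNCKHRPROC)eglGetProcAddress("eglDestroySyncKHR");

          /* Creating a native fence sync flushes the context and ties the
           * fence to all GL commands submitted so far. */
          EGLSyncKHR sync = create_sync(dpy, EGL_SYNC_NATIVE_FENCE_ANDROID, NULL);

          /* Duplicate the kernel fence as a plain fd; the receiver (compositor,
           * X server, another driver) can wait on it explicitly. */
          int fence_fd = dup_fence_fd(dpy, sync);
          destroy_sync(dpy, sync);
          return fence_fd; /* -1 on failure */
      }

      With something like this, the fence rides alongside the buffer through the display protocol, instead of the kernel having to infer ordering from the buffer itself.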



      • #53
        Originally posted by mdedetrich View Post

        So evidently you didn't read NVidia's post about why it's so difficult for them to do implicit sync, let me link it again
        Your point is completely proving his point. Nvidia is not cooperating and is taking the "My way or the highway" approach.

        Though their wording is different, they say "we are not doing it that way, change to our way instead." Same end result though.



        • #54
          Originally posted by mdedetrich View Post
          So evidently you didn't read NVidia's post about why it's so difficult for them to do implicit sync, let me link it again

          https://gitlab.freedesktop.org/xorg/...7#note_1271324, specifically with my bold emphasis
          No, you miss a key point: go read the OpenGL and EGL standards. Guess what, implicit sync is required for an implementation to be OpenGL and EGL standard conforming. Glamor does nothing that the OpenGL standard does not say is permitted.

          Our driver has no way to implement implicit sync


          This is really a direct admission that the Nvidia driver implementation is not standard conforming. Intel and AMD have dma-fence, which is explicit sync, but they have also implemented implicit sync on top of it to keep their implementations conforming.

          Something to remember: glamor is implemented as a standard-conforming OpenGL/EGL application. The features that are breaking glamor happen to be in the official OpenGL test suite that you are meant to pass to claim OpenGL compatibility.

          Originally posted by mdedetrich View Post
          The conclusion is clear: move forward and start fixing the parts of the Linux graphics stack that need to be explicit sync. Intel and especially AMD will also be happy, since they have already complained about the performance issues of implicit sync multiple times.
          Yes, AMD and Intel developers want to move to more explicit sync, but AMD and Intel have proposed leaving a path for implicit sync in place for when it has to be used. This is why the AMD proposal mentions coexisting with implicit sync. That is different from the Nvidia position of "we break standard conformance because we can."

          Originally posted by mdedetrich View Post
          This is the hilarious thing about Linux people demanding NVidia implement implicit sync in their driver (which is the main underlying reason why there are integration issues between NVidia's drivers and the Linux stack on Wayland): they have already said multiple times that this is not an option.
          The reason is in fact standard conformance, so that old applications work right. Remember, I said Nvidia does not pass the OpenGL test suite properly on any platform.

          This Nvidia "our way or the highway" attitude is the reason a lot of games need Nvidia-specific code to work on Nvidia GPUs.

          The reality of the Nvidia mess is that we may need Zink on Nvidia cards in the future to provide implicit sync, so that legacy OpenGL/EGL applications work right.

          Originally posted by mdedetrich View Post
          The conclusion is clear: move forward and start fixing the parts of the Linux graphics stack that need to be explicit sync. Intel and especially AMD will also be happy, since they have already complained about the performance issues of implicit sync multiple times.
          This conclusion effectively ignores the OpenGL and EGL standards and the legacy applications that require implicit sync. The reality is that even if you convert as much as possible to explicit sync, you still need implicit sync to be standard conforming with OpenGL and EGL, and you need it for legacy Linux applications. Remember, when Nvidia proposed EGLStreams for Wayland, their idea was not to support any X11 applications under Wayland at all, thereby avoiding the legacy application problem, and that idea was shot down very early in Wayland development.

          Take the recent, much-discussed "Linux 5.19 To 'Make Life Miserable' In Slowing Down Badly Behaving Split-Lock Apps" patch. It was first proposed to kill applications using split locks, but that would break the Linux kernel's userspace ABI promise, so it cannot be done, even though the Intel developer would love to. The next option is to make split locks perform badly while remaining compatible, and to make the other paths the better-performing route. This keeps the userspace ABI promise: legacy applications still run, just not well.

          Now, you say implicit sync performs badly; that is not in fact a problem and would be fine. Having horrible overhead because you chose to use implicit sync would also be fine and would still meet the requirement. If Nvidia supported implicit sync and its performance were horrible, that would still allow legacy applications to run, would not break the Linux kernel's userspace ABI promise, and would be OpenGL and EGL standard conforming, since neither standard says implicit sync has to perform well, just that it has to work.
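
          To be clear about how cheap the conforming-but-slow route is to at least prototype, here is a crude sketch: emulate implicit sync by fully draining the GPU before the buffer changes hands. This is essentially the "option 2" CPU stall from NVidia's quote above, correct but costly:

          Code:
          /* Crude sketch: emulate implicit sync by draining the GPU before the
           * buffer is handed to a consumer. Correct (the consumer can never see
           * a half-rendered buffer) but slow -- this is the CPU stall NVidia
           * measured at roughly a 30% perf hit. */
          #include <EGL/egl.h>
          #include <GL/gl.h>

          void swap_with_emulated_implicit_sync(EGLDisplay dpy, EGLSurface surf)
          {
              glFinish();                /* block until all queued GL work is done */
              eglSwapBuffers(dpy, surf); /* only now publish the buffer */
          }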

          mdedetrich, maintaining ABI compatibility and being correctly standard conforming at times comes with the horrible problem of poor performance; that is just the nature of the beast. And yes, to pass the OpenGL conformance test suite, the implicit sync support does not need to be the fast code path.

          So yes, Nvidia's "our way or the highway" answer is in fact incorrect for application support. Like it or not, we need some way, with OpenGL and EGL, to support applications that use implicit sync.

          Linux kernel development requires give and take. The major Linux kernel contributors like Intel, AMD, Red Hat and so on all understand the basic rule from Linus Torvalds: you don't break userspace. That rule does not mean an option cannot perform poorly; it means that once an interface is implemented, it must remain implemented. A legacy interface performing poorly is acceptable, but that interface not working at all is not.

          I can give many examples where Intel, AMD, Red Hat and other parties submitting to Linux don't get exactly what they want and have to implement something they know performs badly, and in some cases they go out of their way to make it perform badly. The reality is this is fine; not everything implemented in the Linux kernel has to be perfectly fast. Nvidia would be within their rights to make their implicit sync run badly to encourage new programs to use the explicit sync interfaces. The problem is that legacy applications still have to be supported; people set up chroots and containers of really old Linux systems to run very old custom applications on Linux.

          Nvidia is simply failing to understand the problem space.



          • #55
            Originally posted by MrCooper View Post

            Only one entry on the list is related to that: "XWayland does not provide a proper way to synchronize application rendering with presentation."

            What they mean is that Xwayland doesn't support explicit sync yet. What they don't mention is that implicit sync is perfectly adequate for Xwayland, see below.



            That is nonsense.

            You guys and many others seem to think of explicit sync as some kind of silver bullet which magically makes everything work better. It's not.

            As I explained in https://gitlab.freedesktop.org/xorg/...7#note_1358237 and following comments, explicit sync cannot provide any performance benefit over implicit sync for the specific use case of sharing buffers between a display server and direct rendering clients. It boils down to the same fences in both cases, the difference is merely those fences being communicated explicitly as part of the display protocol vs implicitly via a separate channel.

            Anyway, Nvidia are free to solve this via explicit sync, but they'll need to do the work.
            By half I more meant the most important ones; "XWayland does not provide a proper way to synchronize application rendering with presentation" is by far the biggest one.

            The 2nd biggest one is probably the issues with screen sharing (which also largely originate from that).

            A lot of the smaller issues are way too small to matter or have workarounds; things like indirect GLX aren't on by default anyway, so if you need to enable it you may as well manually return to native X for your use case. If Xwayland had proper synchronization AND screen sharing worked, I wouldn't mind swapping to Wayland on Nvidia with all the remaining issues.

            In this specific case explicit sync makes things simpler (there's no particularly big difference in performance). The issue is in Vulkan (the "chicken and egg problem" from Ekstrand's article), where you either have oversynchronization or tons of issues like screen sharing failing. Erik/Cubanismo (Nvidia) wants explicit sync and mentioned he could do some of the work, Ekstrand (when he was at Intel) wanted it and made some patches (after 2 years still not merged and still work in progress), and Marek Olsak (AMD) also made an RFC going entirely in the explicit sync direction. The issue is that these things take so long in the open source world that if they are ready by Ubuntu 24.04 LTS, I would consider it a success.
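
            For reference, the client side of the existing unstable explicit sync protocol is small; a rough sketch, assuming the wayland-scanner generated header, a per-surface synchronization object, and a fence fd already extracted from the renderer:

            Code:
            /* Sketch: attach an explicit acquire fence to a Wayland commit using
             * the unstable zwp_linux_explicit_synchronization_v1 protocol. */
            #include <unistd.h>
            #include <wayland-client.h>
            #include "linux-explicit-synchronization-unstable-v1-client-protocol.h"

            void commit_with_acquire_fence(struct zwp_linux_surface_synchronization_v1 *surface_sync,
                                           struct wl_surface *surface,
                                           struct wl_buffer *buffer,
                                           int acquire_fence_fd)
            {
                wl_surface_attach(surface, buffer, 0, 0);
                /* The compositor waits on this fence before reading the buffer,
                 * instead of relying on an implicit fence attached to its dma-buf. */
                zwp_linux_surface_synchronization_v1_set_acquire_fence(surface_sync,
                                                                       acquire_fence_fd);
                wl_surface_commit(surface);
                close(acquire_fence_fd); /* the fd is duplicated when sent */
            }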



            • #56
              Originally posted by oiaohm View Post

              No, you miss a key point: go read the OpenGL and EGL standards. Guess what, implicit sync is required for an implementation to be OpenGL and EGL standard conforming. Glamor does nothing that the OpenGL standard does not say is permitted.
              Yes, I know that OpenGL and EGL are implicit sync (that's also, arguably, one of the reasons why the APIs are so dated and bad by modern standards, but that's a separate topic). However, what I said earlier still applies, which is that you can implement implicit sync on top of explicit sync.

              And to be clear, this problem of having pervasive implicit sync in the Linux kernel/graphics stack is causing other issues, i.e. with Vulkan (which is an explicit sync API).
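
              And implementing implicit over explicit is exactly what the in-flight kernel work enables. A rough sketch, assuming the dma-buf sync_file import/export ioctls from Ekstrand's patch series (still under review as of this thread) are available in <linux/dma-buf.h>:

              Code:
              /* Sketch of "implicit sync on top of explicit sync": bridge a
               * buffer's implicit fences to and from explicit sync_file fds. */
              #include <sys/ioctl.h>
              #include <linux/dma-buf.h>

              /* Before an explicit-sync driver reads a shared buffer: pull the
               * implicit write fences out of the dma-buf as a sync_file fd. */
              int export_implicit_fence(int dmabuf_fd)
              {
                  struct dma_buf_export_sync_file args = {
                      .flags = DMA_BUF_SYNC_READ, /* we read, so wait on writers */
                      .fd = -1,
                  };
                  if (ioctl(dmabuf_fd, DMA_BUF_IOCTL_EXPORT_SYNC_FILE, &args) < 0)
                      return -1;
                  return args.fd; /* signals when implicit writers finish */
              }

              /* After explicit-sync rendering into the buffer: push our fence
               * back in as the implicit write fence, so implicit-sync consumers
               * (e.g. glamor) wait correctly. */
              int import_explicit_fence(int dmabuf_fd, int sync_file_fd)
              {
                  struct dma_buf_import_sync_file args = {
                      .flags = DMA_BUF_SYNC_WRITE,
                      .fd = sync_file_fd,
                  };
                  return ioctl(dmabuf_fd, DMA_BUF_IOCTL_IMPORT_SYNC_FILE, &args);
              }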



              • #57
                Originally posted by wertigon View Post

                Your point is completely proving his point. Nvidia is not cooperating and is taking the "My way or the highway" approach.
                If they are unable to co-operate on Linux's terms then that's Linux's fault, not NVidia's. This kind of reasoning is also hilarious, because Linux and the graphics stack are the ones shoehorning a technically incorrect solution (so much for Linux being about upholding technical excellence).

                I am not saying that NVidia doesn't bear part of the responsibility here; they definitely could have done some things better. However, this attitude of blaming everything on NVidia is what immature teenagers do.



                • #58
                  Originally posted by oiaohm View Post

                  No, you miss a key point: go read the OpenGL and EGL standards. Guess what, implicit sync is required for an implementation to be OpenGL and EGL standard conforming. [...]

                  Nvidia is simply failing to understand the problem space.
                  1st. I don't know of any GPU driver that passes the OpenGL conformance test 100% on every still-supported GPU on any major platform. There isn't one. Unless you mean the core, non-optional features, in which case Nvidia passes. In fact, the OpenGL conformance test was originally submitted by Nvidia, and Dolphin and former Ensemble Studios developers have said straight away that Nvidia has the most extensions of all OpenGL drivers and that in practice its drivers work the best.

                  2nd. Here we are talking about the userspace stack, where the "don't break userspace" rule does not apply to every change. Changing how glamor/X/Wayland work (to be more explicit sync) won't make Doom, Quake or Blender refuse to work. There are no breaking compatibility changes in what Nvidia proposes. The entire problem is: "So we're not really proposing some radical change to the extension's design. The problem is just that this synchronization mechanism uses regular XSync fences and there's not really any way to trigger one of those asynchronously after we receive a notification from the GPU that it's finished rendering."

                  3rd. OpenGL somehow doesn't have that problem when running natively under X; it does only under Wayland. The thing is, again, you can implement implicit sync on top of explicit sync, so an explicit-sync driver doesn't inherently make userspace stuff refuse to work. This is how Android does it, this is how Windows does it, and this is how DXVK and Zink do it (see the sketch after this list).

                  4th. If you deliberately ship something this low-level with bad performance, you largely discredit the entire OS as an alternative. We heard from Nvidia that the performance drop would be around 30%. And the further forward we go, the more multi-threaded/multi-queue workloads become and the bigger the oversynchronization issue gets, so today's 30% penalty will only grow.

                  5th. For gamers, a 30% performance loss is not acceptable today.

                  6th. Present already supports explicit sync, so a complete redesign is not needed.
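
                  To illustrate the 3rd point, waiting on an explicit fence needs no vendor API at all; a sync_file fd is just pollable. A minimal sketch of what an Android-style explicit sync consumer does before touching a buffer:

                  Code:
                  /* Sketch: wait for an explicit fence fd (a kernel sync_file)
                   * to signal before using the buffer it guards. Plain POSIX. */
                  #include <errno.h>
                  #include <poll.h>

                  int wait_fence_fd(int fence_fd, int timeout_ms)
                  {
                      struct pollfd pfd = { .fd = fence_fd, .events = POLLIN };
                      int ret;
                      do {
                          ret = poll(&pfd, 1, timeout_ms); /* POLLIN = signaled */
                      } while (ret < 0 && errno == EINTR);
                      return (ret == 1 && (pfd.revents & POLLIN)) ? 0 : -1;
                  }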


                  I would also mention: OpenGL application developers like the Dolphin team rate the Intel/Nvidia/AMD drivers as at least "good" (with Nvidia being excellent) and mobile GPUs like Adreno/ARM Mali as bad/horrible (and they are still like that!), with PowerVR/Tegra unknown. Yet the explicit-sync Android GPU stack actually works better with far worse drivers, written by far less experienced engineers than Nvidia/AMD/Intel have. I would say this is a true testament that the Linux graphics stack has probably simply made the wrong choices.
                  Last edited by piotrj3; 24 May 2022, 10:13 AM.



                  • #59
                    Originally posted by mdedetrich View Post
                    Yes, I know that OpenGL and EGL are implicit sync (that's also, arguably, one of the reasons why the APIs are so dated and bad by modern standards, but that's a separate topic). However, what I said earlier still applies, which is that you can implement implicit sync on top of explicit sync.

                    And to be clear, this problem of having pervasive implicit sync in the Linux kernel/graphics stack is causing other issues, i.e. with Vulkan (which is an explicit sync API).
                    Sections of Vulkan WSI expect implicit sync by standard. So which kind of sync is required depends on which section of Vulkan you are dealing with. Vulkan is not a totally explicit sync API.

                    The reality is that where OpenGL and EGL require implicit sync, Nvidia has not implemented implicit sync on top of explicit sync, even though doing so purely in userspace would be all that is required to pass the OpenGL test suite. This is why glamor and a stack of old applications show graphical artifacts on Nvidia: something that is required for OpenGL standard conformance, Nvidia does not support.
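
                    The split is visible in any Vulkan frame loop: semaphores make the intra-application sync explicit, but the spec leaves the cross-process handoff of the presented image to the platform, which on Linux has historically meant implicit dma-buf fences. A rough sketch of the standard loop, all handles assumed already created:

                    Code:
                    /* Sketch: the explicit-sync dance Vulkan WSI exposes to apps.
                     * How the presented image is synchronized with the compositor
                     * across processes is left to the platform WSI implementation. */
                    #include <stdint.h>
                    #include <vulkan/vulkan.h>

                    VkResult present_one_frame(VkDevice dev, VkQueue queue,
                                               VkSwapchainKHR swapchain,
                                               VkSemaphore acquire_sem,
                                               VkSemaphore render_done_sem,
                                               VkCommandBuffer cmd)
                    {
                        uint32_t image_index;
                        VkResult res = vkAcquireNextImageKHR(dev, swapchain, UINT64_MAX,
                                                             acquire_sem, VK_NULL_HANDLE,
                                                             &image_index);
                        if (res != VK_SUCCESS)
                            return res;

                        /* Rendering waits on the acquire semaphore, then signals
                         * render_done_sem: all explicit. */
                        VkPipelineStageFlags wait_stage =
                            VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT;
                        VkSubmitInfo submit = {
                            .sType = VK_STRUCTURE_TYPE_SUBMIT_INFO,
                            .waitSemaphoreCount = 1,
                            .pWaitSemaphores = &acquire_sem,
                            .pWaitDstStageMask = &wait_stage,
                            .commandBufferCount = 1,
                            .pCommandBuffers = &cmd,
                            .signalSemaphoreCount = 1,
                            .pSignalSemaphores = &render_done_sem,
                        };
                        res = vkQueueSubmit(queue, 1, &submit, VK_NULL_HANDLE);
                        if (res != VK_SUCCESS)
                            return res;

                        /* Present waits on render_done_sem; past this point, sync
                         * with the compositor is the platform's problem. */
                        VkPresentInfoKHR present = {
                            .sType = VK_STRUCTURE_TYPE_PRESENT_INFO_KHR,
                            .waitSemaphoreCount = 1,
                            .pWaitSemaphores = &render_done_sem,
                            .swapchainCount = 1,
                            .pSwapchains = &swapchain,
                            .pImageIndices = &image_index,
                        };
                        return vkQueuePresentKHR(queue, &present);
                    }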

                    Originally posted by mdedetrich View Post
                    If they are unable to co-operate on Linux's terms then that's Linux's fault, not NVidia's. This kind of reasoning is also hilarious, because Linux and the graphics stack are the ones shoehorning a technically incorrect solution (so much for Linux being about upholding technical excellence).

                    I am not saying that NVidia doesn't bear part of the responsibility here; they definitely could have done some things better. However, this attitude of blaming everything on NVidia is what immature teenagers do.
                    Sorry, Nvidia uses the claim of being "technically correct" to pull NIH and ignore standards. Remember, all the other GPU vendors are able to work within Linux's terms; Nvidia is the odd one out. Why should Nvidia get special treatment here and be allowed to break compatibility with applications?

                    mdedetrich, the reality here is that the Linux kernel developers work to a set of rules that do not change. Please note that Nvidia ran into the same problem with Microsoft and Windows Vista. Did Microsoft change what they had done to the driver stack to make Nvidia happy? In the end the answer was no; instead, Nvidia had to toe Microsoft's line.

                    The curse of a stable userspace ABI is that if you make a technically incorrect mistake, you are stuck with it. There are a lot of technically incorrect things in the Linux kernel. There is even a buffer overflow that has to be preserved in one section of the Linux kernel because a particular bit of userspace software expected it, and since at one point in time it worked, it must remain working.

                    The rules of Linux kernel development have been stable for over 25 years now. By the way, the argument that if Nvidia cannot work with the Linux kernel it is the Linux kernel's fault is wrong. Linux kernel development has not been changing its rules or terms; Nvidia has not wanted to know them.

                    Linux kernel developers have been very tolerant, really. Nvidia pulled the same stunts with Apple, and guess what happened: Apple blocked loading Nvidia's drivers completely and no longer supported the hardware. Maybe this is what the Linux kernel should do?

                    mdedetrich, blocking non-GPL-licensed drivers from using GPL symbols is a shot across the bow of Nvidia and others not working with the kernel. UEFI Secure Boot can block loading any driver the distribution does not provide.

                    Linux kernel developers do have scorched-earth options against Nvidia. Saying it is the Linux kernel's fault that Nvidia cannot work with them really suggests the kernel developers have not pushed hard enough and should use those scorched-earth options. The Linux kernel developers have been hoping Nvidia would come into line before they run out of tolerance.



                    • #60
                      Originally posted by mdedetrich View Post

                      If they are unable to co-operate on Linux's terms then that's Linux's fault, not NVidia's. This kind of reasoning is also hilarious, because Linux and the graphics stack are the ones shoehorning a technically incorrect solution (so much for Linux being about upholding technical excellence).

                      I am not saying that NVidia doesn't bear part of the responsibility here; they definitely could have done some things better. However, this attitude of blaming everything on NVidia is what immature teenagers do.
                      But... Nvidia isn't cooperating on Linux's terms. It's trying to "cooperate" on their own terms and failing miserably at it.

                      Look, Nvidia is trying to implement git in an organisation that currently runs subversion. All internal processes are built to support subversion, all the CI tooling, all the editor support, et cetera et cetera et cetera.

                      Sure, git is the far superior versioning system in general. But do you want to be the employee pushing the git path, knowing that path would cost millions of dollars in retraining, retooling and opportunity costs? If you refuse to use anything other than git here, how long do you think it will take before you lose your job?

                      There is no difference between what Nvidia is doing and the above scenario. Well, except that it is not possible to fire said employee, who happens to be the child of one of the C[OETF]Os.

                      Also, have a look at https://mesamatrix.net/ for conformance. All of radeonsi, mesa, zink, llvmpipe and i965 have 100% compatibility. Nvidia is not even fully compliant with OpenGL 4.6.

