NVIDIA's List Of Known Wayland Issues From SLI To VDPAU, VR & More

mdedetrich replied

27 May 2022, 06:01 AM
Originally posted by wertigon

Dude! Patches fail regression tests! What more proof do you need?

Sigh, do you even know how to program or how software development works? Plenty of stuff submitted to the Linux kernel can fail tests while it's being worked on. That's a normal software development process.

And failing regression tests is not the same as breaking backwards compatibility; those are separate things.

Last edited by mdedetrich; 27 May 2022, 06:04 AM.
wertigon replied

27 May 2022, 05:16 AM
Originally posted by mdedetrich

No, you are just flat out lying. As has been stated (with evidence provided) many times, Nvidia has never suggested breaking backwards compatibility, and anything it would break is already so broken that no one is using it.

Just leave the discussion, it's clear you don't know what you are talking about.

Dude! Patches fail regression tests! What more proof do you need?
oiaohm replied

26 May 2022, 08:24 PM
Originally posted by piotrj3

So that wrapper doesn't give you information. It strips information that is very important in a Vulkan context. The process you described is extremely complex for multi-queue devices and for Vulkan in general. And this is the entire thing: without the hacks that OSS drivers do, Vulkan is NOT compatible at all with DMA_BUF. And Vulkan is the userspace! Implicit sync is one giant hack, and implicit sync in general requires tons of hacks. Not explicit sync.

dma-buf: Add an API for exporting sync files (v12) [LWN.net]

https://lwn.net/Articles/859290/

This is not exactly a hack: it exposes the sync file, the explicit sync behind the implicit sync, to userspace. This makes DMABUF compatible with Vulkan without any over-synchronization.

Vulkan was designed for userspace, not for OS kernels. This is really like the historic pthread implementations that ran in userspace. Explicit sync in graphics has the same problem those implementations had: wasting CPU cycles and context switches on processes that cannot do anything because what they were waiting on is not ready.

The reality here is that both implicit sync and explicit sync have their downsides. What we need is the graphical equivalent of a futex. A futex can be used as implicit sync, as explicit sync or as a hybrid. Remember that a futex in its best performing mode only uses the syscall when the application needs to wait, and the kernel only wakes the application up again when the process can proceed. Of course you can also code with the futex functions in a less than optimal way and replicate plain implicit sync or plain explicit sync.
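To make the fast path / slow path point concrete, here is a toy Python model of a futex-style lock. It is only a sketch under loud assumptions: `ToyFutexLock`, `slow_path_waits` and `wakeups` are made-up names, a `threading.Condition` stands in for both the atomic operation and the kernel wait queue, and no real futex syscall is involved. The counters only show when a real implementation would have had to enter the kernel.

```python
import threading


class ToyFutexLock:
    """Toy model of a futex-style lock. Hypothetical; not a real futex."""

    def __init__(self):
        # Assumption: one Condition stands in for the atomic compare-and-swap
        # AND for the kernel-side wait queue.
        self._guard = threading.Condition()
        self._locked = False
        self._waiters = 0
        self.slow_path_waits = 0  # times a caller would have entered the kernel to sleep
        self.wakeups = 0          # times release() would have asked the kernel to wake someone

    def acquire(self):
        with self._guard:
            if not self._locked:
                # Fast path: uncontended, handled entirely "in userspace".
                self._locked = True
                return
            # Slow path: contended, so the caller sleeps until it is woken.
            self.slow_path_waits += 1
            self._waiters += 1
            while self._locked:
                self._guard.wait()
            self._waiters -= 1
            self._locked = True

    def release(self):
        with self._guard:
            self._locked = False
            if self._waiters:
                # Only pay for a wake-up when somebody is actually waiting.
                self.wakeups += 1
                self._guard.notify()


lock = ToyFutexLock()
for _ in range(1000):
    lock.acquire()
    lock.release()
print(lock.slow_path_waits, lock.wakeups)  # prints: 0 0
```

With no contention neither counter moves, which is the "only syscall when you have to wait" behaviour. Hold the lock in one thread while another calls `acquire()` and both counters go up by one.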

We have multiple levels of NIH here. Nvidia and those who created Vulkan wanted it to be OS neutral, so they totally ignored what was required for the best performance of the OS and instead focused on what was required for the GPU side. Since you are designing something OS neutral, you avoid anything that needs a kernel modification. This ends up producing a design that does not fit ideally into the Linux kernel.

The reality here is that you need to stop ignoring the explicit sync downsides, because otherwise you sound exactly like the people who said userspace pthread was perfect.

Implicit sync and explicit sync functionality are both required. A futex in normal locking is basically both of those glued into one. So that the OS scheduler can allocate CPU time to tasks that can proceed, you need the implicit sync feature kernel side. So that userspace code does not have to syscall/context switch all the time, you need explicit sync as well. This is the problem: you need both behaviours, like it or not, for an operating system to perform well.

Originally posted by piotrj3

Vulkan (and explicit sync) guarantee that.

There is one very big problem with this claim: VkFence sits straight on top of the dma_fence structure that Linux kernel implicit sync uses and that is exposed by the sync file patch above. So in reality there is no difference in guarantee if your driver is implemented using the Linux kernel dma fence structures. But wait, in Nvidia's cross-platform driver they have done their own unique pile of crud here, right?

There is a problem with attempting to make code too generic between OSes. You end up coding in incompatible designs.

Originally posted by piotrj3

And because of that, the driver would have to deep-inspect the command buffer ahead of time, extract all those DMA_BUF fences, and based on them set the right order of operations in each queue (effectively finally making it explicit).

The open source drivers in the mainline kernel, which use the Linux kernel dma fence structure, put them in the right order of operations right from the start. This is a problem Nvidia is running into because they are trying to be as cross-platform as possible in their driver code. So they need to convert from the dma fence structs in the Linux kernel to their own and back. Nvidia has attempted to make their sync code way too generic. Notice that AMD, Intel and everyone else are not doing these conversions; their drivers support DMABUF natively. Yes, these conversions between sync structures end up causing sync issues between the sync structures, making your life hell.

This is the problem: Nvidia is not right in their objective of explicit sync only, because explicit sync itself is flawed. Implicit sync is like kernel-based locking, and explicit sync is like userspace pthread. Yes, the defects of both are absolutely the same. Yes, it took two decades for someone to come up with the futex, which took the best of both for CPU-only workloads.

Nvidia is not proposing a graphical equivalent of a futex. AMD, Intel and others are slowly working in the direction of a graphical futex. The correct solution will be some form of graphical futex for sync.
mdedetrich replied

26 May 2022, 06:21 PM
Originally posted by wertigon

The fact that you think it is a lie means you have not understood the criticism and choose to believe people are lying over the idea that the issue is complex, multi-faceted and has no clear or easy answers. If it did, we would obviously already be well past this issue!

Fact of the matter is that applying the available patches means some regression tests now fail. I do not know the exact nature of these failing tests (my area of expertise is only vaguely related to this, so I am operating on second-hand information here), but this means at least some backwards compatibility now fails.

Like I said earlier: if Nvidia needs this fixed, then Nvidia is the one that is free to drive it home. Assistance will be provided if asked for; collaboration and patches are welcome. If Nvidia does not care about the 2% market share that makes up the Linux desktop, then fine, we'll do just fine without Nvidia cards. After all, what are a couple of million sales more?

No, you are just flat out lying. As has been stated (with evidence provided) many times, Nvidia has never suggested breaking backwards compatibility, and anything it would break is already so broken that no one is using it.

Just leave the discussion, it's clear you don't know what you are talking about.
wertigon replied

26 May 2022, 06:13 PM
Originally posted by mdedetrich

Can you stop lying please?

The fact that you think it is a lie means you have not understood the criticism and choose to believe people are lying over the idea that the issue is complex, multi-faceted and has no clear or easy answers. If it did, we would obviously already be well past this issue!

Fact of the matter is that applying the available patches means some regression tests now fail. I do not know the exact nature of these failing tests (my area of expertise is only vaguely related to this, so I am operating on second-hand information here), but this means at least some backwards compatibility now fails.

Like I said earlier: if Nvidia needs this fixed, then Nvidia is the one that is free to drive it home. Assistance will be provided if asked for; collaboration and patches are welcome. If Nvidia does not care about the 2% market share that makes up the Linux desktop, then fine, we'll do just fine without Nvidia cards. After all, what are a couple of million sales more?
Azrael5 replied

26 May 2022, 02:25 PM
If this is the situation, Nvidia users will stay with X11 for the rest of their lives. The experience with Nvidia cards on X11 is not so bad. It's a pity, because Wayland seemed to be a promising solution, but at this point I can qualify it as one of the biggest failures of Linux developers. It is likely the reason many former Linux end users can say that Windows is still the best mass-market OS: Linux is too rigid.
mdedetrich replied

26 May 2022, 12:23 PM
Originally posted by wertigon

Again, blaming the kernel devs instead of finding solutions. You seem to be under the misconception that this is ideologically driven. It is not. Nvidia's current proposals would break all of backwards compatibility, which has been explained several times by now. Do it over. Do it properly. Else it will not be accepted.

Can you stop lying please?
piotrj3 replied

26 May 2022, 12:18 PM
Originally posted by oiaohm

No, there are four options. An AMD employee said to move the core entirely to explicit sync, but that same employee goes on to suggest a compatibility layer for legacy code.

Nothing in the Linux kernel stable ABI rules forbids the implicit sync syscalls into the kernel from crippling performance when they are used. Yes, the AMD employee you would have read suggesting a completely explicit sync core also suggested a common shared subsystem for legacy implicit sync, emulating implicit sync on top of the explicit sync core.

The options suggested by Intel and AMD both keep implicit sync support by some means, so they do not break the Linux kernel stable ABI to userspace.

Option 1 is not backed by anyone. Nvidia's and AMD's ideas of moving entirely to explicit sync are different. AMD's is that implicit sync will still be supported, but if you use it you are using a legacy subsystem that will result in less than ideal performance. For glamor doing X11 2D hardware acceleration, underperforming is not a problem. Remember that your major applications on Linux these days don't use 2D X11 drawing at all.

1) The old, broken, extremely over-synchronized approach that nobody wants. This supports implicit sync, of course.
2) Intel: progressive migration over time while keeping implicit support for legacy on top of explicit sync.
3) AMD: move the core completely to explicit sync and keep a legacy subsystem for implicit sync, so legacy syscalls and legacy code bases still work, at the cost of poor performance when used.
4) Nvidia: no implicit sync at all, pure explicit sync. Any application depending on implicit sync will have to be altered, or there will have to be userspace hacks.

piotrj3, basically AMD and Intel are in a kind of agreement here: by some means implicit sync will work. If used, it does not have to perform well.

What if the Nvidia developer is looking in the wrong place?
https://01.org/linuxgraphics/gfx-doc...e-poll-support

Any application using implicit sync on a DMABUF will be performing a poll syscall on that buffer. Yes, the Nvidia developer is right that this is not done by ioctl; the signal that a DMABUF is using implicit sync comes from a syscall. The AMD developer noted that.
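The poll pattern itself looks roughly like the sketch below. Big assumption: an ordinary pipe stands in for the dma-buf file descriptor, because a real dma-buf needs a GPU driver to export one. Writing a byte to the pipe plays the role of the pending work completing. `is_ready` is a made-up helper name, and the snippet assumes a platform where Python's `select.poll` is available (Linux, for instance).

```python
import os
import select

# Stand-in for a dma-buf fd. This is a plain pipe, NOT a real dma-buf.
read_fd, write_fd = os.pipe()

poller = select.poll()
poller.register(read_fd, select.POLLIN)


def is_ready(timeout_ms=0):
    """Return True if poll() reports the descriptor readable within timeout_ms."""
    return any(event & select.POLLIN for _fd, event in poller.poll(timeout_ms))


before = is_ready()       # nothing has happened yet, so the "buffer" is not ready
os.write(write_fd, b"x")  # stand-in for the producer's work completing
after = is_ready()        # now poll() reports the descriptor as ready
print(before, after)
```

With a non-zero or infinite timeout the same `poll()` call would simply put the caller to sleep until the descriptor becomes ready, which is the behaviour being discussed here.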

Then you go down to Reservation Objects. void dma_resv_init(struct dma_resv * obj) also has to be called before you have implicit sync on a DMABUF. Then there is another call that frees all dma_resv objects. None of the contents of the dma_resv is in fact an implicit sync structure.

The dma fence structure referenced inside the dma_resv is an explicit fence, just one that can be polled for its current status. So in reality all the AMD and Intel drivers end up doing is processing the list of dma fences to cover all implicit and explicit fences. DMABUF implicit fencing is in fact implemented on top of the explicit fence structure in all the mainline graphics drivers.
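The layering described here can be sketched in a few lines of Python. This is a toy model, not kernel code: `Fence` and `ReservationList` are hypothetical names that only loosely mirror dma_fence and dma_resv, and a `threading.Event` stands in for the completion of GPU work.

```python
import threading


class Fence:
    """Explicit fence: signalled once when the (simulated) work completes."""

    def __init__(self):
        self._event = threading.Event()

    def signal(self):
        self._event.set()

    def is_signaled(self):
        return self._event.is_set()

    def wait(self, timeout=None):
        return self._event.wait(timeout)


class ReservationList:
    """Per-buffer list of explicit fences (hypothetical stand-in for a reservation object)."""

    def __init__(self):
        self.fences = []

    def attach(self, fence):
        self.fences.append(fence)

    def implicit_ready(self):
        # Implicit view: "is the buffer usable?" means "has every attached fence signalled?"
        return all(f.is_signaled() for f in self.fences)

    def implicit_wait(self, timeout=None):
        # Implicit wait: wait on each explicit fence in turn.
        return all(f.wait(timeout) for f in self.fences)


render_done, copy_done = Fence(), Fence()
buf = ReservationList()
buf.attach(render_done)
buf.attach(copy_done)

render_done.signal()
print(buf.implicit_ready())  # prints: False (copy_done is still pending)
copy_done.signal()
print(buf.implicit_ready())  # prints: True
```

An explicit-sync user would hold `render_done` directly and wait on just that fence, while an implicit-sync user only asks the buffer. Both paths end up looking at the same fence objects.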

Remember what mdedetrich has written over and over again: implicit sync can be implemented on top of explicit sync. Yes, the explicit sync file support that the Linux kernel provides with DMABUF uses the same explicit dma fence structure as DMABUF implicit sync. Yes, Android explicit sync uses the same dma fence structure as implicit sync does as well.

So why does the Nvidia developer want to know whether a dmabuf FD is implicit or explicit sync, since either way the driver processes the same dma fence list deep down? The real difference is that with implicit sync the application polls the kernel, and its CPU access can be cut off until the sync is ready. This is why implicit sync can deadlock applications if the sync never happens.

The fact is that DMABUF as done by AMD and Intel really is implicit sync on top of explicit sync. As a result the implicit sync overhead is very light; there is very little special processing for implicit sync. With DMABUF implicit sync you poll the kernel, and the kernel can stop giving the application CPU time until the dma fence is ready. This can result in fewer wasted CPU cycles, so a performance improvement in some cases.

Interesting catch, right? In reality, if a driver is coded right for DMABUF, it does not need to know on the hot path whether applications are using implicit or explicit sync, only that all the dma fences that have been allocated have been cleared.

DMABUF in its design did include the option of building with implicit sync disabled; this was for particular devices where delays were a problem. Of course Nvidia saw that switch and went: hey, let's switch implicit sync off, then we don't have to implement the exported implicit sync functions. Of course they did not look properly at the AMD or Intel drivers, mostly due to licensing, to notice that implicit sync is just a wrapper on top of an explicit sync core. Yes, DMABUF having an explicit sync core was there from the first code of DMABUF.

Nvidia, just add the implicit wrapper code to your driver. Yes, still just process the explicit dma fences on your hot paths and everything will work.

The fact that DMABUF is implicit sync on top of explicit sync is why there is very little performance overhead when it is done correctly. Yes, magic: you don't need to know what is an implicit buffer, because everything is an explicit sync buffer with an explicit sync structure that just pretends to be an implicit buffer when needed, by having that implemented on top of the explicit sync structures.

Yes, it is exactly as you write. And they could do that. But it comes with issues in the future, because the world is a lot more complicated nowadays.

Yes, we could theoretically attach/consume implicit fences to buffers from userspace to mimic OSS driver behavior to some extent. I followed Jason's patch series to this effect, and we do some of this for synchronization in PRIME situations now, as it's vastly simpler when the only implicit synchronization boundary we care about is SwapBuffers() for consumption on a 3rd-party driver. It gets much harder to achieve correct implicit sync with arbitrary modern API usage (direct read/write of pixels in images in compute shaders, sparse APIs, etc.), and this has been a big pain point with Vulkan implementations in OSS drivers from my understanding. I don't know what the current state is, but I know it limited the featureset exposed in OSS Vulkan drivers when they first came out.

So that wrapper doesn't give you information. It strips information that is very important in a Vulkan context. The process you described is extremely complex for multi-queue devices and for Vulkan in general. And this is the entire thing: without the hacks that OSS drivers do, Vulkan is NOT compatible at all with DMA_BUF. And Vulkan is the userspace! Implicit sync is one giant hack, and implicit sync in general requires tons of hacks. Not explicit sync.

So you can choose to go more explicit sync, which may... but actually won't break userspace (because, hint, the userspace they talk about is NOT the CLIENT. It is the host that is not compatible, which in this case applies to very, very limited use cases that weren't working with Nvidia for other reasons). So you cause breakage in an already broken scenario.

This is the basic issue of DMA_BUF: it is "like" a file structure, and implicit synchronization is like a file structure.

Explicit sync is basically queues with commands that set the order of operations.

For a single process with a single queue that doesn't make much difference. This is why X can sort of use both under the hood. And this is why, under the hood, Nvidia doesn't quite care.

The problem is with the multi-core and multi-queue asynchronous nature. You have two queues. They sometimes read from and write to a DMA_BUF. Is the information from the DMA_BUF (e.g. whether it is readable/writable) sufficient to guarantee the same result? No, because sometimes queue 1 might be faster, write content to the DMA_BUF and claim "it is finished" while queue 2 expects the older content, and sometimes something else will happen. In reality you need to set the right order of operations. Vulkan (and explicit sync) guarantees that. DMA_BUF doesn't. And because of that, the driver would have to deep-inspect the command buffer ahead of time, extract all those DMA_BUF fences, and based on them set the right order of operations in each queue (effectively finally making it explicit). Now do you want that complexity, which is very error prone, costly in performance, and will continue to grow? No. So you ask Nvidia to implement it when in reality the OSS stack could drop a lot of that burden if we moved away from it. It is a win-win scenario for everyone, and the only potential userspace breakage is specifically host userspace, not client userspace (which is what we care about).
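A toy simulation of the two-queue argument, in Python. It models only the race as described in this post, not how any real driver or dma-buf behaves: `run`, the command tuples and the queue names are all made up for illustration. Without any ordering commands the result depends on which queue the scheduler happens to service first. With an explicit signal/wait pair the result is the same either way.

```python
def run(queues, order):
    """Execute per-queue command lists round-robin, servicing queues in `order`."""
    buffer = {"content": "old"}
    signaled = set()
    reads = []
    pending = {name: list(cmds) for name, cmds in queues.items()}
    while any(pending.values()):
        progressed = False
        for name in order:
            if not pending[name]:
                continue
            op, arg = pending[name][0]
            if op == "wait" and arg not in signaled:
                continue  # this queue is blocked until the semaphore is signalled
            pending[name].pop(0)
            progressed = True
            if op == "write":
                buffer["content"] = arg
            elif op == "read":
                reads.append(buffer["content"])
            elif op == "signal":
                signaled.add(arg)
        if not progressed:
            raise RuntimeError("deadlock: a wait is never signalled")
    return reads


# No ordering information: the outcome depends on scheduling.
unordered = {"q1": [("write", "new")], "q2": [("read", None)]}
print(run(unordered, ("q1", "q2")), run(unordered, ("q2", "q1")))

# Explicit ordering: q2 waits for q1's signal, so both schedules agree.
ordered = {
    "q1": [("write", "new"), ("signal", "s")],
    "q2": [("wait", "s"), ("read", None)],
}
print(run(ordered, ("q1", "q2")), run(ordered, ("q2", "q1")))
```

The first line shows two different results for the same commands, the second shows the same result twice. Whether real implicit sync actually leaves this ordering undefined is exactly what is being argued in this thread, so take the model as an illustration of the claim and not as proof of it.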

The thing is, Nvidia in fact doesn't ask you to drop the old codebase; it will work as well as it did. Nvidia only asks to add (and they will do the work) an explicit sync route as an alternative, and in that case something on the host (not the client) that relies exclusively on implicit sync might break, but it wouldn't work with that Nvidia driver anyway.

Last edited by piotrj3; 26 May 2022, 12:40 PM.
MorrisS. replied

26 May 2022, 12:02 PM
I note that the two factions have at least one main point in common in this confrontation: implicit sync is for legacy.

Now, as has been stated, Wayland has been developed around implicit sync. Why? My personal opinion is that, independently of Nvidia, Linux should move to explicit sync. Developers should switch the whole software stack, even if Nvidia didn't exist. And if it is true that Wayland has been written around implicit sync, I can't imagine how this big mistake could have been made. Surely, Linux developers cannot adapt the whole system to match Nvidia's drivers, though it would be the best solution, just as Nvidia cannot adapt its driver to Linux's conditions. Cooperation apart, the only solution is for Nvidia to set up its own Linux development team like Intel.

Surely, AMD and Intel development conforms to Linux, but it has a defect: it doesn't push Linux to get better. My personal opinion is that the real difference is in the nature of the development: Nvidia builds products aimed at the market, Linux developers build an OS for enthusiasts. Linux developers are not interested in change in the same way as Nvidia unless it is prompted by new hardware and its features. If Wayland could have been developed around explicit sync instead of implicit sync, the mistake is as severe as Nvidia being stuck on EGLStreams for years. The mistake is severe because Wayland is an opportunity for modernization.

The best solution, if feasible, would be Wayland + Vulkan based on explicit sync, eliminating OpenGL. Legacy stuff is for legacy operating systems. I think that some courageous Linux developers should write a new operating system.

Last edited by MorrisS.; 31 May 2022, 04:49 AM.
oiaohm replied

26 May 2022, 10:53 AM
Originally posted by MorrisS.

A simple question: would switching to explicit sync bring benefits to Linux in general? Could Vulkan replace the OpenGL and EGL stack in explicit sync terms? I note that there is a structural issue: implicit sync implies rewriting all the drivers for Nvidia, and explicit sync implies rewriting the kernel, the graphical environment and several applications for Linux developers. I think that this problem cannot be solved.

There are a few things here.
1) AMD and Intel drivers are explicit sync at core.
2) Implicit sync is more than one thing.

Intel and AMD implicit sync with DMABUF is really implicit sync on top of explicit sync.

There is a big difference between implicit sync in the core of the driver and implicit sync on the kernel interfaces.

What everyone arguing has not thought about is why having implicit sync across the boundary from userspace to kernel space is beneficial. Think about how the kernel allocates time slices: being able to check the explicit sync status in kernel space, as DMABUF-style implicit sync does, means that if the buffer is not in sync the application can be skipped for its next time slice, and that time slice can possibly be given to the process that will bring the buffer into sync.

Yes, the downside of Linux kernel DMABUF implicit sync is that you have performed a context switch into the kernel, and it can deadlock your application if the sync never resolves; but it is deadlocked without being given CPU time that it cannot do anything with. The upside is that you are not wasting CPU time slices in what is basically a spinlock checking explicit sync status, because with something like DMABUF implicit sync the explicit sync status is checked kernel side.
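A rough way to see the spinning versus sleeping difference, as a Python sketch. Everything here is illustrative: a `threading.Event` stands in for the fence, a `threading.Timer` stands in for the GPU finishing 50 ms later, and the function names are made up. It counts how many times the waiter had to look at the fence status.

```python
import threading


def spin_wait(fence):
    """Userspace-style wait: keep checking the status until it flips."""
    checks = 0
    while not fence.is_set():
        checks += 1
    return checks


def blocking_wait(fence):
    """Kernel-style wait: one call, and the thread sleeps until it is woken."""
    fence.wait()
    return 1


def signal_later(delay=0.05):
    """Return a fence that a timer thread signals after `delay` seconds."""
    fence = threading.Event()
    threading.Timer(delay, fence.set).start()
    return fence


spin_checks = spin_wait(signal_later())
block_checks = blocking_wait(signal_later())
print(spin_checks, block_checks)
```

The spinning waiter burns through a large and unpredictable number of checks while nothing can progress. The blocking waiter makes a single call and gets no CPU time until the fence is signalled, which is also why it would hang forever if the signal never came.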

For applications to work, the Linux kernel only needs implicit sync at the kernel interfaces to userspace; it does not matter that the core of the driver is explicit sync.

The problem here is that Nvidia's examples and arguments don't make sense, because most of them are based on the idea that you implement implicit sync in the core of the driver, rather than on top of the driver's explicit sync interfaces as everyone in the mainline Linux graphics stack does. Yes, AMD and Intel want more of the core of the drivers to be explicit sync, but they are not talking about doing away with implicit sync from userspace to kernel. Part of the problem is that Nvidia is not a major CPU designer or OS designer. The reality is that both AMD and Intel have at different times made their own OS.

Yes, just because Windows does not do something does not mean Linux does not want it.

You are right that it is an absolute deadlock, because the explicit-sync-only approach Nvidia is offering does not support legacy applications and, in particular use cases, makes the CPU side of the CPU/GPU combination perform badly due to the time slice allocation problem.

The hard reality is that on the CPU side, implicit sync, where the kernel is able to allocate time slices to processes that can progress and bypass those that cannot, does at times result in better performance and better CPU utilisation. Note that Nvidia has tunnel vision on how they will get better GPU utilisation.

For CPU/GPU sync in graphics we don't have something like a futex that can be processed by both kernel and userspace. In a lot of ways I think that, from the userspace side, both implicit sync and explicit sync are wrong, because each has advantages in particular use cases and neither is an all-rounder. This is like general CPU locking before the futex.

Yes, Nvidia's problem here is NIH: they are not considering what the other parties need. I also think that AMD and Intel partly have tunnel vision too. With the split between implicit sync and explicit sync, the result is that something like a futex is not coming into existence for graphical use: something that can be processed equally well by userspace and kernel space, and so has most of the advantages of both, as in fewer context switches from being processed in userspace, and not having to spin in userspace waiting for a sync to be resolved, so CPU time slices are allocated better.