NVIDIA's List Of Known Wayland Issues From SLI To VDPAU, VR & More

  • #41
    Originally posted by oiaohm View Post

    Yes, Nvidia might want that, but you did not read what you quoted carefully enough.



    It would have paid to read that LWN article more closely.



    Vulkan: yes, the standard allows you to implement a driver without implicit sync right up until you interface with the OS window system. That is not a Linux-only limitation; it applies on every platform where Vulkan is implemented. The LWN write-up was not that clear about it.

    OpenGL and the EGL that Wayland compositors use don't have a properly agreed-on method for anything other than implicit sync. The reality is Nvidia needs to implement implicit sync to conform with what the OpenGL and EGL standards say.



    EGLStreams was pushed up to the Khronos Group and was basically not taken on board. Vulkan, which requires implicit sync at the boundary between the host OS and the application, was taken on board.

    https://www.kernel.org/doc/html/v5.9...tml#dma-fences

    Does DMA-BUF support explicit sync? The answer is yes, via the sync-file system, which has existed since DMA-BUF was first created. Yes, the LWN write-up mentions this.

    Yes, there are EGL and OpenGL extensions for using DMA-BUF with sync-files.
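
    For example, a minimal sketch of the EGL side, assuming the EGL_KHR_fence_sync and EGL_ANDROID_native_fence_sync extensions are available (error handling trimmed):

    Code:
    /* Sketch: export a sync-file fd for the GL work submitted so far.
     * Assumes EGL_KHR_fence_sync + EGL_ANDROID_native_fence_sync and a
     * current EGL context on `dpy`; error handling trimmed. */
    #include <EGL/egl.h>
    #include <EGL/eglext.h>
    #include <GLES2/gl2.h>

    int export_render_done_fd(EGLDisplay dpy)
    {
        PFNEGLCREATESYNCKHRPROC create_sync =
            (PFNEGLCREATESYNCKHRPROC)eglGetProcAddress("eglCreateSyncKHR");
        PFNEGLDESTROYSYNCKHRPROC destroy_sync =
            (PFNEGLDESTROYSYNCKHRPROC)eglGetProcAddress("eglDestroySyncKHR");
        PFNEGLDUPNATIVEFENCEFDANDROIDPROC dup_fence_fd =
            (PFNEGLDUPNATIVEFENCEFDANDROIDPROC)eglGetProcAddress("eglDupNativeFenceFDANDROID");

        /* Fence covering all GL commands issued so far in this context. */
        EGLSyncKHR sync = create_sync(dpy, EGL_SYNC_NATIVE_FENCE_ANDROID, NULL);

        /* The native fence only materialises once the commands are flushed. */
        glFlush();

        /* The returned fd is a sync-file: it can be passed to another process
         * or attached to a DMA-BUF. */
        int fd = dup_fence_fd(dpy, sync);
        destroy_sync(dpy, sync);   /* the dup'd fd stays valid on its own */
        return fd;
    }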

    The only change to DMA-BUF is adding the ability to put a sync-file in the DMA-BUF metadata: one call to attach a sync-file to the metadata and one call to extract it from the metadata. Yes, that change removes the need for a Wayland protocol to ship sync-files around. Another possible change would be making implicit-fence poll work correctly on a DMA-BUF that has a sync-file attached, so that legacy compositors and applications cope with being handed an explicit-sync DMA-BUF by having it behave like implicit sync when that path is used.
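
    Roughly what those two calls look like from user space, as a sketch based on the DMA_BUF_IOCTL_IMPORT_SYNC_FILE / DMA_BUF_IOCTL_EXPORT_SYNC_FILE ioctls being proposed at the time (they need uapi headers that define them):

    Code:
    /* Sketch: attach an explicit fence to a DMA-BUF's implicit-sync metadata,
     * and pull the current fences back out as a sync-file. Needs uapi headers
     * that define these ioctls; error handling trimmed. */
    #include <linux/dma-buf.h>
    #include <sys/ioctl.h>

    /* Make the fence in `sync_file_fd` part of the buffer's implicit write
     * sync, so implicit-sync consumers (e.g. an old compositor) wait on it. */
    int attach_fence(int dmabuf_fd, int sync_file_fd)
    {
        struct dma_buf_import_sync_file arg = {
            .flags = DMA_BUF_SYNC_WRITE,
            .fd    = sync_file_fd,
        };
        return ioctl(dmabuf_fd, DMA_BUF_IOCTL_IMPORT_SYNC_FILE, &arg);
    }

    /* Extract the buffer's current implicit fences as a sync-file fd that an
     * explicit-sync consumer can wait on. */
    int extract_fences(int dmabuf_fd)
    {
        struct dma_buf_export_sync_file arg = { .flags = DMA_BUF_SYNC_RW };

        if (ioctl(dmabuf_fd, DMA_BUF_IOCTL_EXPORT_SYNC_FILE, &arg) < 0)
            return -1;
        return arg.fd;
    }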

    To be tear-free you still need to resolve the sync when you get to KMS, so whether it is implicit or explicit sync makes limited difference, because KMS is a sync resolution point.

    mdedetrich, something to remember: DMA-BUF comes out of joint work between Intel and Nvidia on PRIME support, so there is no missing core functionality here; the ABI may need a little modification to be more user-friendly. That is the hard point: DMA-BUF has always had both explicit sync and implicit sync.

    EGLStreams going its own way never made much sense, since in shared-output GPU setups the Nvidia driver had to deal with DMA-BUF anyway; it is part of PRIME when using an output connected to an Intel or AMD GPU on Linux.

    The major reason Nvidia wanted EGLStreams was not to solve the explicit-sync problem but to keep as much of the internals of their driver secret as possible.

    The feature difference between the KMS/DMA-BUF and EGLStreams solutions is not major. You do run into problems with EGLStreams in cases where you must have implicit sync, because it is pretty much coded explicit-only; this is where Vulkan was smart in keeping particular areas implicit sync. Remember, your legacy OpenGL and EGL applications don't know about explicit sync at all, and even modern Wayland applications can expect implicit sync when dealing with the platform.

    The horrible reality is that GPU drivers and the GPU stack need to implement support for both implicit sync and explicit sync. Yes, it is really easy to miss that OpenGL, EGL and Vulkan mandate at least some implicit sync support. This is not a Wayland protocol problem; it is a problem with the existing graphics-stack standards. Yes, this should very much have been a Khronos Group problem, as it requires all GPU vendors to come to a proper agreement, and EGLStreams from Nvidia proves it cannot be brute-forced by one vendor. mdedetrich, the fact that this is having to be sorted out on the Mesa mailing list, LWN and so on is really a failure of the Khronos Group to come to a properly resolved solution.

    Yes, there are particular problems that are not Wayland protocol problems but that some people will attempt to claim are. The Wayland protocol removes the means to avoid resolving them properly. Yes, some of the horrible issues you run into with OpenGL applications and Nvidia on Windows and Linux are caused by non-conformance to the standards over this sync problem.

    Yes, I would like to see this sync stuff sorted out properly: Nvidia, Intel, AMD, Arm... developers all sitting in a room, possibly locked in there, until they have come up with properly agreed standard modifications that they will all commit to implement, so this is no longer a mess. Getting everyone onto DMA-BUF does get us one step closer to properly resolving the sync problem, but there are still a lot of problems with OpenGL and EGL around it.
    Again you post an essay when it's not required.

    The fact of the matter is this: implicit sync is shit, and Linux forcing it for largely historical, dogmatic reasons is what is causing the major problem in the Linux graphics ecosystem here. The fact that you are echoing the sentiment that "you need to implement implicit sync to be part of the stack" is the whole problem; implicit sync shouldn't even exist! (Or, more accurately, the default should be explicit sync, since you can implement implicit sync over explicit sync but not the other way around without completely killing performance.)

    When I actually read and researched the whole topic I was shocked to see how antiquated the mentality of the Linux graphics-stack devs is, especially with comments such as https://gitlab.freedesktop.org/xorg/...7#note_1358237. The Linux graphics stack is digging in its heels on a model that is basically from the 90s and no longer fits how concurrent both CPUs and GPUs are today. Even in my own job, where I have to write multithreaded concurrent code, no one uses implicit synchronization, because it's a terrible abstraction if you actually care about performance, given the excessive CPU locking you need to make any sense of it.

    Ultimately, the model of having everything as "simple input/output buffers on a file" does not solve every problem, and it definitely should be moved away from, as Jason Ekstrand and graphics driver maintainers from both Intel and AMD have rightly pointed out. Fix that problem and then NVidia's driver will work without issues.
    Last edited by mdedetrich; 23 May 2022, 06:59 AM.



    • #42
      Originally posted by piotrj3 View Post

      Half of the Wayland issues you listed are not Nvidia's fault. They stem from the fact that a lot of the Linux stack is built around implicit sync while the Nvidia driver is explicit sync.
      Only one entry on the list is related to that: "XWayland does not provide a proper way to synchronize application rendering with presentation."

      What they mean is that Xwayland doesn't support explicit sync yet. What they don't mention is that implicit sync is perfectly adequate for Xwayland, see below.

      Originally posted by mdedetrich View Post

      The fact of the matter is this: implicit sync is shit, and Linux forcing it for largely historical, dogmatic reasons is what is causing the major problem in the Linux graphics ecosystem here. The fact that you are echoing the sentiment that "you need to implement implicit sync to be part of the stack" is the whole problem; implicit sync shouldn't even exist! (Or, more accurately, the default should be explicit sync, since you can implement implicit sync over explicit sync but not the other way around without completely killing performance.)
      That is nonsense.

      You guys and many others seem to think of explicit sync as some kind of silver bullet which magically makes everything work better. It's not.

      As I explained in https://gitlab.freedesktop.org/xorg/...7#note_1358237 and the following comments, explicit sync cannot provide any performance benefit over implicit sync for the specific use case of sharing buffers between a display server and direct-rendering clients. It boils down to the same fences in both cases; the difference is merely whether those fences are communicated explicitly as part of the display protocol or implicitly via a separate channel.
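
      To make that concrete, a rough sketch of the two channels from the compositor side (hypothetical fds, assuming the documented poll() support on sync-file and DMA-BUF fds; error handling trimmed):

      Code:
      /* Sketch: two ways for a compositor to wait for a client's rendering.
       * Both end up waiting on the same kernel fences; only the channel that
       * delivers them differs. The fds are assumed to come from the protocol
       * and from the imported buffer respectively; error handling trimmed. */
      #include <poll.h>

      /* Explicit channel: the client sent a sync-file fd alongside the buffer. */
      void wait_explicit(int acquire_fence_fd)
      {
          struct pollfd p = { .fd = acquire_fence_fd, .events = POLLIN };
          poll(&p, 1, -1);   /* readable once the fence has signalled */
      }

      /* Implicit channel: nothing extra was sent; the kernel attached the same
       * fences to the DMA-BUF when the client's work was submitted. */
      void wait_implicit(int dmabuf_fd)
      {
          struct pollfd p = { .fd = dmabuf_fd, .events = POLLOUT };
          poll(&p, 1, -1);   /* waits for all fences attached to the buffer */
      }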

      Anyway, Nvidia are free to solve this via explicit sync, but they'll need to do the work.



      • #43
        Originally posted by MrCooper View Post
        That is nonsense.

        You guys and many others seem to think of explicit sync as some kind of silver bullet which magically makes everything work better. It's not.

        As I explained in https://gitlab.freedesktop.org/xorg/...7#note_1358237 and the following comments, explicit sync cannot provide any performance benefit over implicit sync for the specific use case of sharing buffers between a display server and direct-rendering clients. It boils down to the same fences in both cases; the difference is merely whether those fences are communicated explicitly as part of the display protocol or implicitly via a separate channel.
        Sure, and as was stated in that thread many times, with that specific problem the issue is not performance; it's more that with the amount of effort required to get implicit sync working with workarounds, you may as well just do it properly via explicit sync.

        Of course it's correct that explicit sync doesn't always provide performance improvements (or, more accurately, there is a performance improvement, but it's in an area where there isn't a bottleneck, so it's not noticeable), but the core point here is: if you spend so much effort getting an outdated model working, why not just do it properly instead?

        Originally posted by MrCooper View Post
        Anyway, Nvidia are free to solve this via explicit sync, but they'll need to do the work.
        NVidia aren't the gatekeepers of the Linux graphics stack; they have been quite vocal about this issue for some time and were routinely ignored until other graphics vendors started complaining about the same problem. As I am sure you are aware, many mailing-list discussions on this topic got nowhere in the past, so putting all of the "blame" on NVidia is hardly constructive.

        Creating a plan for moving forward to explicit sync (which is not something only NVidia is demanding) is where the focus should be, rather than piling on workarounds.
        Last edited by mdedetrich; 23 May 2022, 09:33 AM.



        • #44
          Originally posted by mdedetrich View Post
          The fact of the matter is this: implicit sync is shit, and Linux forcing it for largely historical, dogmatic reasons is what is causing the major problem in the Linux graphics ecosystem here. The fact that you are echoing the sentiment that "you need to implement implicit sync to be part of the stack" is the whole problem; implicit sync shouldn't even exist! (Or, more accurately, the default should be explicit sync, since you can implement implicit sync over explicit sync but not the other way around without completely killing performance.)
          mdedetrich, the implicit sync of DMA-BUF is implemented on top of the dma-fence system, which happens to be explicit sync. DMA-BUF contains both explicit sync and implicit sync; the interfaces for using the explicit sync just haven't been the best. So the Linux graphics stack is not 100% wrong in most cases.

          With the OpenGL, EGL and Vulkan standards you need to implement implicit sync somehow. And for the existing Linux kernel userspace ABI, remember Linus Torvalds' rule: you don't break user space, so any exported syscall has to keep working.

          Originally posted by mdedetrich View Post
          NVidia aren't the gatekeepers of the Linux graphics stack; they have been quite vocal about this issue for some time and were routinely ignored until other graphics vendors started complaining about the same problem.
          That is correct: Nvidia is not the gatekeeper, yet with EGLStreams they attempted to act like it. To change core OS structures of Linux you need to play by the rules, and that includes working out how not to break backwards compatibility.

          Originally posted by mdedetrich View Post
          The Linux graphics stack is digging in its heels on a model that is basically from the 90s and no longer fits how concurrent both CPUs and GPUs are today. Even in my own job, where I have to write multithreaded concurrent code, no one uses implicit synchronization, because it's a terrible abstraction if you actually care about performance, given the excessive CPU locking you need to make any sense of it.
          But people still run games, bought recently from Valve, whose binaries were coded in the 90s.

          Like it or not, both implicit sync and explicit sync need to exist. Do note that it is possible to implement your explicit sync in ways that you cannot build implicit sync on top of. Nvidia managed to create exactly that beast with their EGLStreams implementation: in places you had explicit sync with no possibility of creating implicit sync if that was what you really needed. Yes, for a long time it was thought that if you had explicit sync you could always implement implicit sync, but the Nvidia developers managed to prove that wrong with EGLStreams; it is one of the problems the Nvidia developer writing the KDE Wayland backend ran straight into.

          So the requirement is that however you implement explicit sync, implicit sync must still be possible, because Linux kernel syscalls exist that need it, and the OpenGL, EGL and Vulkan standards need it as well.

          Yes, Nvidia has been vocal about the problem, but they are doing NIH: "not invented here, so we don't need to care about it." This results in them being ignored, because what they are putting up is not a solution within the problem space.

          The problem space of the Linux graphics stack says you will implement implicit sync. DMA-BUF in fact implements both explicit sync and implicit sync; yes, the implicit sync is implemented on top of dma-fences, which are in fact explicit sync.

          I am not saying there are no design errors, like the DMA-BUF design choice that you have to use two file handles from user space for explicit sync yet only one when using implicit sync. Yes, that is something that could be changed.

          The fact that DMA-BUF implicit sync is implemented on top of true explicit sync means its overhead is not bad at all. This is also why the argument that implicit sync is bad for performance kind of does not work: the performance problem does not exist. Remember, with implicit sync, if a sync is not possible at that moment, the CPU slice is returned to the process with an error/warning and the process can do something else.
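
          As a sketch of that (using the poll() support documented for DMA-BUF fds; the fd is assumed to come from wherever the buffer was imported):

          Code:
          /* Sketch: non-blocking query of a DMA-BUF's implicit fences via poll().
           * Returns 1 if all attached fences have signalled, 0 if GPU work is
           * still pending, so the caller can go do something else instead of
           * blocking. */
          #include <poll.h>

          int dmabuf_idle(int dmabuf_fd)
          {
              /* POLLOUT on a dma-buf fd tests all attached fences; a zero
               * timeout turns the call into a pure query that returns
               * immediately. */
              struct pollfd p = { .fd = dmabuf_fd, .events = POLLOUT };

              return poll(&p, 1, 0) > 0 && (p.revents & POLLOUT);
          }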

          With a true explicit sync, if you cannot get the sync right now you can lose the slice to another process. Sorry, explicit vs implicit sync is not that simple. There are times compositors want explicit sync and other times they want implicit sync.



          • #45
            Why doesn't the graphics stack on Linux switch to explicit sync? Linux has to pursue continuous improvement. I understand that many Linux developers are waiting for companies to deliver conformant drivers, but the Linux side also has to adapt its software in order to maximize hardware capabilities.

            Edit: the last oiaohm message is eloquent enough. Both implicit and explicit sync have to coexist. If this coexistence is guaranteed, no problem should occur.
            Last edited by Azrael5; 23 May 2022, 02:28 PM.



            • #46
              Originally posted by zexelon View Post
              Also, to put it bluntly, there is nothing from team red (aka AMD) whose performance reaches the levels achieved by Nvidia in this generation, even on Linux!
              ... Whut? Are you telling me the latest benchmarks Phoronix posted where a 6600 XT beats the snot out of the 3060, a 6700 XT is neck-and-neck with the 3070 / 2080 Ti, and a 6800 XT is on par with the 3080 Ti (all cards at or under the same price as their counterpart) are not competitive? Do you even read the benchmarks before trying to spread easily debunked falsehoods?

              Now, it is possible Nvidia is the only real performant solution in *your* niche case. But that certainly doesn't seem to be the case for Linux at large.



              • #47
                Originally posted by Azrael5 View Post
                Why doesn't the graphics stack on Linux switch to explicit sync? Linux has to pursue continuous improvement. I understand that many Linux developers are waiting for companies to deliver conformant drivers, but the Linux side also has to adapt its software in order to maximize hardware capabilities.

                Edit: the last oiaohm message is eloquent enough. Both implicit and explicit sync have to coexist. If this coexistence is guaranteed, no problem should occur.
                The problem is that coexistence is required. It is possible to implement explicit sync in a way where you cannot do implicit sync. There are versions of explicit sync that you cannot implement implicit sync polling on top of. The dma_fences you find on DMA-BUF are able to be both.

                The reality is a lot of the Linux kernel sync machinery is both explicit and implicit, and there are different areas that are purely implicit and purely explicit.

                KMS is implicit, and that makes sense up to a point.


                In a KMS configuration, drivers need to allocate and initialize a command ring buffer following core GEM initialization if required by the hardware.
                Yes, commands going to KMS on Linux go onto a ring buffer. So the moment you send a KMS command to the Linux kernel is not always the moment that command gets processed.

                In some areas explicit sync does not make the most sense, because ring buffers and other things add delays. In some areas implicit sync has simpler-to-use interfaces from the Linux kernel than explicit sync; this has been true with DMA-BUF. Remember, just because DMA-BUF made it harder to do explicit sync, the Nvidia EGLStreams developers made out that it did not support it at all; that is non-constructive criticism. We are now starting to see debates around DMA-BUF that are constructive criticism: look at what it really offers and what the real problem is.

                The Linux kernel is not alone in using a mix of implicit and explicit sync. OpenMP has "#pragma omp parallel", which ends with an implicit sync, "#pragma omp barrier", which is an explicit sync, and the "nowait" clause (for example "#pragma omp for nowait"), which skips the sync.
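
                A minimal C sketch of those three flavours (compile with something like cc -fopenmp):

                Code:
                /* Sketch of the three OpenMP sync flavours mentioned above. */
                #include <stdio.h>

                int main(void)
                {
                    int a[4] = {0}, b[4] = {0};

                #pragma omp parallel
                    {
                        /* "nowait": drop the implicit barrier at the end of this
                         * worksharing loop, i.e. no sync. */
                #pragma omp for nowait
                        for (int i = 0; i < 4; i++)
                            a[i] = i * i;

                        /* Explicit sync: every thread stops here until all arrive,
                         * so a[] is fully written before anyone reads it. */
                #pragma omp barrier

                        /* Implicit sync: this loop (and the parallel region itself)
                         * ends with an implicit barrier. */
                #pragma omp for
                        for (int i = 0; i < 4; i++)
                            b[i] = a[i] + 1;
                    }

                    printf("%d %d %d %d\n", b[0], b[1], b[2], b[3]);
                    return 0;
                }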

                Sync is not a one-size-fits-all problem. With the Linux kernel mainline there are restrictions on what types of sync can be used, based on what has been exported by syscalls and what is defined in standards. Nvidia, doing their own thing, have been able to ignore those restrictions; this has resulted in their driver not being standards-conforming at times, and in them asking for things the Linux kernel developers are never going to do.

                A lot of this is Nvidia's NIH problem: the stuff not invented at Nvidia, and how it limits what can be implemented, has not been taken into account, so Nvidia proposes over and over again things that simply cannot be done in Linux kernel space. Also, what Nvidia claims to be the problem in a lot of cases is not. Like the claim that DMA-BUF does not have explicit sync: when you look closely it does have explicit sync, behind a horrible interface to user space that could be improved, and the improvement can be done without breaking existing users.

                Of course, when the Linux kernel developers then refuse an Nvidia suggestion because it is not workable, because it would break existing users, from Nvidia's NIH point of view they are "not listening to Nvidia". See the problem here? Then, when other GPU vendors hit the same problem and start considering workable solutions, you have people saying Nvidia was complaining about problem X and the Linux kernel never fixed it, so Nvidia was in the right, when in reality Nvidia never proposed a workable solution to the problem. The Linux kernel could not fix it at the time Nvidia wanted because Nvidia, due to its NIH problem, never proposed a workable solution.

                Hopefully, with Nvidia open sourcing their kernel driver, their kernel-level developers will have less of an NIH problem and will be able to propose workable solutions instead of unworkable ones, and Linux kernel development will be able to get somewhere from its interactions with Nvidia.

                The reality here is that for a long time Nvidia has been no different with the Linux kernel than a spoilt brat: demanding to break other people's toys, expecting to get away with it, then sulking when they get blocked, and of course complaining when their own toy gets broken because what they are doing with it would break other people's toys as well. So hopefully open sourcing the kernel driver is a sign they are growing out of their spoilt-brat stage. Working cooperatively has not been Nvidia's strong point, and Linux kernel development really does depend on it; the other GPU vendors have done it successfully for at least a decade.

                Yes, Nvidia lost GPU deals with all the major console makers because they could not work cooperatively, so it's not just the Linux kernel that Nvidia has been a spoilt brat with.

                Nvidia's problem with the Linux kernel is not a chicken-and-egg problem; it has been a failure to work cooperatively on Nvidia's side. Linux kernel development is a collection of cats (the different interest groups/vendors, about as herdable as cats), which requires proper consideration and makes working cooperatively a serious effort: not just proposing an idea and walking away, but proposing an idea, listening to the feedback and altering it as required.



                • #48
                  Originally posted by oiaohm View Post

                  Yes, Nvidia lost GPU deals with all the major console makers because they could not work cooperatively, so it's not just the Linux kernel that Nvidia has been a spoilt brat with.
                  Steam deck? NVIDIA doesn't have an x86 license. Sony PS/XBox? The same situation. Nintendo Switch is based on NVIDIA tech.

                  What else? Pure speculation and rumors?

                  Lastly, the Xbox and Sony PS are both locked down as hell and have closed-source AMD drivers.

                  So much for NVIDIA not cooperating with Linux kernel development and not having open source drivers. Hint: barely anyone cares.



                  • #49
                    Originally posted by wertigon View Post

                    ... Whut? Are you telling me the latest benchmarks Phoronix posted where a 6600 XT beats the snot out of the 3060, a 6700 XT is neck-and-neck with the 3070 / 2080 Ti, and a 6800 XT is on par with the 3080 Ti (all cards at or under the same price as their counterpart) are not competitive? Do you even read the benchmarks before trying to spread easily debunked falsehoods?

                    Now, it is possible Nvidia is the only real performant solution in *your* niche case. But that certainly doesn't seem to be the case for Linux at large.
                    I was more referring to absolute performance... currently nothing touches the 3090. Now, when referring to performance per dollar, yeeeaaahhh, in no way am I going to try to defend Nvidia. Also yes, I have a requirement for the absolute performance of the 3090 (and it's not gaming), so it's a price to be paid. That said, I was an AMD user when they were the dominant GPU maker (back in the ATI days) and I look forward to the day they can get back into the top-tier ring; if they can finish what they started with ROCm they will have a strong chance of returning to the top.



                    • #50
                      Originally posted by birdie View Post

                      Steam deck? NVIDIA doesn't have an x86 license. Sony PS/XBox? The same situation. Nintendo Switch is based on NVIDIA tech.

                      What else? Pure speculation and rumors?

                      Lastly, the Xbox and Sony PS are both locked down as hell and have closed-source AMD drivers.

                      So much for NVIDIA not cooperating with Linux kernel development and not having open source drivers. Hint: barely anyone cares.
                      No, this is a failure to know the history. Nvidia lost the GPU in Microsoft's Xbox line with the Xbox 360, which had a PowerPC processor; ATI won that contract because they would share the source code, and Microsoft and ATI both stated that fact. This was near the start of ATI's process of considering publicly open sourcing their driver, before getting acquired by AMD; it was about three years before the acquisition (the Xbox 360 was released in 2005, development started around 2003, and ATI was acquired by AMD in 2006). So the x86 license did not come into that change; it only later made it harder for Nvidia to win back the lost market share. Yes, even today the driver used inside the Xbox systems is covered by an agreement between Microsoft and AMD giving Microsoft access to the full driver source code to audit it.

                      The current Sony PlayStation AMD graphics driver is not purely closed source; surprise surprise, there is a FreeBSD-based kernel in there with the MIT-licensed amdgpu driver, the same one as in the Linux kernel. So open-source kernel space with closed-source userspace and a locked-down bootloader is really what the current PlayStation 4 is.

                      Birdie, if you are making a locked-down system, you don't want to wake up and find that, due to some third-party driver you cannot audit, you have a kernel-level exploit, right?

                      Not having open-source drivers, at least for the kernel-space part, has been costing Nvidia sales in particular areas, and this is not a new problem. The more parties focus on security, the more important auditing kernel-level drivers becomes and the more source-code access will be demanded. Yes, in areas where Nvidia has suitable competition from AMD, Intel or some other vendor, the question "is the driver open source and auditable?" can become the deciding question in whether Nvidia wins or loses the contract.

                      Please note that Nvidia not cooperating also happened with Vista, leading to the three years of driver hell there; the lack of cooperation is not just with the Linux kernel. Yes, Nvidia also started out with Vista by demanding Microsoft alter things, without considering what Microsoft had implemented. AMD got the changes they wanted in that case, and Nvidia has been forced to use them ever since, because the AMD/ATI developers were cooperative with Microsoft and considered what Microsoft was trying to do, and Nvidia was not.

                      There is the Apple example as well where Apple does not support Nvidia cards because Nvidia would not share kernel source.

                      Nvidia has an internal problem, like it or not; maybe Nvidia will finally grow out of it. That internal problem results in Nvidia not working well with Microsoft, Linux, Apple, Sony... on OS development. One of the important things about OS development is being aware that other parties have interests, and trying to ask for things that don't break those parties' requirements. Demanding straight-up change does not work in OS development, and it does not matter whether the OS is open source or closed source. Nvidia's behaviour has been wrong with both open-source and closed-source operating systems.

