Wayland Protocols 1.30 Introduces New Protocol To Allow Screen Tearing


  • Originally posted by Quackdoc View Post

    Vsync is a broad term; the vsync that Wayland forces is triple-buffer vsync specifically, which should only ever be a single frame of latency. Not sure what the video is specifically talking about, but either they mean the typical vsync most games implement, the tested applications were implemented poorly, or the testing methodology is wrong.
    Triple-buffer vsync is what old-school games typically implement, and it's not just a single frame of latency. In fact, if you look at one of the main justifications for Wayland adjusting their protocol to allow for screen tearing, it's input latency from touch-based devices. Directly quoting the article:

    For some use cases like games or drawing tablets it can make sense to reduce latency by accepting tearing with the use of asynchronous page flips. This global is a factory interface, allowing clients to inform which type of presentation the content of their surfaces is suitable for.
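
    (As a side note, for anyone wondering what using this new protocol actually looks like: a minimal sketch in C against a wayland-scanner-generated header. Names are from the tearing-control-v1 staging protocol; registry binding and surface setup are omitted, and the compositor is free to ignore the hint.)

    Code:
    #include <wayland-client.h>
    #include "tearing-control-v1-client-protocol.h" /* wayland-scanner output */

    /* Assumes 'manager' was bound from the wl_registry and 'surface' exists. */
    static void allow_tearing(struct wp_tearing_control_manager_v1 *manager,
                              struct wl_surface *surface)
    {
        struct wp_tearing_control_v1 *tc =
            wp_tearing_control_manager_v1_get_tearing_control(manager, surface);

        /* Hint that async (tearing) page flips are acceptable for this
           surface. The compositor may still keep it vsynced. */
        wp_tearing_control_v1_set_presentation_hint(
            tc, WP_TEARING_CONTROL_V1_PRESENTATION_HINT_ASYNC);
    }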



    • Originally posted by xfcemint View Post

      I'm not going to agree with you completely on this one. The amount of latency that triple buffering introduces depends on how well triple buffering is implemented. The main problem is that it commonly doesn't get implemented well enough.

      Another problem is that you also have to consider the 3D graphics pipeline, which can introduce additional latency.

      In an ideally implemented triple buffering, the latency is always less than frame rendering time + display device frame interval:
      maxLatency < renderTime + frameInterval
      ... where renderTime is measured from the moment all input data (i.e. keyboard, mouse) has been produced (i.e. mouse movement, keypress). (This does not take into account display-framebuffer-to-LCD-panel latency.)
      No, it is not always less.

      It is more whenever you have fewer frames per second than the refresh rate. In that case the latency gets miserably long: triple buffering is a way to offset jumping (e.g. between 30 fps and 60 fps all the time when the refresh rate is 60 Hz). In that case you have the full triple-buffer latency, and it is inconsistent (so base latency + 2 frames).

      People forget that triple buffering does NOT specify how things work under the hood.

      Say we have a 60 Hz monitor with triple buffering (so 2 back buffers) and an overkill setup (way over 60 fps).

      Frame A is in front.
      Frames B and C are already in the back buffers.

      Do we, in the meantime, work on frame D? That is not defined.

      If we work on frame D, we will discard frame B, which means some of the work we did becomes a useless heater that never gets displayed (this is Fast Sync/Enhanced Sync on Nvidia/AMD). You don't do that in compositors (hello, Wayland!).

      If we wait until the front buffer swaps to B to make space for frame D, we have the full latency of 3 buffers. That is not good either.

      If we end up with fps <= refresh rate (or FPS oscillating around the refresh rate), triple buffering is there more to prevent this situation: the front buffer is displaying, one back buffer is ready, and we cannot work on another back buffer because all buffers are loaded, so the GPU sits idle; meanwhile the next frame has some microstutter and won't be ready within 16.6 ms, meaning it takes a 33.3 ms window to display, and we get a sudden drop from 60 fps to 30 fps. The additional back buffer improves GPU utilization.
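
      (To make the two policies above concrete, here is a toy sketch in C. Everything in it is hypothetical pseudocode made compilable, not any real driver or compositor API: MAILBOX drops the oldest queued frame, Fast Sync style, while FIFO stalls the renderer, classic vsync-queue style.)

      Code:
      /* Toy model of the frame-D decision described above. All names made up. */
      enum policy { MAILBOX, FIFO };

      struct swapchain {
          int ready;     /* back buffers holding finished frames: 0..2 */
          int dropped;   /* frames rendered but never displayed */
      };

      /* Called when the renderer wants to start frame D. Returns 1 if it may
         render immediately, 0 if it must block until the next page flip. */
      int may_render(struct swapchain *sc, enum policy p)
      {
          if (sc->ready < 2)
              return 1;       /* a back buffer is free: just render */

          if (p == MAILBOX) {
              /* Fast Sync / Enhanced Sync: overwrite the oldest ready frame
                 (frame B). Its work becomes the "useless heater", but latency
                 stays around one frame. */
              sc->dropped++;
              return 1;
          }

          /* FIFO: keep B and C queued and stall; this is the full
             3-buffer-latency case. Compositors never take the drop path. */
          return 0;
      }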
      Last edited by piotrj3; 28 November 2022, 07:47 AM.



      • Originally posted by piotrj3 View Post
        Frame A is in front.
        Frames B and C are already in the back buffers.

        Do we, in the meantime, work on frame D? That is not defined.

        If we work on frame D, we will discard frame B, which means some of the work we did becomes a useless heater that never gets displayed (this is Fast Sync/Enhanced Sync on Nvidia/AMD). You don't do that in compositors (hello, Wayland!).
        This is exactly how turning vsync off on Wayland works: we don't care about wasted work, since the exact same work would have been done without vsync anyway. This is also how nearly all modern triple-buffer vsync works.

        If we wait until the front buffer swaps to B to make space for frame D, we have the full latency of 3 buffers. That is not good either.
        No idea if anything actually works like this, because this is stupid - it holds little to no benefit and sounds like it would be detrimental.
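
        (For reference, this exact policy split is what Vulkan exposes as present modes: MAILBOX is the discard-stale-frames behaviour described above, FIFO is queue-and-wait. A common selection sketch, with error handling trimmed:)

        Code:
        #include <vulkan/vulkan.h>

        /* Prefer MAILBOX (triple buffering that drops stale frames, low
           latency), fall back to FIFO (queue-and-wait vsync, always there). */
        VkPresentModeKHR choose_present_mode(VkPhysicalDevice phys,
                                             VkSurfaceKHR surf)
        {
            uint32_t count = 0;
            vkGetPhysicalDeviceSurfacePresentModesKHR(phys, surf, &count, NULL);

            VkPresentModeKHR modes[16];
            if (count > 16)
                count = 16;
            vkGetPhysicalDeviceSurfacePresentModesKHR(phys, surf, &count, modes);

            for (uint32_t i = 0; i < count; i++)
                if (modes[i] == VK_PRESENT_MODE_MAILBOX_KHR)
                    return VK_PRESENT_MODE_MAILBOX_KHR;

            return VK_PRESENT_MODE_FIFO_KHR; /* guaranteed by the spec */
        }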




        • Originally posted by xfcemint View Post
          I have a question. (Sorry for the interruption.)

          If you consider my post just above, you can see that I'm mentioning a "triple buffering predictor".

          Is anyone working on such a predictor in Wayland? I.e., is there any chance that such functionality could be added soon?

          From my analysis it follows that a good predictor could decrease the average latency (while having some minimal disadvantages). Also, an option (slider) could be added to the predictor to adjust the "reserved time": the difference between the moment a frame has finished rendering and the moment the front framebuffer needs to be swapped. Increasing the "reserved time" decreases the chances of a wrong prediction by the predictor.
          I am entirely unsure. It's possible, but not something I would personally be interested in, so it would be something I would have skimmed over.
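
          (For what it's worth, a sketch of the kind of predictor being asked about, in C with made-up names: track recent render times, take the worst case, add the user-tunable "reserved time" margin, and start rendering that long before the vblank. Compositors like KWin do something in this spirit with their latency policies.)

          Code:
          #include <stdint.h>

          #define HISTORY 8

          /* Hypothetical predictor state; all names here are made up. */
          struct predictor {
              int64_t samples_ns[HISTORY]; /* recent measured render times */
              int     next;
              int64_t reserved_ns;         /* the "slider": safety margin */
          };

          void predictor_record(struct predictor *p, int64_t render_ns)
          {
              p->samples_ns[p->next] = render_ns;
              p->next = (p->next + 1) % HISTORY;
          }

          /* When to begin rendering so the frame is (probably) done by the
             vblank. A larger reserved_ns means fewer mispredictions, but
             more average latency. */
          int64_t predictor_start_time(const struct predictor *p,
                                       int64_t vblank_ns)
          {
              int64_t worst = 0;
              for (int i = 0; i < HISTORY; i++)
                  if (p->samples_ns[i] > worst)
                      worst = p->samples_ns[i];
              return vblank_ns - worst - p->reserved_ns;
          }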



          • Originally posted by Quackdoc View Post

            This is exactly how turning vsync off on Wayland works: we don't care about wasted work, since the exact same work would have been done without vsync anyway. This is also how nearly all modern triple-buffer vsync works.



            No idea if anything actually works like this, because this is stupid - it holds little to no benefit and sounds like it would be detrimental.

            This is exactly how older DirectX worked before the flip presentation model happened. Before the flip model, you had settings like "maximum pre-rendered frames", set to 3 by default. Games on DirectX 9 couldn't drop frames. You could decrease the setting, but if your CPU outran the GPU, you could queue up to 3 frames ahead. Vsync simply synchronized a ready frame to the display and occasionally made it wait if frames weren't ready, but in DirectX before the flip presentation model you ALWAYS had A -> B -> C. And that is because framerates weren't that high back then: when Crysis 1 or AC1 came out, reviewers called 50 fps an amazing result. But what is important to know in the DirectX case is that if the GPU wasn't fast enough, the queue wasn't 3 frames of latency, as the GPU was rendering slower than the refresh rate 99% of the time. Basically, the first ready frame was sent, and a maximum of 3 frames could be stored, but most of the time they weren't, as the GPU was slower than the refresh rate.
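
            (That "maximum pre-rendered frames" queue is essentially a counting semaphore on CPU-side frame submission. A toy POSIX sketch, with made-up frame functions, of why it caps out at 3 frames of queueing rather than dropping anything:)

            Code:
            #include <semaphore.h>

            void record_and_submit_frame(void); /* hypothetical */

            /* "Maximum pre-rendered frames = 3": the CPU may run at most 3
               frames ahead of the GPU. Frames are only queued, never dropped,
               which is exactly where the up-to-3-frames of latency came from. */
            sem_t slots; /* initialise once: sem_init(&slots, 0, 3); */

            void cpu_frame_loop(void)
            {
                for (;;) {
                    sem_wait(&slots);          /* block if 3 frames queued */
                    record_and_submit_frame(); /* build and queue one frame */
                }
            }

            void on_gpu_frame_retired(void) /* GPU finished with a frame */
            {
                sem_post(&slots);              /* a queue slot is free again */
            }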

            Windows Vista (and everything onwards) introduced the flip presentation model, which allows dropping frames and always displaying the latest finished one. By default this is how DWM works - it always displays the latest finished frame, so we have no tearing. But what DWM has (that Wayland doesn't) is the ability to automatically turn this off whenever:
            - a borderless fullscreen application is running,
            - a fullscreen application is running.
            At that point there is pretty much no compositing, so the application and the GPU driver are the only ones in control of what happens.

            And the 2nd thing DWM has that Wayland doesn't: DWM doesn't wait for the slowest client on the desktop, so if your Blender render lags, your desktop doesn't. This is being fixed and there is a pull request for it, but it has some issues and is waiting for related work to be upstreamed.

            Also, things like Sway and most Linux compositors don't work by vsync. Or rather they do, but only just before the interval to send a frame (max_render_time on Sway) does the compositor start its own work: it composites everything, and once compositing finishes, the frame is ready to send to the display. This means the compositing overhead isn't big; most compositors, DWM included, do this automatically. So there isn't triple buffering on Wayland: it is double buffering, and the 2nd buffer is made just in time for when it is supposed to be sent to the front buffer (or rather, swapped).

            This is the good part: it doesn't work by triple buffering, it just prepares the frame when it is needed, introduces no tearing, doesn't create too much work, and latency is minimal.
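
            (The max_render_time scheduling described here boils down to "sleep until next_vblank minus max_render_time, then composite". A rough C sketch; the compose/flip calls are made up, only the clock math is real:)

            Code:
            #define _POSIX_C_SOURCE 200809L
            #include <time.h>
            #include <stdint.h>

            void compose_all_surfaces(void); /* hypothetical */
            void queue_page_flip(void);      /* hypothetical */

            /* Sway-style just-in-time compositing: start compositor work only
               max_render_time before the deadline, so client buffers are as
               fresh as possible when they get sampled. */
            void composite_just_in_time(int64_t next_vblank_ns,
                                        int64_t max_render_ns)
            {
                int64_t start_ns = next_vblank_ns - max_render_ns;
                struct timespec ts = {
                    .tv_sec  = start_ns / 1000000000,
                    .tv_nsec = start_ns % 1000000000,
                };
                clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &ts, NULL);

                compose_all_surfaces(); /* sample the latest client buffers */
                queue_page_flip();      /* result swaps in at the vblank */
            }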

            So where is the real problem? An application running on Wayland (most of the time) isn't aware of the refresh timing. So it cannot time the creation of its own frame just right to minimize its latency; we can have a frame that is almost ready but doesn't manage to be ready in Wayland time. So Wayland takes the previous frame, which is old, and that old frame itself took time to create. So you have 2 frames of latency.
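
            (A small hedge on "isn't aware of the refresh timing": the presentation-time protocol does let a client recover exactly this timing where the compositor supports it - whether games and toolkits actually use it is another matter. A sketch against the generated header, with registry and commit plumbing omitted:)

            Code:
            #include <wayland-client.h>
            #include "presentation-time-client-protocol.h" /* wayland-scanner output */

            /* Stores the output's nominal refresh period so the client can
               schedule its own rendering just in time. */
            static void presented(void *data, struct wp_presentation_feedback *fb,
                                  uint32_t tv_sec_hi, uint32_t tv_sec_lo,
                                  uint32_t tv_nsec, uint32_t refresh_ns,
                                  uint32_t seq_hi, uint32_t seq_lo, uint32_t flags)
            {
                *(uint32_t *)data = refresh_ns; /* e.g. 16666666 at 60 Hz */
                wp_presentation_feedback_destroy(fb);
            }

            static void discarded(void *data, struct wp_presentation_feedback *fb)
            {
                wp_presentation_feedback_destroy(fb);
            }

            static void sync_output(void *data, struct wp_presentation_feedback *fb,
                                    struct wl_output *output)
            {
            }

            static const struct wp_presentation_feedback_listener listener = {
                .sync_output = sync_output,
                .presented   = presented,
                .discarded   = discarded,
            };

            /* Call once per committed frame; 'pres' was bound from the registry. */
            static void watch_frame(struct wp_presentation *pres,
                                    struct wl_surface *surface,
                                    uint32_t *refresh_out)
            {
                struct wp_presentation_feedback *fb =
                    wp_presentation_feedback(pres, surface);
                wp_presentation_feedback_add_listener(fb, &listener, refresh_out);
            }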

            Of course, you could decrease latency by using triple buffering IN the game, producing too many frames and dropping the unnecessary ones. But just imagine if you could do this:

            - have an application in Wayland with focus. The focused application gets an imposed fps limit just under the maximum refresh rate (let's say 141 fps on a 144 Hz monitor); unfocused applications can be pushed even further down, to a lower limit (like 60 fps),
            - the compositor synchronizes everything to that application. When the application has a frame ready, the compositor takes it and composites everything else (potentially using older frames, as they do not matter),
            - we send the frame straight away using variable refresh rate.

            The application in focus has extremely low latency (only the minimum due to compositing) and everything is tear-free. You don't produce any excess frames.
            Last edited by piotrj3; 28 November 2022, 12:43 PM.



            • Originally posted by piotrj3 View Post

              This is exactly how older DirectX worked before the flip presentation model happened. Before the flip model, you had settings like "maximum pre-rendered frames", set to 3 by default. Games on DirectX 9 couldn't drop frames. You could decrease the setting, but if your CPU outran the GPU, you could queue up to 3 frames ahead. Vsync simply synchronized a ready frame to the display and occasionally made it wait if frames weren't ready, but in DirectX before the flip presentation model you ALWAYS had A -> B -> C. And that is because framerates weren't that high back then: when Crysis 1 or AC1 came out, reviewers called 50 fps an amazing result. But what is important to know in the DirectX case is that if the GPU wasn't fast enough, the queue wasn't 3 frames of latency, as the GPU was rendering slower than the refresh rate 99% of the time. Basically, the first ready frame was sent, and a maximum of 3 frames could be stored, but most of the time they weren't, as the GPU was slower than the refresh rate.

              Windows Vista (and everything onwards) introduced the flip presentation model, which allows dropping frames and always displaying the latest finished one. By default this is how DWM works - it always displays the latest finished frame, so we have no tearing. But what DWM has (that Wayland doesn't) is the ability to automatically turn this off whenever:
              - a borderless fullscreen application is running,
              - a fullscreen application is running.
              At that point there is pretty much no compositing, so the application and the GPU driver are the only ones in control of what happens.
              Interesting. I'm not familiar with DX9; AFAIK OpenGL developers have been doing this for decades now, so I had assumed DirectX would have too. At the very least, I do remember some really ancient games that seemed to work like this.

              Though we do have something similar when we use DRM leasing, but no one really wants to implement that, since it isn't that great of a solution.

              And the 2nd thing DWM has that Wayland doesn't: DWM doesn't wait for the slowest client on the desktop, so if your Blender render lags, your desktop doesn't. This is being fixed and there is a pull request for it, but it has some issues and is waiting for related work to be upstreamed.
              Oh, I know about this, trust me. Are you referring to a wlroots PR? If so, I am aware of it. Sometimes MPV will bug out and, whenever it's open, cause my entire desktop to render like a snail until I minimize it (or change the vo). xD

              Also, things like Sway and most Linux compositors don't work by vsync. Or rather they do, but only just before the interval to send a frame (max_render_time on Sway) does the compositor start its own work: it composites everything, and once compositing finishes, the frame is ready to send to the display. This means the compositing overhead isn't big; most compositors, DWM included, do this automatically. So there isn't triple buffering on Wayland: it is double buffering, and the 2nd buffer is made just in time for when it is supposed to be sent to the front buffer (or rather, swapped).

              This is the good part: it doesn't work by triple buffering, it just prepares the frame when it is needed, introduces no tearing, doesn't create too much work, and latency is minimal.

              Also aware of this. However, when people talk about Wayland forcing vsync, I assume they are talking about running games. Maybe I missed something, though.

              So where is the real problem? An application running on Wayland (most of the time) isn't aware of the refresh timing. So it cannot time the creation of its own frame just right to minimize its latency; we can have a frame that is almost ready but doesn't manage to be ready in Wayland time. So Wayland takes the previous frame, which is old, and that old frame itself took time to create. So you have 2 frames of latency.

              I'm not sure about this one. I haven't personally tested it, but I know multiple people who have, and their results typically show, across KDE and Sway (GNOME not tested), roughly one frame of extra latency when using Wayland over X when allowed to tear.
              Of course, you could decrease latency by using triple buffering IN the game, producing too many frames and dropping the unnecessary ones. But just imagine if you could do this:

              - have an application in Wayland with focus. The focused application gets an imposed fps limit just under the maximum refresh rate (let's say 141 fps on a 144 Hz monitor); unfocused applications can be pushed even further down, to a lower limit (like 60 fps),
              - the compositor synchronizes everything to that application. When the application has a frame ready, the compositor takes it and composites everything else (potentially using older frames, as they do not matter),
              - we send the frame straight away using variable refresh rate.

              The application in focus has extremely low latency (only the minimum due to compositing) and everything is tear-free. You don't produce any excess frames.
              The issue with variable refresh rate is that you are still limited to the maximum frame pacing the display is capable of; yes, it's good, but it only works when it's able to. This means that if you have one app running at 141 fps and another at, say, 60 fps, the inconsistency caused by the desync of the two apps will make Wayland try to flip at extremely small intervals. Your suggestion is to use older frames to mitigate this, and while it's nice in theory, it just means you have one application with extremely weird frame pacing and one without - and which is which would swap whenever you click on different things or alt-tab.

              I do think this has a place, don't get me wrong, but I think we could add some more to this. Instead of working via focus, we could tell the compositor to prioritize specific content types (making this odd protocol actually useful); then the user could choose what to prefer. Say you are watching a video and AFKing in a game; the video is 24 fps and the game is whatever. Assume you have a 60 Hz display (60 Hz because 24 fps divides evenly into 120 fps and 144 fps, so the mismatch only shows up at 60 Hz). You could tell the compositor to prioritize the movie, meaning it would present at 48 fps, and you could then limit the game you are playing to 48 fps as well. This could be immensely useful.

              And vice versa: when you are running a game, you could tell the compositor to prioritize game content instead, and do what you say - frame limiting and syncing based on the game. You could then add further ways of narrowing it down, like via focus, but I think using content type would be the better starting point.
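
              (Hedged aside: the content-type-v1 protocol already gives clients a way to attach exactly this kind of label; whether a compositor schedules around it is entirely up to the compositor. A client-side sketch against the generated header:)

              Code:
              #include <wayland-client.h>
              #include "content-type-v1-client-protocol.h" /* wayland-scanner output */

              /* Label a surface so the compositor could prioritise by content
                 type as suggested above. Assumes 'manager' was bound from the
                 registry; the tag is purely a hint. */
              static void tag_as_game(struct wp_content_type_manager_v1 *manager,
                                      struct wl_surface *surface)
              {
                  struct wp_content_type_v1 *ct =
                      wp_content_type_manager_v1_get_surface_content_type(manager,
                                                                          surface);

                  /* Other values: NONE, PHOTO, VIDEO. */
                  wp_content_type_v1_set_content_type(
                      ct, WP_CONTENT_TYPE_V1_TYPE_GAME);
              }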



              • Originally posted by xfcemint View Post
                This is exactly the reason why Wayland should provide applications with the expected time of the "vsync" event. Then all applications could synchronize their frame output, not only to the correct rate, but also to the optimal moment of rendering completion.
                And this is sounding a lot closer to the article I linked that talked about heartbeat stutter, etc. After decades, truly consistent/intended/sane animation against hardware + software variables still hasn't really been dealt with. The Wayland approach definitely seems like another in the long line of tunnel-visioned solutions (tech in general, not Linux) that gave us the situation we have. Strange, given the broad money and attention devoted to real-time animation and displays. No, this comment isn't intended to be helpful.
