Wayland Protocols 1.30 Introduces New Protocol To Allow Screen Tearing


  • Originally posted by Quackdoc View Post

    Vsync is a broad term; the vsync that Wayland forces is triple-buffer vsync specifically, which should only ever add a single frame of latency. I'm not sure what the video is specifically talking about, but either they mean the typical vsync most games implement, the tested applications were implemented poorly, or the testing methodology is wrong.
    Triple-buffer vsync is what old-school games typically implement, and it's not just a single frame of latency. In fact, if you look at one of the main justifications for Wayland adjusting their protocol to allow screen tearing, it's input latency from touch-based devices. Quoting the article directly:

    For some use cases like games or drawing tablets it can make sense to reduce latency by accepting tearing with the use of asynchronous page flips. This global is a factory interface, allowing clients to inform which type of presentation the content of their surfaces is suitable for.
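
    (For reference, here is a rough C sketch of what using that factory interface could look like from the client side. The names follow the staging tearing-control-v1 protocol and a wayland-scanner-generated header, the wl_surface setup is omitted, and I have not run this, so treat it as an illustration rather than a verified client.)

    /* Hypothetical sketch: check whether the compositor offers the tearing-control
     * factory global. Build roughly as:
     *   cc sketch.c tearing-control-v1-protocol.c -lwayland-client
     * where the protocol .c/.h files are generated with wayland-scanner from the
     * staging tearing-control-v1.xml (the header name may differ on your setup). */
    #include <stdio.h>
    #include <string.h>
    #include <stdint.h>
    #include <wayland-client.h>
    #include "tearing-control-v1-client-protocol.h"

    static struct wp_tearing_control_manager_v1 *tearing_mgr = NULL;

    static void registry_global(void *data, struct wl_registry *registry,
                                uint32_t name, const char *interface, uint32_t version)
    {
        (void)data; (void)version;
        if (strcmp(interface, wp_tearing_control_manager_v1_interface.name) == 0)
            tearing_mgr = wl_registry_bind(registry, name,
                                           &wp_tearing_control_manager_v1_interface, 1);
    }

    static void registry_global_remove(void *data, struct wl_registry *registry, uint32_t name)
    {
        (void)data; (void)registry; (void)name;
    }

    static const struct wl_registry_listener registry_listener = {
        .global = registry_global,
        .global_remove = registry_global_remove,
    };

    int main(void)
    {
        struct wl_display *display = wl_display_connect(NULL);
        if (!display) { fprintf(stderr, "no Wayland display\n"); return 1; }

        struct wl_registry *registry = wl_display_get_registry(display);
        wl_registry_add_listener(registry, &registry_listener, NULL);
        wl_display_roundtrip(display);   /* collect the advertised globals */

        if (tearing_mgr) {
            printf("compositor advertises wp_tearing_control_manager_v1\n");
            /* With a real wl_surface at hand, a game would then do something like:
             *   struct wp_tearing_control_v1 *tc =
             *       wp_tearing_control_manager_v1_get_tearing_control(tearing_mgr, surface);
             *   wp_tearing_control_v1_set_presentation_hint(tc,
             *       WP_TEARING_CONTROL_V1_PRESENTATION_HINT_ASYNC);
             * and the compositor may then use async page flips (tearing) for that surface. */
        } else {
            printf("compositor does not expose wp_tearing_control_manager_v1\n");
        }

        wl_display_disconnect(display);
        return 0;
    }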



    • Originally posted by mdedetrich View Post
      Triple-buffer vsync is what old-school games typically implement, and it's not just a single frame of latency. In fact, if you look at one of the main justifications for Wayland adjusting their protocol to allow screen tearing, it's input latency from touch-based devices. Quoting the article directly:
      I'm not going to agree with you completely on this one. The amount of latency that triple buffering introduces depends on how well the triple buffering is implemented. The main problem is that it commonly isn't implemented well enough.

      Another problem is that you also have to consider the 3D graphics pipeline, which can introduce additional latency.

      In ideally implemented triple buffering, the latency is always less than the frame rendering time plus the display's frame interval:
      maxLatency < renderTime + frameInterval
      ... where renderTime is measured from the moment all input data (keyboard, mouse movement, keypresses) has been produced (this does not take into account the latency from the display framebuffer to the LCD panel).
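
      A quick worked example of that bound, with made-up numbers (a 60 Hz display and a 5 ms render time):

      /* Toy illustration of the bound above; the numbers are invented. */
      #include <stdio.h>

      int main(void)
      {
          double frame_interval_ms = 1000.0 / 60.0;  /* 60 Hz display: 16.67 ms per refresh */
          double render_time_ms    = 5.0;            /* input sampled -> frame finished */
          double max_latency_ms    = render_time_ms + frame_interval_ms;

          printf("maxLatency < %.2f ms (render %.1f ms + interval %.2f ms)\n",
                 max_latency_ms, render_time_ms, frame_interval_ms);
          return 0;
      }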



      • Originally posted by xfcemint View Post

        I'm not going to agree with you completely on this one. The amount of latency that triple buffering introduces depends on how well the triple buffering is implemented. The main problem is that it commonly isn't implemented well enough.

        Another problem is that you also have to consider the 3D graphics pipeline, which can introduce additional latency.

        In ideally implemented triple buffering, the latency is always less than the frame rendering time plus the display's frame interval:
        maxLatency < renderTime + frameInterval
        ... where renderTime is measured from the moment all input data (keyboard, mouse movement, keypresses) has been produced (this does not take into account the latency from the display framebuffer to the LCD panel).
        No, it is not always less.

        It is higher in cases where you have fewer frames per second than the refresh rate. There the latency gets miserably long: triple buffering is a way to offset the frame rate jumping around (e.g. between 30 fps and 60 fps all the time when the refresh rate is 60 Hz), and in that case you get the full triple-buffer latency, and it is inconsistent (so base latency + 2 frames).

        People forget that "triple buffering" does NOT specify how things work under the hood.

        Say we have a 60 Hz monitor with triple buffering (so 2 back buffers) and an overkill setup (way over 60 fps).

        Frame A is in front.
        Frames B and C are already in the back buffers.

        Do we, during that time, work on frame D? That is not defined.

        If we work on frame D, we will discard frame B, which means some of the work we did becomes a useless heater that never gets displayed (this is Fast Sync / Enhanced Sync on Nvidia/AMD). You don't do that in compositors (hello, Wayland!).

        If we wait until the front buffer swaps to B, to make space for frame D, then we have the full latency of 3 buffers. That is not good either.

        If we end up with fps <= refresh rate (or fps oscillating around the refresh rate), triple buffering is there mostly to prevent this situation: the front buffer is displaying, one back buffer is ready, and we cannot work on another back buffer because all buffers are occupied, so the GPU sits idle; then the next frame has some microstutter and won't be ready within 16.6 ms, so it takes a 33.3 ms window to display, and we get a sudden drop from 60 fps to 30 fps. The additional back buffer gives better utilization of the GPU.
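
        To make the difference concrete, here is a small toy simulation of the two policies above (my own simplified model with invented numbers, not how any particular driver or compositor actually implements it):

        #include <stdio.h>

        #define REFRESH_MS  (1000.0 / 60.0)   /* 60 Hz: one vblank every 16.67 ms */
        #define RENDER_MS   4.0               /* GPU renders a frame in 4 ms (well over 60 fps) */
        #define N_VBLANKS   1000

        /* Policy 1: keep rendering and discard the older back buffer
         * ("work on frame D, drop B" - Fast Sync / Enhanced Sync style).
         * Each vblank shows the newest frame that finished before it. */
        static double avg_latency_drop(void)
        {
            double total = 0.0;
            for (int v = 1; v <= N_VBLANKS; v++) {
                double vblank    = v * REFRESH_MS;
                int    completed = (int)(vblank / RENDER_MS);    /* frames finished so far  */
                double start     = (completed - 1) * RENDER_MS;  /* start of the newest one */
                total += vblank - start;                         /* input age at scanout    */
            }
            return total / N_VBLANKS;
        }

        /* Policy 2: wait for a free back buffer ("full latency of 3 buffers").
         * Strict FIFO with 2 back buffers; the renderer blocks when both are full. */
        static double avg_latency_wait(void)
        {
            double start_of[2];            /* start times of queued, completed frames */
            int    qlen = 0;
            double clock = 0.0;            /* renderer's clock */
            double total = 0.0;

            while (qlen < 2) {             /* fill both back buffers, then block */
                start_of[qlen++] = clock;
                clock += RENDER_MS;
            }

            for (int v = 1; v <= N_VBLANKS; v++) {
                double vblank = v * REFRESH_MS;
                total += vblank - start_of[0];     /* oldest queued frame is scanned out */
                start_of[0] = start_of[1];
                qlen--;

                if (clock < vblank)                /* renderer was blocked until now */
                    clock = vblank;
                start_of[qlen++] = clock;          /* start exactly one new frame */
                clock += RENDER_MS;                /* done before the next vblank, since 4 < 16.67 */
            }
            return total / N_VBLANKS;
        }

        int main(void)
        {
            printf("drop-oldest (keep rendering): avg latency %5.1f ms\n", avg_latency_drop());
            printf("wait-for-free-buffer (FIFO) : avg latency %5.1f ms\n", avg_latency_wait());
            return 0;
        }

        With these made-up numbers the drop-oldest policy averages roughly 6 ms of input age at scanout, while the wait-for-space policy settles at about two refresh intervals (~33 ms), which is the "base latency + 2 frames" case.
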
        Last edited by piotrj3; 28 November 2022, 07:47 AM.



        • Originally posted by piotrj3 View Post
          No, it is not always less.

          ... snipped ...

          Say we have a 60 Hz monitor with triple buffering (so 2 back buffers) and an overkill setup (way over 60 fps).

          Frame A is in front.
          Frames B and C are already in the back buffers.

          Do we, during that time, work on frame D? That is not defined.

          If we work on frame D, we will discard frame B, which means some of the work we did becomes a useless heater that never gets displayed (this is Fast Sync / Enhanced Sync on Nvidia/AMD). You don't do that in compositors (hello, Wayland!).

          ... snipped ...
          I do not agree.

          We work on frame D, which means that frame B gets discarded. Some processing time has been wasted (and heat generated), but that is irrelevant because the goal is to minimize latency, not to find a balance between latency and heat.

          Perhaps the user could be given a configuration option that enables some kind of "triple buffering predictor" to just skip (or defer) rendering frame B. The consequence of enabling such an option would be a small increase in average latency (and an increase in maximum latency), as a consequence of the predictor not being perfect (i.e. occasional wrong predictions).

          Freesync also helps in this situation.

          Edit: Also, I think that if the predictor is very good, it can actually decrease the average latency (by ensuring that rendering completes just before the moment when the front framebuffer needs to be switched).
          Last edited by xfcemint; 28 November 2022, 08:50 AM.



          • Originally posted by piotrj3 View Post
            Frame A is in front.
            Frames B and C are already in the back buffers.

            Do we, during that time, work on frame D? That is not defined.

            If we work on frame D, we will discard frame B, which means some of the work we did becomes a useless heater that never gets displayed (this is Fast Sync / Enhanced Sync on Nvidia/AMD). You don't do that in compositors (hello, Wayland!).
            this is exactly how turning vsync off on wayland works. we don't care about wasted work, since the exact same work would have been done without vsync anyways. this is also how nearly all modern triple buffer vsync works.

            If we wait until the front buffer swaps to B, to make space for frame D, then we have the full latency of 3 buffers. That is not good either.
            no idea if anything actually works like this, because this is stupid and it holds little to no benefit, but sounds like it would be detrimental




            • Originally posted by Quackdoc View Post
              this is exactly how turning vsync off on wayland works. we don't care about wasted work, since the exact same work would have been done without vsync anyways. this is also how nearly all modern triple buffer vsync works.

              no idea if anything actually works like this, because this is stupid and it holds little to no benefit, but sounds like it would be detrimental
              I have a question. (Sorry for the interruption.)

              If you consider my post just above, you can see that I'm mentioning a "triple buffering predictor".

              Is anyone working on such a predictor in Wayland? I.e. is there any chance that such functionality could be added soon?

              From my analysis it follows that a good predictor could decrease the average latency (while having only minimal disadvantages). Also, an option (a slider) could be added to the predictor to adjust the "reserved time": the difference between the moment when a frame has been rendered and the moment when the front framebuffer needs to be swapped. Increasing the "reserved time" decreases the chance of a wrong prediction by the predictor.
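
              To make the idea concrete, here is a minimal sketch of such a predictor (purely hypothetical; nothing like this exists in Wayland as far as I know): estimate the next render time from past frames, then delay the start of rendering so the frame completes just before the flip, with the "reserved time" as the safety margin.

              /* Hypothetical "triple buffering predictor" sketch (not an existing
               * Wayland or compositor API): smooth the measured render times and
               * schedule the start of rendering so the frame finishes just before
               * the next flip, minus a user-tunable reserved margin. */
              #include <stdio.h>

              struct predictor {
                  double predicted_ms;  /* smoothed estimate of the render time */
                  double reserved_ms;   /* safety margin, the "slider" */
              };

              /* Feed in the measured render time of the frame that just finished. */
              static void predictor_update(struct predictor *p, double measured_ms)
              {
                  const double alpha = 0.2;  /* smoothing factor */
                  p->predicted_ms += alpha * (measured_ms - p->predicted_ms);
              }

              /* When should rendering start so the frame is ready just before the flip? */
              static double predictor_start_time(const struct predictor *p, double next_flip_ms)
              {
                  double start = next_flip_ms - p->predicted_ms - p->reserved_ms;
                  return start > 0.0 ? start : 0.0;  /* never schedule in the past */
              }

              int main(void)
              {
                  struct predictor p = { .predicted_ms = 5.0, .reserved_ms = 2.0 };
                  double history[] = { 4.8, 5.3, 6.1, 5.0, 4.6 };  /* made-up render times, ms */
                  double next_flip = 16.67;                        /* next vblank, ms */
                  int n = (int)(sizeof history / sizeof history[0]);

                  for (int i = 0; i < n; i++)
                      predictor_update(&p, history[i]);

                  printf("predicted render %.2f ms -> start rendering at t=%.2f ms (flip at %.2f ms, margin %.1f ms)\n",
                         p.predicted_ms, predictor_start_time(&p, next_flip), next_flip, p.reserved_ms);
                  return 0;
              }

              Increasing reserved_ms is exactly the slider I mean: fewer missed flips, at the cost of slightly more average latency.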



              • Originally posted by xfcemint View Post
                I have a question. (Sorry for the interruption.)

                If you consider my post just above, you can see that I'm mentioning a "triple buffering predictor".

                Is anyone working on such a predictor in Wayland? I.e. is there any chance that such functionality could be added soon?

                From my analysis it follows that a good predictor could decrease the average latency (while having only minimal disadvantages). Also, an option (a slider) could be added to the predictor to adjust the "reserved time": the difference between the moment when a frame has been rendered and the moment when the front framebuffer needs to be swapped. Increasing the "reserved time" decreases the chance of a wrong prediction by the predictor.
                I am entirely unsure; it's possible, but not something I would personally be interested in, so it would be something I would have skimmed over.



                • Originally posted by Quackdoc View Post

                  this is exactly how turning vsync off on wayland works. we don't care about wasted work, since the exact same work would have been done without vsync anyways. this is also how nearly all modern triple buffer vsync works.



                  no idea if anything actually works like this, because this is stupid and it holds little to no benefit, but sounds like it would be detrimental

                  This is exactly how older DirectX stuff worked before the flip presentation model happened. Before the flip model, you had settings like "maximum pre-rendered frames", set to 3 by default. Games in DirectX 9 couldn't drop frames. You could decrease the setting, but if the CPU was outrunning the GPU you could queue up to 3 frames ahead. Vsync simply synchronized a ready frame to the display and occasionally made it wait if frames weren't ready, but in DirectX before the flip presentation model YOU ALWAYS had A -> B -> C. And that is because in the old times framerates weren't that high. When Crysis 1 or AC1 came out, reviewers were saying 50 fps was an amazing result. But what is important to know in the DirectX case is that if the GPU wasn't fast enough, the queue wasn't 3 frames of latency, because the GPU was rendering slower than the refresh rate 99% of the time. Basically, the first frame ready to send was sent, and a maximum of 3 frames could be stored, but most of the time they weren't, since the GPU was slower than the refresh rate.

                  Windows Vista (and everything onwards) introduced the flip presentation model, which allows dropping frames and always displaying the latest finished one. By default this is how DWM works - it always displays the latest finished frame, so we have no tearing. But what DWM has (that Wayland doesn't) is the ability to automatically turn this off whenever:
                  - a borderless fullscreen application is running,
                  - a fullscreen application is running.
                  At that point you have pretty much no compositing, so the application and the GPU driver are the only ones in control of what happens.

                  And the second thing DWM has that Wayland doesn't: DWM doesn't wait for the slowest client on the desktop, so if your Blender render lags, your desktop doesn't. This is being fixed and there is a pull request for it, but it has some issues and is waiting for related work to be upstreamed before that can change.

                  Also, things like Sway and most Linux compositors don't really work by vsync. Or rather they do, but just before the deadline to send a frame (max_render_time on Sway), the compositor starts its own work, composes everything, and once composing finishes the frame is ready to send to the display. This means the compositing overhead isn't big; most compositors, DWM included, do this automatically. So there isn't triple buffering on Wayland: it is double buffering, and the second buffer is made just in time for when it is supposed to be sent to the front buffer (or rather swapped).
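
                  As a toy illustration of that scheduling (invented numbers; max_render_time here only mirrors the idea behind Sway's option, not its actual implementation):

                  /* Compose-just-before-the-deadline scheduling, as described above. */
                  #include <stdio.h>

                  int main(void)
                  {
                      double refresh_ms         = 1000.0 / 60.0;  /* 60 Hz output */
                      double max_render_time_ms = 2.0;            /* time reserved for composing */

                      for (int v = 1; v <= 3; v++) {
                          double vblank  = v * refresh_ms;
                          double compose = vblank - max_render_time_ms;  /* compositor wakes up here */
                          /* A client buffer committed just before the compose deadline is only
                           * about max_render_time old when it reaches the display. */
                          printf("vblank %d at %6.2f ms -> compositor starts composing at %6.2f ms\n",
                                 v, vblank, compose);
                      }
                      return 0;
                  }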

                  This is the good part. It isn't working by triple buffering; it just prepares the frame when it is needed, doesn't introduce tearing in doing so, doesn't create too much extra work, and the latency is minimal.

                  So where is the real problem? An application running on Wayland (most of the time) isn't aware of the refresh timing. So it cannot time the creation of its own frame just right to also minimize its latency; we can end up with a frame that is almost ready but doesn't manage to be ready in time for the compositor. So Wayland takes the previous frame, which is old, and that old frame also took time to create. So you have 2 frames of latency.

                  Of course you could decrease latency by using triple buffering IN the game you run, producing too many frames and dropping the unnecessary ones. But just imagine if you could do this instead:

                  - have an application in Wayland with focus; the focused application gets an imposed fps limit just under the maximum refresh rate (let's say 141 fps on a 144 Hz monitor), while unfocused ones can be limited even further (to something like 60 fps),
                  - the compositor synchronizes everything to that application: when the application has a frame ready, the compositor takes it and composes everything else (potentially using older frames for the rest, as they do not matter),
                  - we send the frame straight away using variable refresh rate.

                  The application in focus has extremely low latency (only a minimal amount due to compositing) and everything is tear-free. You don't produce any excess frames.
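
                  The arithmetic behind capping just under the refresh rate (numbers from the example above):

                  /* Why cap the focused client at 141 fps on a 144 Hz VRR display. */
                  #include <stdio.h>

                  int main(void)
                  {
                      double refresh_hz = 144.0, cap_fps = 141.0;
                      printf("display frame interval: %.2f ms\n", 1000.0 / refresh_hz);  /* ~6.94 ms */
                      printf("capped frame interval : %.2f ms\n", 1000.0 / cap_fps);     /* ~7.09 ms */
                      /* Because 7.09 ms > 6.94 ms, the focused client can never outrun the
                       * display, so each frame can be scanned out immediately via VRR with
                       * no queueing and no tearing. */
                      return 0;
                  }
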
                  Last edited by piotrj3; 28 November 2022, 12:43 PM.



                  • Originally posted by piotrj3 View Post
                    So where is the real problem? An application running on Wayland (most of the time) isn't aware of the refresh timing. So it cannot time the creation of its own frame just right to also minimize its latency; we can end up with a frame that is almost ready but doesn't manage to be ready in time for the compositor. So Wayland takes the previous frame, which is old, and that old frame also took time to create. So you have 2 frames of latency.
                    If piotrj is correct, then I can agree that current Wayland isn't as good as you would expect from 2022 tech. To minimize latency, it is very desirable that applications are aware of the (expected) points in time when the front framebuffer must be flipped. Such tech was widely available in 1982, so why not in 2022? (I.e. the computers and consoles of the 1980s provided vsync timing information to applications, and therefore they had less latency than today's computers.)
                    Last edited by xfcemint; 28 November 2022, 01:37 PM.



                    • Originally posted by mdedetrich View Post
                      Triple-buffer vsync is what old-school games typically implement, and it's not just a single frame of latency. In fact, if you look at one of the main justifications for Wayland adjusting their protocol to allow screen tearing, it's input latency from touch-based devices. Quoting the article directly:
                      I figured out what the problem is here.

                      The problem is that you are being ambiguous.

                      Triple buffering adds (at worst) one additional frame of latency, but compared to what? Compared to vsync disabled! That's what the people here are talking about.

                      However, you seem to be comparing triple-buffering latency to something else - but what is this "other"? Can it even be meaningfully defined?

