Mesa Adds "Block On Depleted Buffers" Option To Reduce Latency


  • Mesa Adds "Block On Depleted Buffers" Option To Reduce Latency

    Phoronix: Mesa Adds "Block On Depleted Buffers" Option To Reduce Latency

    After the idea has been discussed for about a year, Mesa 22.3 has landed a new performance option called "block_on_depleted_buffers" to wait on buffers at the end of a swap to reduce latency -- a possible one frame advantage...

    https://www.phoronix.com/news/Mesa-B...pleted-Buffers
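
    This appears to be a driconf-style option, so it should be toggleable per application. A minimal sketch, assuming the option is exposed like other Mesa driconf options such as vblank_mode (the application and executable names here are hypothetical):

    Code:
    <!-- ~/.drirc: enable the new option for one specific game only -->
    <driconf>
      <device>
        <application name="My Game" executable="mygame">
          <option name="block_on_depleted_buffers" value="true" />
        </application>
      </device>
    </driconf>

    For a quick one-off test, Mesa also picks most driconf options up from the environment, e.g. block_on_depleted_buffers=true ./mygame.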

  • #2
    Applies cleanly over 22.2.2



    • #3
      Is this like the Ultra low-latency mode in the NVIDIA Control Panel on Windows (NULL)? There's no equivalent on Linux to my knowledge.



      • #4
For those who just skimmed the article: in the patched version, the call to loader_dri3_get_buffers happens after the "purple" bar instead of waiting inside it, as the unpatched version does.



        • #5
          Originally posted by Calinou View Post
          Is this like the Ultra low-latency mode in the NVIDIA Control Panel on Windows (NULL)? There's no equivalent on Linux to my knowledge.
That toggle and NVIDIA Reflex do indeed "throttle" the CPU part of the rendering so that it doesn't fill the buffers with more frames than necessary.
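
          Purely as an illustration (not how the driver implements it), an application can impose the same kind of throttling on itself with standard OpenGL fence syncs, capping CPU run-ahead to one frame:

          Code:
          /* Sketch: cap CPU run-ahead to one frame in flight using GL sync
           * objects (OpenGL 3.2+; assumes a loader like glad/GLEW and a
           * current context). swap_buffers() stands in for
           * glXSwapBuffers()/eglSwapBuffers(). */
          static GLsync frame_fence;

          void present_throttled(void)
          {
              frame_fence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
              swap_buffers();

              /* Don't start the next frame's CPU work until the GPU has
               * finished consuming the frame we just submitted. */
              glClientWaitSync(frame_fence, GL_SYNC_FLUSH_COMMANDS_BIT,
                               1000000000ull /* 1 second timeout, in ns */);
              glDeleteSync(frame_fence);
          }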



          • #6
            Originally posted by brucethemoose View Post

That toggle and NVIDIA Reflex do indeed "throttle" the CPU part of the rendering so that it doesn't fill the buffers with more frames than necessary.
            So it's basically the same as DXVK's
            Code:
            dxgi.maxFrameLatency = 1
            ?
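
            For anyone who wants to try that: DXVK reads the option from a dxvk.conf file, e.g.:

            Code:
            # dxvk.conf, read from the game's directory or the path in DXVK_CONFIG_FILE
            dxgi.maxFrameLatency = 1
            d3d9.maxFrameLatency = 1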



            • #7
              Originally posted by Calinou View Post
              Is this like the Ultra low-latency mode in the NVIDIA Control Panel on Windows (NULL)? There's no equivalent on Linux to my knowledge.
              There is LatencyFlex: https://github.com/ishitatsuyuki/LatencyFleX



              • #8
For this to work, the app must request the new draw buffer before sampling input and updating the scene, so that the block lands before input is sampled. Requesting the draw buffer before sampling input is unintuitive, and no developer will have engineered their app that way unless they had this feature in mind.

                So this should only be enabled on an app-by-app basis (see the sketch below for the loop ordering it assumes). I would bet that 99.9% of all games would either see no benefit or regress as things stand today. But it's great that it will help in years to come.
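
                A minimal sketch of that loop ordering (all function names here are hypothetical placeholders, not Mesa or any real API):

                Code:
                /* Hypothetical game loop arranged for block_on_depleted_buffers:
                 * the back buffer is acquired first, so any blocking happens
                 * before input is sampled and the freshest possible input
                 * reaches the screen. */
                for (;;) {
                    acquire_back_buffer();  /* may block while all buffers are in use */
                    poll_input();           /* sample input after the potential wait */
                    update_scene();
                    record_draw_calls();
                    swap_buffers();         /* present; buffers may now be depleted */
                }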


                • #9
Is this the reason why SwapBuffers (e.g. as used by SDL_RenderPresent()) doesn't block immediately? I remember mitigating some of the latency by inserting an SDL_RenderClear(), which would block until the next VBlank interval (sketched below)...
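
                  A minimal sketch of that workaround (the SDL2 calls are real; whether the clear actually blocks until VBlank is driver-dependent):

                  Code:
                  #include <SDL2/SDL.h>

                  int main(void)
                  {
                      SDL_Init(SDL_INIT_VIDEO);
                      SDL_Window *win = SDL_CreateWindow("vsync demo",
                          SDL_WINDOWPOS_CENTERED, SDL_WINDOWPOS_CENTERED, 640, 480, 0);
                      SDL_Renderer *ren = SDL_CreateRenderer(win, -1,
                          SDL_RENDERER_ACCELERATED | SDL_RENDERER_PRESENTVSYNC);

                      for (int frame = 0; frame < 300; frame++) {
                          /* ... update logic and draw calls would go here ... */
                          SDL_RenderPresent(ren); /* queues the flip; may return immediately */
                          SDL_RenderClear(ren);   /* first command touching the back buffer;
                                                     this is where the wait tends to land */
                      }

                      SDL_DestroyRenderer(ren);
                      SDL_DestroyWindow(win);
                      SDL_Quit();
                      return 0;
                  }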

                  Also, is this part of the catch-up puzzle?

Originally posted by tildearrow View Post
                  The draw procedure of an application goes like this:

                  1. clear
                  2. draw
                  3. present (AKA swap buffers)

                  If VSync is on, in a proper driver, present must wait until the vertical blank has been reached, and then swap buffers.

                  Code:
                  A display frame (in this example, rate 50Hz (period 20ms)):
                  Every line indicates a millisecond
                  L   GPU                 CPU
                  @   *Pre-VBlank*        Update Logic (CPU) (e.g. 4ms)
                  |                       .
                  |                       .
                  |   *Visible Raster*    .
                  |                       Clear (e.g. 1ms)
                  |   .                   Draw (e.g. 6ms)
                  |   .                   .
                  |   .                   .
                  |   .                   .
                  |   .                   .
                  |   .                   .
                  |   SwapBuffers is requested here
                  |   |
                  |   |
                  |   |   The driver
                  |   |   waits for
                  |   |   VBlank
                  |   *Post-VBlank*       <- SwapBuffers happens here
                  |   (rest of actions)
                  |   .
                  However, if we spend too much time in a frame, it will halve our FPS:

                  Code:
                  L   GPU                 CPU
                  @   *Pre-VBlank*        Update Logic (CPU) (e.g. 4ms)
                  |                       .
                  |                       .
                  |   *Visible Raster*    .
                  |                       Clear (e.g. 1ms)
                  |   .                   Draw (e.g. 23ms)
                  |   .                   .
                  |   .                   .
                  |   .                   .
                  |   .                   .
                  |   .                   .
                  |   .                   .
                  |   .                   .
                  |   .                   .
                  |   .                   .
                  |   .                   .
                  |   .                   .
                  |   *Post-VBlank*       .
                  |   .                   .
                  |   .                   .
                  @   *Pre-VBlank*        .
                  |   .                   .
                  |   .                   .
                  |   *Visible Raster*    .
                  |   .                   .
                  |   .                   .
                  |   .                   .
                  |   .                   .
                  |   |                   SwapBuffers is requested here
                  |   |   Driver waits
                  |   |   for VBlank
                  |   |
                  |   |
                  |   |
                  |   |
                  |   |
                  |   |
                  |   *Post-VBlank*       <- SwapBuffers happens here
                  |   (rest of actions)
                  |   .
                  In order to fix this "problem", NVIDIA invented Adaptive-VSync, and then this method was ported to Mesa.

                  This way we no longer wait for VSync when swapping buffers, but rather put the frame in a "queue" that will be popped on the next VBlank.
                  The wait only happens if you send another command (e.g. clear).

                  Code:
                  L   GPU                 CPU
                  @   *Pre-VBlank*        Update Logic (CPU) (e.g. 4ms)
                  |                       .
                  |                       .
                  |   *Visible Raster*    .
                  |                       Clear (e.g. 1ms)
                  |   .                   Draw (e.g. 6ms)
                  |   .                   .
                  |   .                   .
                  |   .                   .
                  |   .                   .
                  |   .                   .
                  |   SwapBuffers happens right here
                  |   Frame remains       Update Logic (CPU)
                  |   in a queue          .
                  |   and returns         .
                  |   immediately         .
                  |                       Clear
                  |   *Post-VBlank*       Draw
                  |   Popped from         .
                  |   queue               .
However, this is undesirable for applications like compositors and UI toolkits with animation, as it prevents us from knowing exactly when we hit VBlank.

(If you're wondering why we want to know when VBlank happens, it's for lag-reduction techniques.)

Furthermore, in Mesa's implementation a catch-up issue appears:

                  Code:
                  Display  | Machine
                  frame 0  | frame 0
                  frame 1  | frame 1
                  frame 2  | frame 2
                  frame 3  | say we lag here for ~250ms
                  frame 4  | frame 2
                  frame 5  | .
                  frame 6  | .
                  frame 7  | .
                  frame 8  | .
                  frame 9  | .
                  frame 10 | .
                  frame 11 | .
                  frame 12 | .
                  frame 13 | .
                  frame 14 | .
                  frame 15 | .
                  frame 16 | .
                  frame 17 | .
                  frame 18 | frame 2 frame 3 frame 4 frame \
                  frame 19 | 5 frame 6 frame 7 frame 8 fra |
                  frame 20 | me 9 frame 10 frame 11 frame  |
                  frame 21 | 12 frame 13 frame 14 frame 15 | Catch-up
                  frame 22 | frame 16 frame 17 frame 18 fr |
                  frame 23 | ame 19 frame 20 frame 21 fram |
                  frame 24 | e 22 frame 23 frame 24        /
                  frame 25 | frame 25
                  frame 26 | frame 26
                  frame 27 | frame 27
                  frame 28 | frame 28
                  frame 29 | frame 29
                  In other words, say I make an application that intentionally waits 500ms after rendering 30 frames of a spinning triangle.

                  Under a proper driver, this would happen:

                  (60Hz display)

                  1. Triangle rotates for 0.5s
                  2. Triangle stops rotating and application waits 0.5s
                  3. Triangle continues rotating for 0.5s
                  4. Triangle stops rotating and application waits 0.5s
                  5. Loop from 1

                  However, under Mesa, this happens:

                  1. Triangle rotates for 0.5s
                  2. Triangle stops rotating and application waits 0.5s
                  3. Triangle suddenly skips to a different rotation
                  4. Application waits 0.5s
                  5. Triangle suddenly skips to a different rotation
                  6. Application waits 0.5s
                  7. Loop from 3

                  (sorry for the crude explanation... I'll post a GIF soon)

                  How can I stop this behavior and go back to the good old true waiting VSync?
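
                  A commonly suggested idiom for getting the old blocking behavior back is to wait on the GPU yourself right after presenting. A minimal sketch with standard GLX/GL calls (how far glFinish() actually waits here is driver-dependent, so treat this as an approximation, not a guaranteed fix):

                  Code:
                  #include <GL/glx.h>

                  /* Present, then explicitly drain the pipeline so the CPU
                   * does not run ahead while the frame sits in the swap queue. */
                  void present_blocking(Display *dpy, GLXDrawable win)
                  {
                      glXSwapBuffers(dpy, win);
                      glFinish(); /* on many drivers this returns only after
                                     the queued flip has been consumed */
                  }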



                  • #10
                    Originally posted by Calinou View Post
                    Is this like the Ultra low-latency mode in the NVIDIA Control Panel on Windows (NULL)? There's no equivalent on Linux to my knowledge.
VSync is about frames already rendered on the GPU; that NVIDIA setting (or maxFrameLatency in DXVK) is about CPU prerender.

I'm not overly enthusiastic about this being guarded behind a DRI config option. If I use VSync, I expect it to deliver the smoothest result possible (without VRR) while keeping latency as low as possible. I suppose this is the case on Windows with D3D when game devs implement VSync properly.

