Ubuntu 23.04 & Debian Prepare For Updated GNOME Triple Buffering Optimization


  • #31
    ignoring the hacky clockspeed workaround for a sec, as someone who demands vsync in games/compositors/videos, i don't see how any compositor can exist without triple buffering (dynamic or otherwise)

    with vsync, if the rendering load is too demanding to complete a frame within a refresh, it won't appear until the next refresh cycle, but the problem with double buffering is that the next potential frame WAITS to render since there's no available buffer, resulting in a halving of the framerate if the demanding load is prolonged

    triple buffering at least lets the next frame start rendering, so for example a steady 20ms load on a 60hz screen can be an average of 50fps rather than being locked to 30fps

    yes it's merely bouncing 33 16 16 33 16 33 16 16 33 16 33 ms, but that's perceptually less jarring than even half a second of sudden 30fps such as during a visual effect or animation
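
    for concreteness, here's a toy pacing model of that 20ms-on-60hz case (not real compositor code, just the arithmetic above, assuming a perfectly steady render time and ideal flips at vblank):

        import math

        REFRESH = 1000 / 60   # ~16.67 ms between vblanks at 60hz
        RENDER = 20.0         # steady per-frame gpu render time in ms

        def next_vblank(t):
            # first vblank at or after time t
            return math.ceil(t / REFRESH) * REFRESH

        def average_fps(triple_buffered, frames=600):
            t = 0.0            # when the gpu is free to start the next frame
            presents = []
            for _ in range(frames):
                done = t + RENDER          # frame finishes rendering
                flip = next_vblank(done)   # and gets scanned out at the next vblank
                presents.append(flip)
                # double buffering: no free buffer until the flip, so the gpu stalls
                # triple buffering: a spare buffer lets the next frame start immediately
                t = done if triple_buffered else flip
            return 1000 * (len(presents) - 1) / (presents[-1] - presents[0])

        print(f"double buffered: {average_fps(False):.1f} fps")  # ~30 fps
        print(f"triple buffered: {average_fps(True):.1f} fps")   # ~50 fps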

    the other alternative is not syncing in such a situation, but tearing is also jarring especially if the framerate is near the refresh rate

    "but the desktop is not a demanding game", it still needs to run on phones or ultra low power tinker hardware, what about games displayed in a window, what about demanding websites, what about content creation, what about a file manager displaying thousands of items, etc

    Originally posted by stompcrash View Post
    So the input latency would be as bad as if the FPS were 20. That seems bad, but maybe it isn't noticeable in practice.

    I wonder, what do games do if they want to minimize stuttering and also minimize latency? Let's assume a game which doesn't tax the GPU beyond its capabilities, which I am guessing the desktop also doesn't do. Also, can we assume freesync is enabled? How does that impact the stuttering? I assume it smooths it over a bit, since without it you jump from 60FPS to 30FPS, but with it you can still get the frame faster.

    I fondly remember my Amiga 500. It didn't have stutter. Not like this. I don't think this is a fundamental issue that can't be solved correctly, but maybe the GPU developers have architected their hardware in a way that makes it difficult for them to solve.
    hardware cursors do not have extra latency nor are they tied to the framerate (imagine a separate transparent layer dedicated to displaying the cursor), so it actually doesn't matter how slow the game or interface is as long as moving the mouse is snappy, only a few wrongly designed games or desktop environments still use software cursors

    end-to-end latency is a lot more complex than the render time (mouse to pc, pc to game engine, game engine tick rate, game engine to gpu, gpu to display controllers, display controllers to crystals/leds changing), most everyone has dozens of ms latency at minimum

    in the last few years, nvidia and amd gpu (windows) drivers have worked on reducing this total latency, sometimes with game dev or even monitor hardware involvement, but i never familiarized myself with the low level details of how that was done (almost always per game, imagine from 70ms to 40ms total latency at similar already high average framerate)

    i've never used an amiga, but in those days either vsync was not common OR everything was tied to processor clockspeeds and scanouts (vsync without the safety of duplicating frames?), so devs focused on consistent framerate and not exceeding frame time budgets

    but freesync and gsync definitely solve the problem of displaying when ready without additional latency



    EDIT: fun observation of (composited, aero enabled) windows: when click-holding to drag a window, you can visibly see the cursor shift its position backwards, and letting go of the click shifts the cursor forwards, which shows that the composited desktop is lagging a frame behind the cursor, so they actually put in the effort to hide this visual discrepancy. (composited, vsynced) xfce by comparison has the cursor shifting around relative to the window being dragged (i haven't checked the latest windows versions to see if this is still the case, and i haven't squinted at gnome/kde/weston/wayfire to look for this effect, plus i'm inclined to force vsync globally in mesa then turn off compositor vsync, so i'd need to confirm whether xfce still acts the same)
    Last edited by kn00tcn; 16 February 2023, 07:12 PM.



    • #32
      Originally posted by anarsoul View Post

      It could, however it requires developing a new extension for GL or Vulkan.
      It could also be e.g. a KMS ioctl.

      Moreover it would require introducing permissions for the apps that use the GPU. You don't want an arbitrary app to be able to ask for the max GPU frequency.
      A KMS ioctl limited to DRM master would only be usable by display servers.

      Originally posted by kn00tcn View Post
      yes it's merely bouncing 33 16 16 33 16 33 16 16 33 16 33 ms, but that's perceptually less jarring than even half a second of sudden 30fps such as during a visual effect or animation
      I'd take consistent 30 fps over that kind of flip-flop any day. The latter results in horrible judder.

      "but the desktop is not a demanding game", it still needs to run on phones or ultra low power tinker hardware, what about games displayed in a window, what about demanding websites, what about content creation, what about a file manager displaying thousands of items, etc
      With mutter!1880, which landed for mutter 44, mutter's frame rate can be independent of client frame rates. Mutter can sustain the full frame rate even while clients are GPU-limited to lower frame rates (subject to GPU HW/driver limitations).

      hardware cursors do not have extra latency nor are they tied to the framerate
      Actually they currently are with the atomic KMS API.

      i've never used an amiga, but in those days either vsync was not common
      It wasn't just common, it was the norm. Amigas have custom ASICs which make double-buffering with VSync very easy.

      OR everything was tied to processor clockspeeds and scanouts
      That too, PAL vs NTSC versions of Amigas have slightly different CPU clock speeds, because it's derived from and synchronized to the video mode pixel clock.



      • #33
        Originally posted by anarsoul View Post
        Moreover it would require introducing permissions for the apps that use the GPU. You don't want an arbitrary app to be able to ask for the max GPU frequency.
        Frankly, why not? Forcing the max GPU frequency can't escalate privileges, can't access private data, and can't damage the system. All it can do is increase the power consumption a little bit. Make it work like /dev/cpu_dma_latency, where the request is active as long as you're holding an open file descriptor, so lsof can tell you who's latching the frequency.
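
        For reference, the entire /dev/cpu_dma_latency pattern is just this (a minimal sketch, needs root; the hypothetical GPU-frequency file would work the same way):

            import os, struct, time

            # write a 32-bit latency request in microseconds; it stays in force only
            # while this file descriptor is open, so lsof shows who is holding it
            fd = os.open("/dev/cpu_dma_latency", os.O_WRONLY)
            os.write(fd, struct.pack("i", 0))   # request 0 us CPU wakeup latency
            try:
                time.sleep(10)                  # stand-in for the latency-sensitive work
            finally:
                os.close(fd)                    # closing the fd releases the request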

        This is the "nice < default is a privileged operation" nonsense all over again.



        • #34
          Originally posted by yump View Post

          Frankly, why not? Forcing the max GPU frequency can't escalate privileges, can't access private data, and can't damage the system. All it can do is increase the power consumption a little bit. Make it work like /dev/cpu_dma_latency, where the request is active as long as you're holding an open file descriptor, so lsof can tell you who's latching the frequency.

          This is the "nice < default is a privileged operation" nonsense all over again.
          It's not a little bit. A discrete GPU can consume tens of watts and drain your battery pretty quickly.



          • #35
            Originally posted by anarsoul View Post

            It's not a little bit. A discrete GPU can consume tens of watts and drain your battery pretty quickly.
            To consume tens of watts, GPU must run at both high clocks and high utilization. High clocks will happen automatically, after a delay, just by running plain old code on the dGPU (or the CPU), which is not a privileged operation. In fact it is so not a privileged operation that any Tom, Dick, or Harry webdev on the internet is allowed to do it in your browser. Even to the point of degrading the performance of the whole machine by blowing out the L3 cache or heating up the CPU enough to throttle with SIMD instructions.

            What we are talking about here is running a light workload at high clocks and low utilization, which we should expect to increase the power consumption by about the square of the frequency ratio.



            • #36
              Originally posted by MrCooper View Post
              I'd take consistent 30 fps over that kind of flip-flop any day. The latter results in horrible judder.
              have you extensively tested or experienced this (in games)? on 60hz, if the average with very mild judder is 50-59fps, it's a cleaner image than tearing and certainly not the disorienting (mouse) input experience that double buffering gives

              however when consistently below around 40fps average, i would usually lock to 30 yes, or if a game is wildly fluctuating at every turn like skate2 on ps3 (which oddly enough had pc-like options for vsync on/off and capping to 30/60)

              a locked 30fps desktop, or say a photo/video editing interface, is quite disappointing, especially when it's 60+ most of the time; a vsync judder (or optional tear) at least allows the increased load to gradually affect smoothness or input, since this situation usually happens during a gradual load increase (like one too many windows animating in an overview gesture, or a dozen more thumbnail icons selected than usual)

              i can understand wanting to avoid judder, 24fps footage in a 30fps video is truly awful, but the effect is much less pronounced at 60+ especially if it's a mild dip in average fps rather than a huge dip closer to half refresh, and this is supposed to be only a temporary dip


              Originally posted by yump View Post
              What we are talking about here is running a light workload at high clocks and low utilization, which we should expect to increase the power consumption by about the square of the frequency ratio.
              high clock states are ALSO tied to high voltage states on modern cpus+gpus, so the power consumption is way higher than the clock speed difference alone would suggest
              Last edited by kn00tcn; 26 February 2023, 11:21 PM.



              • #37
                Originally posted by kn00tcn View Post
                high clock states are ALSO tied to high voltage states on modern cpus+gpus, so the power consumption is way higher than the clock speed difference alone would suggest
                I know that and accounted for it. That's why it increases the power consumption by the square of the frequency ratio. Without a voltage difference, there'd be no power difference at all.

                Required voltage is approximately proportional to frequency.

                v ~= f

                Dynamic power (most of it, AIUI) is proportional to frequency * voltage^2. That's the number of switching events multiplied by the energy stored in the capacitance (doubled because half is dissipated in the resistance of the charging path).

                P ~= fv²

                If the chip's power management adjusts the voltage to the minimum required for a given frequency, that becomes:

                P ~= f³

                But because this is a light workload so the chip can sleep between frames, and the runtime is scaled by 1/f, we lose one factor of frequency:

                E ~= f²
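
                Plugging toy numbers in (not measurements, just the scaling above, with leakage and idle power ignored):

                    f_ratio = 2.0                      # frequency forced to 2x for the same per-frame work
                    power_ratio = f_ratio ** 3         # P ~= f·v² with v ~= f
                    active_ratio = 1 / f_ratio         # the fixed work finishes in half the time
                    energy_ratio = power_ratio * active_ratio
                    print(energy_ratio)                # 4.0, i.e. f² times the energy per frame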





                • #38
                  Originally posted by yump View Post
                  Without a voltage difference, there'd be no power difference at all.
                  oh i thought even a clock difference results in a very small power difference, maybe i'll try some experiments

                  Originally posted by yump View Post
                  ...science equations...
                  the way i look at it is not based on proportional requirements, but based on what's been built into the firmware or drivers (aka a coarse list of states with more voltage than necessary to account for chip variance), which in the past also had enough delay that the high power state lingers beyond a frame of zero load, but i guess modern chips have potentially sub-millisecond power gating

                  different architectures handle themselves differently, i'm still on simple desktop polaris and laptop apu vega if i were to try poking at things

                  recently i decided to lock the desktop polaris to its lowest state on linux after realizing the browser idling on a beatport page was measurably bouncing around the top states, measurably going over 1v, and the fan kept spinning up, without any such overly aggressive game-like power states on windows


                  EDIT: my original thought when i first wrote 'modern' cpus/gpus was the inverse, that they have extremely low idle power and voltage whereas decades past used fixed voltage/clocks regardless of load... it's interesting because i was under the impression that large variances degrade electronics faster, e.g. repeatedly cycling between 35 and 75 degrees would be worse than 50 to 75 degrees, yet modern chips are doing just that along with very large power spikes (well, there are multiple degradations at play anyway, current/voltage/heat/etc, loss of bga contact/increased wire or transistor resistance/sudden cracking of materials/etc)
                  Last edited by kn00tcn; 04 March 2023, 10:59 AM.



                  • #39
                    Sorry for the late reply.

                    Originally posted by kn00tcn View Post
                    oh i thought even a clock difference results in a very small power difference, maybe i'll try some experiments
                    I was talking about the scenario where there's a fixed amount of work to be done per unit time (like video playback) that can be handled at either frequency, but I'm going to retract my claim that it can't make any difference. For the same voltage and a lower frequency the chip will almost certainly use *more* power, because it can't spend as much time in power-gated idle states. On the other hand I can even think of a positive effect: a higher duty ratio at a lower frequency would reduce peak hotspot temperature, thereby lowering wire resistance.
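
                    A toy race-to-idle model of that same-voltage case (made-up numbers; the switching energy for the frame's work is the same at either clock, but the chip spends longer outside its power-gated idle state at the lower clock):

                        WORK_MS_AT_FULL_CLOCK = 4.0   # render time for one frame at the high clock
                        E_SWITCHING = 1.0             # switching energy for that work, arbitrary units
                        P_AWAKE = 0.2                 # power burned per ms while not power-gated

                        def energy_per_frame(clock_fraction):
                            active_ms = WORK_MS_AT_FULL_CLOCK / clock_fraction
                            return E_SWITCHING + P_AWAKE * active_ms

                        print(energy_per_frame(1.0))   # full clock:          1.8
                        print(energy_per_frame(0.5))   # half clock, same V:  2.6 (more energy per frame)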

                    the way i look at it is not based on proportional requirements, but based on what's been built into the firmware or drivers (aka a coarse list of states with more voltage than necessary to account for chip variance), which in the past also had enough delay that the high power state lingers beyond a frame of zero load, but i guess modern chips have potentially sub-millisecond power gating

                    different architectures handle themselves differently, i'm still on simple desktop polaris and laptop apu vega if i were to try poking at things
                    I also have Polaris. I've read that Vega and newer use a parametric curve instead of fixed frequency-voltage points. Should be in the kernel documentation for pp_od_clk_voltage from the amdgpu driver.

                    recently i decided to lock the desktop polaris to its lowest state on linux after realizing the browser idling on a beatport page was measurably bouncing around the top states, measurably going over 1v, and the fan kept spinning up, without any such overly aggressive game-like power states on windows
                    I found the same, and also that the most important thing is keeping the memory clock low. You can also try changing the pp_power_profile_mode by writing the number in the first column back to the file. CoreCtrl also exposes this.
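
                    A minimal sketch of those knobs (paths assume card0, amdgpu and root; the profile numbers vary by ASIC, so read the file first rather than trusting a hard-coded value):

                        from pathlib import Path

                        dev = Path("/sys/class/drm/card0/device")

                        # list the available power profiles; the active one is marked with '*'
                        print((dev / "pp_power_profile_mode").read_text())

                        # per the amdgpu docs, selecting a profile requires the manual performance
                        # level first, then writing the number from the first column back
                        (dev / "power_dpm_force_performance_level").write_text("manual")
                        (dev / "pp_power_profile_mode").write_text("2")  # e.g. 2, whichever row you want

                        # locking to the lowest state, as described above, is the coarser variant:
                        # (dev / "power_dpm_force_performance_level").write_text("low")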


                    EDIT: my original thought when i first wrote 'modern' cpus/gpus was the inverse, that they have extremely low idle power and voltage whereas decades past used fixed voltage/clocks regardless of load... it's interesting because i was under the impression that large variances degrade electronics faster, e.g. repeatedly cycling between 35 and 75 degrees would be worse than 50 to 75 degrees, yet modern chips are doing just that along with very large power spikes (well, there are multiple degradations at play anyway, current/voltage/heat/etc, loss of bga contact/increased wire or transistor resistance/sudden cracking of materials/etc)
                    I assume the chip designers have sufficiently good modeling that they trust it to be safe until well beyond the warranty period. But it does seem that Intel's server parts have much lower boost clocks than the desktop ones...

