A Low-Latency Kernel For Linux Gaming

  • #11
    A low-latency kernel does not mean good for gaming.
    It does not mean high performance.

    It means events are guaranteed to trigger with low latency. That doesn't mean more throughput, just a quicker response.

    Low latency is great for industrial applications, robotics, physical security systems, medical systems, and such. It's also good for audio production. It's not for gaming.

    • #12
      This article seems pointless. With maximum fps over 60 in all games, who cares about the exact number? Minimum frame rate or the already mentioned frame jitter seem a much more suitable measure of the effect of an rt kernel to me.

      • #13
        Originally posted by del_diablo
        Which is blatantly false. A monitor does not need input lag as severe as 100ms before it becomes noticeable. All you need is a jitter of 1ms --> 10ms --> 1ms --> 10ms, and it should be noticeable that the input is quite unsmooth, especially if you are testing an application where the input matters (hardcore Quake FPS, anybody?).
        I want your hardware that jitters between 100-1000 frames/s...

        • #14
          Originally posted by DavidNielsen
          It may be that what people are talking about in terms of improved gaming "performance" is not the framerate (which unsurprisingly stays unchanged or degrades a bit with preemption enabled, let alone if one were to install a proper -rt kernel). However, you might experience a situation where overall latency decreases, which might improve responsiveness to input devices, or the overall "smoothness" of the game might feel more right. Call it trading off excess fps for equal low-latency access (who really cares if you are pushing 172 fps if your mouse is jerky, when it could all be smooth at, say, 160 fps).
          That's what I thought when I read "low latency". I expected better response from controls and lower network latency. It would obviously have traded off fps and throughput to achieve this.

          • #15
            Not sure if it makes sense to benchmark a low-latency system if you don't have a latency benchmark or understand what latency is.

            • #16
              Originally posted by del_diablo
              Which is blatantly false. A monitor does not need input lag as severe as 100ms before it becomes noticeable. All you need is a jitter of 1ms --> 10ms --> 1ms --> 10ms, and it should be noticeable that the input is quite unsmooth, especially if you are testing an application where the input matters (hardcore Quake FPS, anybody?).
              Never mind that once you have gotten your mind into "ready mode" and are in a "flow of actions", the static reaction times are no longer there. Now... the 100ms median is true if you are waiting to twitch. But if you are in constant twitching movement, in a state where you have already processed all the information, 100ms is not your reaction time.
              First off, I wasn't referring to games or computers. The numbers come from the book "Introduction to Human Factors and Ergonomics for Engineers". 100ms as a reaction time is out of this world. There are quite a few things that happen between the time you get stimulated and the time you do what you have to do. The human body has its limits.

              • #17
                Originally posted by ownagefool
                FYI, if we're seriously talking about a 30ms kernel, I'd suspect we're seriously being way too slow already. We're talking about a group of people who change their USB polling intervals from 8ms to 1-2ms, use monitors that need to have < 10ms lag, and play at high FPS (thus meaning < 10ms delay between frames) with generally 100Hz+ refresh rates. A 30ms response time is already way too slow, though I suppose that's a maximum as opposed to an average.
                I think you're mixing things up. Where exactly are we too slow? The Linux kernel is much more responsive than Windows, but I'm not so sure this matters for the things you described. When I had stuttering in Skyrim under Windows after the 1.5 patch, I had to limit FPS and the game became playable once again, so there are more important things that affect smoothness than kernel responsiveness. I bet your responsiveness will be messed up with a real-time kernel and games will simply be much worse. Just a bet, but I played a lot with custom kernels in the past and generic was usually the best when it comes to gaming.

                • #18
                  I agree with most of the previous posters: the benchmark only measures one aspect of many, some of which are much more important than "pure" fps.
                  - input latency
                  - input latency jitter
                  - fps jitter
                  are the three factors that would be most interesting.
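As a rough illustration of why average fps hides these factors, here is a small Python sketch that summarises frame pacing from a list of frame timestamps. The function name `frame_stats` and the definition of jitter used here (mean absolute change between consecutive frame times) are assumptions made for the example, not something taken from the benchmark in the article. The same calculation could be applied to per-event input latency samples to get input latency jitter.

```python
import statistics


def frame_stats(timestamps_ms):
    """Summarise frame pacing from frame-presentation timestamps in ms."""
    # Frame time = gap between consecutive frames.
    frame_times = [b - a for a, b in zip(timestamps_ms, timestamps_ms[1:])]
    # Jitter (assumed definition): mean absolute change between
    # consecutive frame times.
    deltas = [abs(b - a) for a, b in zip(frame_times, frame_times[1:])]
    return {
        "avg_fps": 1000.0 / statistics.mean(frame_times),
        "min_fps": 1000.0 / max(frame_times),
        "jitter_ms": statistics.mean(deltas) if deltas else 0.0,
    }


# Two made-up runs with the same average frame rate.
steady = frame_stats([0, 10, 20, 30, 40])   # frame times 10, 10, 10, 10
uneven = frame_stats([0, 5, 20, 25, 40])    # frame times 5, 15, 5, 15
```

Both runs average 100 fps, but the uneven one has a lower minimum frame rate and non-zero jitter, which is exactly what a plain fps number would not show.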

                  • #19
                    For everyone complaining about <1ms kernel lag, consider that Xorg with its defaults can cause lag up to 30ms:


                    The only way to avoid that in the current situation is to run a dedicated X server for your game. Then it can't ignore your client, because it's the only one there.

                    • #20
                      Low Latency

                      I have a little scheduling latency tester (I do realtime audio, so it's something of interest).

                      Basically it runs (zero or N*2) looping background threads (niced) and then 1 or N (high priority/realtime) threads that attempt to be scheduled using nanosleep to wake up at predefined times.
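For anyone who wants to play with the idea, here is a minimal Python sketch of this kind of tester. It is not the unreleased tool described in this post, and it rests on several assumptions: it uses `time.sleep` against absolute deadlines rather than `nanosleep`, it runs a single thread with no background load, and it counts a "fail" as oversleeping by more than one full period. That last definition is a guess, though it is consistent with the figures further down in this post. The names `try_realtime` and `measure_oversleep` are made up for the example.

```python
import os
import time


def try_realtime(priority=10):
    """Try to switch this process to SCHED_RR; return True on success.

    Only works on Linux and usually needs root or CAP_SYS_NICE.
    """
    try:
        os.sched_setscheduler(0, os.SCHED_RR, os.sched_param(priority))
        return True
    except (AttributeError, OSError):
        # AttributeError: platform without sched_setscheduler.
        # OSError: includes PermissionError when not privileged.
        return False


def measure_oversleep(freq_hz, iterations):
    """Wake at fixed deadlines and record how late each wakeup was (ns)."""
    if iterations < 1:
        raise ValueError("iterations must be at least 1")
    period_ns = int(1e9 / freq_hz)
    start = time.monotonic_ns()
    oversleeps = []
    fails = 0
    for i in range(1, iterations + 1):
        # Absolute deadline, so lateness does not accumulate as drift.
        deadline = start + i * period_ns
        remaining = deadline - time.monotonic_ns()
        if remaining > 0:
            time.sleep(remaining / 1e9)
        late = time.monotonic_ns() - deadline
        oversleeps.append(max(late, 0))
        # Assumed definition of a failure: more than one period late.
        if late > period_ns:
            fails += 1
    return {
        "its": iterations,
        "fails": fails,
        "avr": sum(oversleeps) / len(oversleeps),
        "mnan": max(oversleeps),
    }
```

Calling `measure_oversleep(2756, 2000)` mirrors one pass of the runs below, and calling `try_realtime()` first approximates the high-priority case. Python's interpreter overhead adds latency of its own, so the absolute numbers will not be comparable to those from a native tester.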

                      Take the numbers with a grain of salt, I've only used it for internal stuff. The testing could be better too, using locked memory and forcing swapping during execution.

                      I'm aware I'm not testing response to IO events here, but it does give some interesting numbers:

                      These figures are for a wakeup frequency of around 2756 Hz (I test lots more frequencies, but around here is where the CPU starts to show some issues on the laptop it's running on):

                      Quick explanation of the figures:

                      its = iterations
                      freq/frq = wakeup frequency (Hz)
                      perNan = period between wakeups in nanoseconds
                      perMic = period between wakeups in microseconds
                      perMil = period between wakeups in milliseconds
                      nhi = number of high-priority threads (SCHED_RR)
                      nlo = number of regular-priority threads (normal Linux scheduling)
                      bbac = number of background looping threads
                      fails = number of iterations that failed to meet the wakeup deadline
                      avr = average "oversleep" in nanoseconds
                      mnan = maximum "oversleep" in nanoseconds
                      mmic = maximum "oversleep" in microseconds
                      mmil = maximum "oversleep" in milliseconds
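The period fields follow directly from the wakeup frequency. A small Python sketch of the conversion (the function name `period_from_freq` is made up for illustration; the field names are the ones from the legend above):

```python
def period_from_freq(freq_hz):
    """Convert a wakeup frequency in Hz to the period in ns, us and ms."""
    per_nan = 1e9 / freq_hz          # nanoseconds between wakeups
    return {
        "perNan": round(per_nan, 2),
        "perMic": round(per_nan / 1e3, 2),
        "perMil": round(per_nan / 1e6, 2),
    }


p2756 = period_from_freq(2756)
p3000 = period_from_freq(3000)
```

For 2756 Hz this gives a period of about 362844.70 ns (362.84 us, 0.36 ms), and for 3000 Hz about 333333.33 ns, matching the "Frequency pass" lines in the runs below.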

                      Code:
                      2.6.39-preempt-gentoo-r3
                      Frequency pass: freq(2756) perNan(362844.70) perMic(  362.84) perMil( 0.36)
                      One pass: its(2000) frq( 2756.00) nhi(00) nlo(01) bbac(00) fails(    1) avr(   78600.29) mnan(   383190) mmic(  383.19) mmil( 0.38)
                      One pass: its(2000) frq( 2756.00) nhi(00) nlo(01) bbac(08) fails(    0) avr(   50682.76) mnan(   228595) mmic(  228.59) mmil( 0.23)
                      One pass: its(2000) frq( 2756.00) nhi(00) nlo(02) bbac(00) fails(   13) avr(   73451.72) mnan(   978501) mmic(  978.50) mmil( 0.98)
                      One pass: its(2000) frq( 2756.00) nhi(00) nlo(02) bbac(08) fails(    8) avr(   54678.49) mnan(  2271642) mmic( 2271.64) mmil( 2.27)
                      One pass: its(2000) frq( 2756.00) nhi(01) nlo(00) bbac(00) fails(    5) avr(   36309.15) mnan(   839371) mmic(  839.37) mmil( 0.84)
                      One pass: its(2000) frq( 2756.00) nhi(01) nlo(00) bbac(08) fails(    1) avr(    4686.12) mnan(  1854080) mmic( 1854.08) mmil( 1.85)
                      One pass: its(2000) frq( 2756.00) nhi(02) nlo(00) bbac(00) fails(    0) avr(   22805.04) mnan(   172117) mmic(  172.12) mmil( 0.17)
                      One pass: its(2000) frq( 2756.00) nhi(02) nlo(00) bbac(08) fails(    0) avr(    6372.65) mnan(    63957) mmic(   63.96) mmil( 0.06)
                      For the above, there are some failures even when using high-priority tasks (i.e. nhi != 0), where a thread missed its wakeup by an excessive margin.
                      Code:
                      3.3.2-preempt-c2T7400-2.16
                      Frequency pass: freq(2756) perNan(362844.70) perMic(  362.84) perMil( 0.36)
                      One pass: its(2000) frq( 2756.00) nhi(00) nlo(01) bbac(00) fails(    0) avr(   88775.06) mnan(   211109) mmic(  211.11) mmil( 0.21)
                      One pass: its(2000) frq( 2756.00) nhi(00) nlo(01) bbac(08) fails(    0) avr(   54785.60) mnan(   239303) mmic(  239.30) mmil( 0.24)
                      One pass: its(2000) frq( 2756.00) nhi(00) nlo(02) bbac(00) fails(    0) avr(   90745.55) mnan(   341535) mmic(  341.54) mmil( 0.34)
                      One pass: its(2000) frq( 2756.00) nhi(00) nlo(02) bbac(08) fails(    1) avr(   52708.94) mnan(   543801) mmic(  543.80) mmil( 0.54)
                      One pass: its(2000) frq( 2756.00) nhi(01) nlo(00) bbac(00) fails(    0) avr(   87453.27) mnan(   154366) mmic(  154.37) mmil( 0.15)
                      One pass: its(2000) frq( 2756.00) nhi(01) nlo(00) bbac(08) fails(    0) avr(    4731.63) mnan(    17300) mmic(   17.30) mmil( 0.02)
                      One pass: its(2000) frq( 2756.00) nhi(02) nlo(00) bbac(00) fails(    0) avr(   35838.07) mnan(   168112) mmic(  168.11) mmil( 0.17)
                      One pass: its(2000) frq( 2756.00) nhi(02) nlo(00) bbac(08) fails(    0) avr(    5730.70) mnan(    71996) mmic(   72.00) mmil( 0.07)
                      For the above, we see it's a little better: fewer failures, and on the whole the maximum oversleep (mnan) is a lot lower, although the average (avr) is not.
                      Code:
                      3.0.14-rt31-preempt-c2T7400-2.16
                      Frequency pass: freq(2756) perNan(362844.70) perMic(  362.84) perMil( 0.36)
                      One pass: its(2000) frq( 2756.00) nhi(00) nlo(01) bbac(00) fails(    0) avr(   78913.17) mnan(   345092) mmic(  345.09) mmil( 0.35)
                      One pass: its(2000) frq( 2756.00) nhi(00) nlo(01) bbac(08) fails(    0) avr(   52385.20) mnan(   162347) mmic(  162.35) mmil( 0.16)
                      One pass: its(2000) frq( 2756.00) nhi(00) nlo(02) bbac(00) fails(    0) avr(   88372.10) mnan(   349752) mmic(  349.75) mmil( 0.35)
                      One pass: its(2000) frq( 2756.00) nhi(00) nlo(02) bbac(08) fails(    1) avr(   55378.42) mnan(   751052) mmic(  751.05) mmil( 0.75)
                      One pass: its(2000) frq( 2756.00) nhi(01) nlo(00) bbac(00) fails(    0) avr(   28074.12) mnan(   109365) mmic(  109.36) mmil( 0.11)
                      One pass: its(2000) frq( 2756.00) nhi(01) nlo(00) bbac(08) fails(    0) avr(    5074.74) mnan(    18434) mmic(   18.43) mmil( 0.02)
                      One pass: its(2000) frq( 2756.00) nhi(02) nlo(00) bbac(00) fails(    0) avr(   65620.09) mnan(   137933) mmic(  137.93) mmil( 0.14)
                      One pass: its(2000) frq( 2756.00) nhi(02) nlo(00) bbac(08) fails(    0) avr(    5588.25) mnan(    27105) mmic(   27.11) mmil( 0.03)
                      For the above, the realtime kernel fares more or less in the same ballpark as the 3.3.2 kernel, with average wakeup latencies (avr) lower in five of the eight rows and higher in the other three.
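To compare rows across kernels without eyeballing them, the "One pass" lines can be parsed into dictionaries. This is a small Python sketch; the regular expression and the name `parse_pass` are assumptions based only on the output format shown in this post.

```python
import re

# Matches fields such as "frq( 2756.00)" or "fails(    0)".
FIELD = re.compile(r"(\w+)\(\s*([\d.]+)\)")


def parse_pass(line):
    """Turn one 'One pass:' line into a dict of field name -> float."""
    return {name: float(value) for name, value in FIELD.findall(line)}


# The nhi(01) bbac(08) row at 2756 Hz, copied from the runs above.
stock = parse_pass(
    "One pass: its(2000) frq( 2756.00) nhi(01) nlo(00) bbac(08) "
    "fails(    0) avr(    4731.63) mnan(    17300) mmic(   17.30) mmil( 0.02)"
)
rt = parse_pass(
    "One pass: its(2000) frq( 2756.00) nhi(01) nlo(00) bbac(08) "
    "fails(    0) avr(    5074.74) mnan(    18434) mmic(   18.43) mmil( 0.02)"
)

# Difference in maximum oversleep (ns) between the rt and stock kernels.
mnan_diff = rt["mnan"] - stock["mnan"]
```

For this row the rt kernel's worst-case oversleep is 1134 ns higher than the stock 3.3.2 kernel's, which is well within noise for a single 2000-iteration pass.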

                      Let's look at some heavier loads (higher wakeup frequency):

                      Code:
                      3.3.2-preempt-c2T7400-2.16
                      Frequency pass: freq(3000) perNan(333333.33) perMic(  333.33) perMil( 0.33)
                      One pass: its(2000) frq( 3000.00) nhi(00) nlo(01) bbac(00) fails(    0) avr(  147932.56) mnan(   189177) mmic(  189.18) mmil( 0.19)
                      One pass: its(2000) frq( 3000.00) nhi(00) nlo(01) bbac(08) fails(    1) avr(   54985.65) mnan(   846406) mmic(  846.41) mmil( 0.85)
                      One pass: its(2000) frq( 3000.00) nhi(00) nlo(02) bbac(00) fails(    1) avr(   99805.41) mnan(  1078824) mmic( 1078.82) mmil( 1.08)
                      One pass: its(2000) frq( 3000.00) nhi(00) nlo(02) bbac(08) fails(    1) avr(   50858.96) mnan(   935250) mmic(  935.25) mmil( 0.94)
                      One pass: its(2000) frq( 3000.00) nhi(01) nlo(00) bbac(00) fails(    0) avr(   86170.02) mnan(   109395) mmic(  109.39) mmil( 0.11)
                      One pass: its(2000) frq( 3000.00) nhi(01) nlo(00) bbac(08) fails(    0) avr(    4755.44) mnan(    38145) mmic(   38.15) mmil( 0.04)
                      One pass: its(2000) frq( 3000.00) nhi(02) nlo(00) bbac(00) fails(    0) avr(   40428.37) mnan(   142389) mmic(  142.39) mmil( 0.14)
                      One pass: its(2000) frq( 3000.00) nhi(02) nlo(00) bbac(08) fails(    0) avr(    5128.04) mnan(    55881) mmic(   55.88) mmil( 0.06)
                      Code:
                      3.0.14-rt31-preempt-c2T7400-2.16
                      Frequency pass: freq(3000) perNan(333333.33) perMic(  333.33) perMil( 0.33)
                      One pass: its(2000) frq( 3000.00) nhi(00) nlo(01) bbac(00) fails(    1) avr(   85895.76) mnan(   364207) mmic(  364.21) mmil( 0.36)
                      One pass: its(2000) frq( 3000.00) nhi(00) nlo(01) bbac(08) fails(    2) avr(   55065.84) mnan(   995272) mmic(  995.27) mmil( 1.00)
                      One pass: its(2000) frq( 3000.00) nhi(00) nlo(02) bbac(00) fails(    2) avr(   96075.35) mnan(   654147) mmic(  654.15) mmil( 0.65)
                      One pass: its(2000) frq( 3000.00) nhi(00) nlo(02) bbac(08) fails(    0) avr(   64900.84) mnan(   146107) mmic(  146.11) mmil( 0.15)
                      One pass: its(2000) frq( 3000.00) nhi(01) nlo(00) bbac(00) fails(    0) avr(   53375.44) mnan(   134737) mmic(  134.74) mmil( 0.13)
                      One pass: its(2000) frq( 3000.00) nhi(01) nlo(00) bbac(08) fails(    0) avr(    5056.19) mnan(     9248) mmic(    9.25) mmil( 0.01)
                      One pass: its(2000) frq( 3000.00) nhi(02) nlo(00) bbac(00) fails(    0) avr(   93658.56) mnan(   140582) mmic(  140.58) mmil( 0.14)
                      One pass: its(2000) frq( 3000.00) nhi(02) nlo(00) bbac(08) fails(    0) avr(    5230.94) mnan(     9641) mmic(    9.64) mmil( 0.01)
                      So even under heavy scheduling loads and timing constraints, the 3.3.2 kernel is actually pretty damn good when it comes to scheduling.

                      I've not yet found timing constraints tight enough that the realtime kernel fares better than the stock 3.3.2.

                      Now for gaming it gets tricky - as other posters have mentioned up thread, you really want to be measuring the latency from say

                      mouse click -> graphical response

                      So in an ideal world, you create a USB device with a little light on it, and using a high-speed camera you record when you click (light comes on) and when the screen displays a result. Then compute the latency based on the number of frames the camera recorded in between.
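The arithmetic for the camera approach is simple. A Python sketch (the function names and the example camera speeds are illustrative assumptions, not measurements):

```python
def latency_ms(frames_between, camera_fps):
    """Latency implied by the number of camera frames between the
    light coming on and the screen showing a response."""
    return frames_between * 1000.0 / camera_fps


def resolution_ms(camera_fps):
    """Smallest latency step the camera can resolve: one frame period."""
    return 1000.0 / camera_fps


# Hypothetical example: 12 frames counted on a 240 fps camera.
example = latency_ms(12, 240)
```

Twelve frames at 240 fps works out to 50 ms, with an uncertainty of roughly one camera frame (about 4.2 ms at that speed). Resolving differences of a millisecond or less would need a camera running at 1000 fps or faster.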

                      I know I should release it (peer review, eyes over the code etc) but for the moment it's tied to the internal code base here and I have higher priority fish to fry. I just thought some might find the figures interesting.

                      Cheers,

                      D
                      Last edited by silenceoftheass; 23 June 2012, 06:53 AM.
