Linux 4.12 I/O Scheduler Benchmarks: BFQ, Kyber, Etc


  • #11
    Originally posted by coder111
    All of these benchmarks measure throughput...

    I wonder how it would be possible to measure latency/responsiveness. Run some I/O-intensive task in the background, run a pre-recorded GUI click-through using mouse/keyboard emulation, and point a camera at the screen to measure when things pop up? (Or record video using the video card, or take screenshots.)

    Alternatively, run some 3D game plus a CPU/HDD-intensive background task, and measure minimum FPS?

    Or write a synthetic latency/responsiveness benchmark which measures the time it takes to read the first byte from a file or get a CPU slice?

    As the old saying goes: whatever is not measured is lost... Until we have a decent way to measure latency/responsiveness, it will continue to be mostly ignored as a concern...
    You could probably just measure page render times loading pages off the local hard disk while some extremely heavy database load or similar eats all the I/O bandwidth in the background.
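    A rough sketch of the first-byte-latency idea from the quoted post (all filenames and sizes here are made up for illustration; note that the page cache will hide most of the scheduler's effect unless caches are dropped or O_DIRECT is used):

```python
import os
import statistics
import tempfile
import threading
import time

def background_writer(path, stop, chunk=1 << 20):
    """Continuously rewrite a file to generate write I/O in the background."""
    buf = os.urandom(chunk)
    while not stop.is_set():
        with open(path, "wb") as f:
            for _ in range(32):  # ~32 MiB per pass
                if stop.is_set():
                    break
                f.write(buf)
            f.flush()
            os.fsync(f.fileno())

def first_byte_latency(path, samples=20):
    """Time from open() until the first byte of the file is read."""
    results = []
    for _ in range(samples):
        t0 = time.perf_counter()
        with open(path, "rb") as f:
            f.read(1)
        results.append(time.perf_counter() - t0)
        time.sleep(0.05)
    return results

with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, "target.bin")
    noise = os.path.join(d, "noise.bin")
    with open(target, "wb") as f:
        f.write(os.urandom(1 << 20))  # 1 MiB probe file
    stop = threading.Event()
    writer = threading.Thread(target=background_writer, args=(noise, stop))
    writer.start()
    try:
        lats = first_byte_latency(target)
    finally:
        stop.set()
        writer.join()
    print("median first-byte latency: %.3f ms" % (statistics.median(lats) * 1e3))
```

    On an otherwise idle box this mostly measures the page cache; to actually exercise the I/O scheduler, the probe file would need to be read cold, e.g. after `echo 3 > /proc/sys/vm/drop_caches`.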

    Comment


    • #12
      Benchmarking this sort of thing is difficult to relate to real-world usage because in the real world it matters a lot more when multitasking. How about compiling the Linux kernel, or Firefox, or something similar in the background while testing foreground tasks? I don't know how to make that happen in an accurate manner, but that's the kind of thing that would matter more, in my opinion.

      Comment


      • #13
        I thought BFQ was for HDD only, right?

        Comment


        • #14
          The regressions that are common to all blk-mq schedulers are most certainly due to problems in the blk-mq framework. I haven't seen any such problem with my devices, so I guess it is something device-specific. The other regressions are due to the fact that, on the one side, all tests are run with the default BFQ configuration, geared towards responsiveness and low latency for soft real-time applications, while, on the opposite side, all those tests are throughput-centric (even those reporting time as a figure of merit). That configuration does sacrifice throughput when needed for application- and system-level latency. With a throughput-centric, synthetic workload, the only concrete result is a loss of performance on the throughput-related figures of merit under test.

          These tests are very important to me, as they confirm that I have definitely not been good enough at informing people on how to reconfigure BFQ, very easily, if they don't want to use it for the use cases for which it has been fine-tuned over these years, but only for typical server-like workloads.

          So, let me start right now: if you are concerned only about throughput and the like, then just set the low_latency parameter to 0. This switches off all the extra low-latency mechanisms. If this is not enough on your flash-based device, then set slice_idle to 0 too. If this is still not enough, then contact me and I'll start a new bug hunt.
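          For anyone looking for the knobs mentioned here: on a running system they live under the device's iosched directory in sysfs. A minimal sketch, assuming the device is sda and its scheduler is already set to bfq (substitute your own device name):

```shell
# Disable BFQ's extra low-latency heuristics (throughput-oriented tuning).
echo 0 | sudo tee /sys/block/sda/queue/iosched/low_latency

# On flash-based devices, optionally disable idling as well.
echo 0 | sudo tee /sys/block/sda/queue/iosched/slice_idle
```

          These settings do not persist across reboots; a udev rule or boot script is needed to make them permanent.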

          Comment


          • #15
            There is a German saying, "Wer misst, misst Mist.", which roughly translates to "He who measures, measures crap."

            It means that experimental data, sensor data, measurements, etc. are all worthless if you don't think about what the numbers you acquired actually mean.
            As others have pointed out, these throughput measurements alone are almost worthless, since throughput is not the primary task of a scheduler.

            A scheduler is for scheduling, hence several I/O tasks should be performed simultaneously and the latency/duration of each single one has to be measured.
            Then the total duration of all tasks together is one important criterion, but the latency of individual tasks is what kills the user experience. If the scheduler implements some form of importance-weighting mechanism (like the nice value for CPU schedulers), that should be compared as well.

            I have a game which creates a memory overflow under certain conditions, filling up all RAM and the swap file afterwards.
            As soon as it starts writing to the swap file (which is not on my system SSD!), I am not able to move my mouse cursor anymore or to make keyboard inputs.
            The only thing I can do is to push the reset button on my computer. This is what an I/O scheduler should prevent.

            Comment


            • #16
              The regressions that are common to all blk-mq schedulers are most certainly due to problems in the blk-mq framework. I haven't seen any such problem with my devices, so I guess it is something device-specific. The other regressions are due to the fact that, on the one side, all tests are run with the default BFQ configuration, geared towards responsiveness and low latency for soft real-time applications, while, on the opposite side, all those tests are throughput-centric (even those reporting time as a figure of merit). That configuration does sacrifice throughput when needed to get low application- and system-level latency. With a throughput-centric, synthetic workload, the only concrete result is a loss of performance on the throughput-related figures of merit under test.

              These tests are very important to me, as they let me understand that I have to work much more on informing people on how to reconfigure BFQ, very easily, if they don't want to use BFQ to get low application- or system-level latency, but to achieve high throughput in a server-like workload.

              Let me start right now: if you are concerned only/mainly about throughput, then set the low_latency parameter to 0. This switches off all the extra low-latency mechanisms. If this is not enough on your flash-based device, then set slice_idle to 0 too. If problems persist, then contact me, and I'll start a new bug hunt.

              Comment


              • #17
                Originally posted by paolo
                The other regressions are due to the fact that, on the one side, all tests are run with the default BFQ configuration, geared towards responsiveness and low latency for soft real-time applications, while, on the opposite side, all those tests are throughput-centric
                Thank you for your work, it's much appreciated!

                Originally posted by cj.wijtmans
                so noop is actually better?
                Yeah. The whole point of BFQ (and Kyber for multi-queue) is that they do not simply give all the time to the topmost task (the benchmark), but keep some reserved for other tasks, like the user interface.

                It's entirely expected that BFQ, CFQ, and Kyber will perform "worse"; the difference is actually slots being diverted to the desktop, so it doesn't have to wait too long to get its own work done.

                Where they should perform better is in desktop responsiveness, but I don't think that Michael has tests ready to measure user-interface latency.
                (One possibility would be to run a throughput-intensive benchmark in the background, then run a video player in the foreground and check the stats for dropped frames.
                With BFQ and Kyber, Xine would supposedly play videos more smoothly.)

                Originally posted by davidbepo
                wow bfq performs horrible, maybe paolo can fix it for 4.13
                i like the responsiveness of bfq so i would like to see it performing well
                No, you can't "fix" it that much; it's part of the expected behaviour.
                You can't keep all the I/O reserved for the benchmark while simultaneously setting I/O aside for background tasks.
                (TL;DR: the "responsiveness" you feel is exactly that missed performance; the time missing from the benchmark result was spent making your desktop more responsive.)

                CPU cycles, I/O throughput, etc. are a scarce resource.
                Responsiveness (making sure that everybody gets a share of the resource without waiting too long: even background applications get their turn at the disk drive)
                is in direct contradiction with performance (all the I/O and cycles go to the top-priority app as fast as it can consume them; background tasks only get the leftovers once in a while when the top task isn't using them, and the desktop starts to stutter noticeably).

                What you can measure:
                - On a headless server, with all other tasks and daemons shut down: BFQ shouldn't perform much worse than the rest, even if its scheduling method of course has some overhead. (Maybe it performs a bit worse than expected, but without complete isolation there's no way to be sure.)
                - Under a multitasking load, each task should get its share of I/O within a reasonably short timeframe (as mentioned above: will a video player start to drop too many frames?).

                Comment


                • #18
                  paolo, thank you for your persistence in developing and trying to merge BFQ over the years.

                  Michael, would you be willing to retry the tests with BFQ's low_latency parameter disabled? I'm not sure what to suggest (from the PTS) for latency tests. As others have said, running a game with a compilation in the background might work, although it's not ideal.

                  Comment


                  • #19
                    Originally posted by GrayShade
                    paolo, thank you for your persistence in developing and trying to merge BFQ over the years.

                    My pleasure

                    Michael, would you be willing to retry the tests with BFQ's low_latency parameter disabled? I'm not sure what to suggest (from the PTS) for latency tests. As others have said, running a game with a compilation in the background might work, although it's not ideal.
                    High latencies and bad responsiveness are as evident to users as they are non-trivial to measure (correctly). That's why we made an ad hoc benchmark suite for this purpose: the Algodev-github/S I/O benchmark suite on GitHub.

                    It's rather rustic, but, of course, I'd be glad to help if needed.

                    Comment


                    • #20
                      Something along the lines of https://www.youtube.com/watch?v=J-e7LnJblm8 would be nice.

                      Comment
