Linux 4.12 I/O Scheduler Benchmarks: BFQ, Kyber, Etc
-
Originally posted by coder111
Benchmarking this sort of thing is hard to relate to real-world use, because in the real world the scheduler matters most when you are multi-tasking. How about compiling the Linux kernel, or Firefox, or something, in the background while testing foreground stuff? I don't know how to make that happen in an accurate, repeatable manner, but that's the kind of test that would matter more, in my opinion.
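One rough way to approximate that suggestion, sketched under the assumption that a kernel tree sits in ~/linux (any large parallel build would do):

```shell
# Background load: a parallel build looping until killed
# (assumes a kernel tree in ~/linux; any big compile works).
( cd ~/linux && while true; do
      make -j"$(nproc)" >/dev/null 2>&1
      make clean        >/dev/null 2>&1
  done ) &
LOAD_PID=$!

# Foreground measurement: time a latency-sensitive I/O task with cold
# caches while the build hammers the disk and CPU.
sync
echo 3 | sudo tee /proc/sys/vm/drop_caches >/dev/null
time grep -r "printf" /usr/include >/dev/null

kill "$LOAD_PID" 2>/dev/null
```

The hard part, as noted, is repeatability: the build's I/O pattern varies from run to run, so many repetitions and variance reporting would be needed.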
- Likes 2
Comment
-
The regressions that are common to all blk-mq schedulers are most certainly due to problems in the blk-mq framework. I haven't seen any such problem with my devices, so I guess it is something device-specific. The other regressions are due to the fact that, on one side, all tests are run with the default BFQ configuration, geared towards responsiveness and low latency for soft real-time applications, while, on the other side, all those tests are throughput-centric (even those reporting time as a figure of merit). That configuration does sacrifice throughput when needed for application- and system-level latency. With a throughput-centric, synthetic workload, the only concrete result is a loss of performance on the throughput-related figures of merit under test.
These tests are very important to me, as they confirm that I have not done a good enough job of informing people about how easily BFQ can be reconfigured when they don't want it for the use cases it has been fine-tuned for over the years, but only for typical server-like workloads.
So, let me start right now: if you are concerned only about throughput and the like, then just set the low_latency parameter to 0. This switches off all the extra low-latency mechanisms. If this is not enough on your flash-based device, then set slice_idle to 0 too. If this is still not enough, then contact me and I'll start a new bug hunt.
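For reference, a minimal sketch of the reconfiguration described here, assuming the device under test is sda and BFQ is already the active scheduler (adjust the device name for your system):

```shell
# Assumption: the device under test is sda; change DEV as needed.
DEV=sda
SYSFS=/sys/block/$DEV/queue

# The active scheduler is the bracketed entry in this list.
cat "$SYSFS/scheduler"

# 1. Throughput-oriented tuning: switch off the extra low-latency heuristics.
echo 0 | sudo tee "$SYSFS/iosched/low_latency"

# 2. On flash storage, additionally disable idling between requests.
echo 0 | sudo tee "$SYSFS/iosched/slice_idle"
```

These settings do not persist across reboots; a udev rule or boot script would be needed to make them permanent.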
- Likes 11
Comment
-
There is a German saying, "Wer misst, misst Mist", which roughly translates to "He who measures, measures rubbish".
It means that experimental data, sensor readings, measurements, etc. are all worthless if you don't think about what the numbers you acquired actually mean.
As others have pointed out, these throughput measurements alone are almost worthless, since raw throughput is not the primary task of a scheduler.
A scheduler is for scheduling, so several I/O tasks should be run simultaneously and the latency/duration of each single one measured.
The total duration of all tasks together is one important criterion, but the latency of individual tasks is what kills the user experience. If the scheduler implements some form of priority mechanism (like the nice value for CPU schedulers), that should be compared as well.
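That methodology could be expressed, for example, as a fio job file that pits a throughput hog against a rate-limited latency probe and compares the probe's completion-latency percentiles across schedulers. The job names, sizes, and paths here are purely illustrative, not taken from the article:

```shell
# Write an illustrative fio job file; all names and sizes are assumptions.
cat > mixed.fio <<'EOF'
[global]
directory=/tmp/fiotest
runtime=60
time_based

[throughput-hog]
rw=write
bs=1M
size=1G

[latency-probe]
rw=randread
bs=4k
size=256M
rate_iops=50
EOF

mkdir -p /tmp/fiotest
# Then run:  fio mixed.fio
# and compare the "clat" percentiles of latency-probe between schedulers:
# a good scheduler keeps the probe's tail latency low even while the
# hog saturates the device.
```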
I have a game which leaks memory under certain conditions, filling up all RAM and then the swap file.
As soon as it starts writing to the swap file (which is not on my system SSD!), I can no longer move my mouse cursor or make keyboard inputs.
The only thing I can do is push the reset button on my computer. This is the kind of situation an I/O scheduler should prevent.
- Likes 3
Comment
-
The regressions that are common to all blk-mq schedulers are most certainly due to problems in the blk-mq framework. I haven't seen any such problem with my devices, so I guess it is something device-specific. The other regressions are due to the fact that, on one side, all tests are run with the default BFQ configuration, geared towards responsiveness and low latency for soft real-time applications, while, on the other side, all those tests are throughput-centric (even those reporting time as a figure of merit). That configuration does sacrifice throughput when needed for getting low application- and system-level latency. With a throughput-centric, synthetic workload, the only concrete result is a loss of performance on the throughput-related figures of merit under test.
These tests are very important to me, as they let me understand that I have to work much harder on informing people about how to reconfigure BFQ, very easily, if they don't want to use BFQ to get low application- or system-level latency, but to achieve high throughput in a server-like workload.
Let me start right now: if you are concerned only/mainly about throughput, then set the low_latency parameter to 0. This switches off all the extra low-latency mechanisms. If this is not enough on your flash-based device, then set slice_idle to 0 too. If problems persist, then contact me, and I'll start a new bug hunt.
- Likes 8
Comment
-
Originally posted by paolo View Post
The other regressions are due to the fact that, on one side, all tests are run with the default BFQ configuration, geared towards responsiveness and low latency for soft real-time applications, while, on the other side, all those tests are throughput-centric
Originally posted by cj.wijtmans View Post
so noop is actually better?
It's entirely expected that BFQ, CFQ, and Kyber perform "worse" here: the difference is time slices being diverted to the desktop, so that it doesn't have to wait too long to get its own work done.
Where they should perform better is in desktop responsiveness, but I don't think Michael has tests ready to measure user-interface latency.
(One possibility would be to run a throughput-intensive benchmark in the background, then run a video player in the foreground and check the stats on dropped frames.
With BFQ and Kyber, Xine should supposedly play videos more smoothly.)
Originally posted by davidbepo View Post
wow bfq performs horrible, maybe paolo can fix it for 4.13
i like the responsiveness of bfq so i would like to see it performing well
You can't keep all the I/O reserved for the benchmark while simultaneously setting I/O aside for background tasks.
(In a "TL;DR" way: the "responsiveness" you feel is exactly that missed performance; the time missing from the benchmark result was spent making your desktop more responsive.)
CPU cycles, I/O throughput, etc. are a scarce resource.
Responsiveness (making sure that everybody gets a share of the resource without waiting too long: even background applications get their turn at the disk drive)
is in direct contradiction with raw performance (all the I/O and cycles go to the top-priority app, as much as it can consume them; background tasks only get the leftovers once in a while when the top task isn't using them, and the desktop starts to stutter noticeably).
What you can measure:
- On a headless server, with all other tasks and daemons shut down: BFQ shouldn't perform much worse than the rest, even if its scheduling method of course has some overhead. (Maybe it's performing a bit worse than expected, but without complete isolation there's no way to be sure.)
- Under a multitasking load, each task should get its share of I/O within a reasonably short timeframe (as mentioned above: does a video player start to drop too many frames?).
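A minimal sketch of the second measurement, using nothing but dd (the paths are assumptions; for a meaningful result both files must live on the device under test, not on tmpfs):

```shell
# Background sequential writer hogging the disk.
dd if=/dev/zero of=/tmp/hog bs=1M count=256 conv=fdatasync 2>/dev/null &
HOG=$!

# Foreground "interactive" task: small synchronous writes, timed by dd
# itself. Repeat and look at the spread, not just the mean.
for i in 1 2 3; do
    dd if=/dev/zero of=/tmp/probe bs=4k count=64 conv=fdatasync 2>&1 | tail -n 1
done

wait "$HOG"
rm -f /tmp/hog /tmp/probe
```

A fairness-oriented scheduler such as BFQ should keep the probe's times low and stable while the hog runs; a pure throughput scheduler may let them spike.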
- Likes 3
Comment
-
paolo, thank you for your persistence in developing and trying to merge BFQ over the years.
Michael, would you be willing to retry the tests with BFQ's low_latency parameter disabled? I'm not sure what to suggest (from the PTS) for latency tests. As others have said, running a game with a compilation in the background might work, although it's not ideal.
Comment
-
Originally posted by GrayShade View Post
paolo, thank you for your persistence in developing and trying to merge BFQ over the years.
My pleasure
Michael, would you be willing to retry the tests with BFQ's low_latency parameter disabled? I'm not sure what to suggest (from the PTS) for latency tests. As others have said, running a game with a compilation in the background might work, although it's not ideal.
It's rather rustic, but, of course, I'd be glad to help if needed.
- Likes 2
Comment