Linux 5.11 To Land Optimization That Helps IO_uring Performance

  • Linux 5.11 To Land Optimization That Helps IO_uring Performance

    Phoronix: Linux 5.11 To Land Optimization That Helps IO_uring Performance

    At the start of October we mentioned a kernel optimization that can help IO_uring performance. Now as we approach the end of the month, Linux 5.11 is poised to land the optimization that especially helps out with threaded workloads...

  • #2
    Jens Axboe notes in his tests that with this improvement the IO_uring performance in his test jumped from 1.38M requests/second up to 1.67M requests/s. Besides achieving 22% higher throughput, the CPU usage was also lower thanks to less locking.
    I imagine it's a synthetic test; it doesn't magically make the threaded part of apps 22% faster, right? If so, the real-world benefit would be around 1% or less, right?
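
A rough way to reason about this is Amdahl's law: the overall gain depends on what share of an application's runtime actually sits in the optimized path. The sketch below is only an illustration; the runtime shares are hypothetical, and the local speedup is simply the ratio of the two rounded throughput figures quoted above (which works out to roughly 21%, close to the quoted 22%).

```python
def overall_speedup(fraction_affected: float, local_speedup: float) -> float:
    """Amdahl's law: whole-program speedup when only part of the runtime gets faster."""
    if not 0.0 <= fraction_affected <= 1.0:
        raise ValueError("fraction_affected must be between 0 and 1")
    if local_speedup <= 0.0:
        raise ValueError("local_speedup must be positive")
    return 1.0 / ((1.0 - fraction_affected) + fraction_affected / local_speedup)


# Ratio of the two rounded throughput figures quoted in the post (M requests/s).
measured = 1.67 / 1.38

# Hypothetical shares of an application's runtime spent in the optimized path.
for fraction in (0.05, 0.25, 1.0):
    gain = overall_speedup(fraction, measured) - 1.0
    print(f"{fraction:.0%} of runtime affected -> {gain:.1%} overall gain")
```

Under these assumptions, an application that spends about 5% of its time in the affected path would see an overall gain of a little under 1%, so the guess above is plausible for workloads that are not dominated by io_uring submission.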


    • #3
      Besides achieving 22% higher throughput, the CPU usage was also lower thanks to less locking.
      I'm not sure that lower CPU usage is a good thing in this case. Although it suggests that throughput could be optimized even further, on a computer that doesn't have anything else to do, the CPUs should not remain unused part of the time (on a test that runs on localhost, I assume).


      • #4
        Originally posted by indepe

        I'm not sure that lower CPU usage is a good thing in this case. Although it suggests that throughput could be optimized even further, on a computer that doesn't have anything else to do, the CPUs should not remain unused part of the time (on a test that runs on localhost, I assume).
        So higher CPU usage, more context switches and lower throughput is better for you?


        • #5
          Originally posted by Volta
          So higher CPU usage, more context switches and lower throughput is better for you?
          I kind of expected this would not go over easily....

          What matters here is throughput, and the lower CPU usage is a sign that an even higher throughput may be possible. In other words, although the new version is better than the old version, it probably has some kind of problem.

          The lower CPU usage is an indication that something is non-optimal, since it is not a good thing to leave CPUs inactive if/when they could do something useful.
          Last edited by indepe; 30 October 2020, 05:01 AM.


          • #6
            Originally posted by indepe

            I kind of expected this would not go over easily....

            What matters here is throughput, and the lower CPU usage is a sign that an even higher throughput may be possible. In other words, although the new version is better than the old version, it probably has some kind of problem.

            The lower CPU usage is an indication that something is non-optimal, since it is not a good thing to leave CPUs inactive if/when they could do something useful.
            I knew what you were thinking. The CPU has more time now for throughput, but maybe it hit some other (hardware?) limit.

            Because of the shared ring buffers between the kernel and user space, io_uring can be a zero-copy system. Copying bytes around becomes necessary when system calls that transfer data between kernel and user space are involved. But since the bulk of the communication in io_uring is via buffers shared between the kernel and user space, this huge performance overhead is completely avoided.

            With some clever use of shared ring buffers, io_uring performance is really memory-bound, since in polling mode, we can do away with system calls altogether.
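
The ring-buffer idea can be sketched with a toy model. This is not the real io_uring layout or API: the `Ring` class and its method names are made up for illustration, and both sides live in one Python process, whereas the real rings are memory shared between the kernel and user space. It only shows how a producer and a consumer can exchange entries by advancing a tail and a head index over shared slots.

```python
class Ring:
    """Toy single-producer/single-consumer ring (hypothetical, not io_uring's layout)."""

    def __init__(self, entries: int) -> None:
        if entries <= 0 or entries & (entries - 1):
            raise ValueError("entries must be a power of two")
        self.slots = [None] * entries
        self.mask = entries - 1  # cheap wrap-around for power-of-two sizes
        self.head = 0            # advanced only by the consumer
        self.tail = 0            # advanced only by the producer

    def push(self, item) -> bool:
        if self.tail - self.head == len(self.slots):
            return False         # ring is full
        self.slots[self.tail & self.mask] = item
        self.tail += 1           # publish the new entry
        return True

    def pop(self):
        if self.head == self.tail:
            return None          # ring is empty
        item = self.slots[self.head & self.mask]
        self.head += 1           # release the slot
        return item


# One ring for submissions, one for completions.
sq, cq = Ring(4), Ring(4)
for request in ("read A", "read B"):
    sq.push(request)             # "application" side queues work

while (req := sq.pop()) is not None:
    cq.push(f"done: {req}")      # "kernel" side consumes and reports back
```

The point of the design is that queuing a request only touches the shared slots and indices, so no per-request call into the other side is needed as long as the consumer keeps polling.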
            Last edited by Volta; 30 October 2020, 06:02 AM.


            • #7
              Originally posted by Volta
              I knew what you were thinking. The CPU has more time now for throughput, but maybe it hit some other (hardware?) limit.
              I certainly wasn't thinking that the old version would be better.

              The same paragraph that talks about io_uring performance being memory-bound also mentions that io_uring is capable of 1.7M 4k IOPS. If that's true, I'd be surprised if processing 1.7M echo packets ran into a memory bottleneck or any other hardware limit, but I wouldn't really know.
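
A back-of-the-envelope comparison, as a sketch under stated assumptions: "4k" is taken to mean 4096 bytes, and the 64-byte echo message size is purely hypothetical, since the benchmark's message size is not given here.

```python
IOPS = 1_700_000        # operations per second, from the figure quoted above
BLOCK_BYTES = 4 * 1024  # assumption: "4k" means 4096 bytes
ECHO_BYTES = 64         # hypothetical echo message size, not from the benchmark


def bandwidth_gb_per_s(ops_per_second: int, bytes_per_op: int) -> float:
    """Payload bandwidth in decimal gigabytes per second."""
    return ops_per_second * bytes_per_op / 1e9


block_bw = bandwidth_gb_per_s(IOPS, BLOCK_BYTES)
echo_bw = bandwidth_gb_per_s(IOPS, ECHO_BYTES)

print(f"4k blocks:     {block_bw:.2f} GB/s")
print(f"echo messages: {echo_bw:.2f} GB/s")
```

With these numbers the echo case moves far less payload per second than the 4k case, which is consistent with the doubt that memory bandwidth is the limit there. It says nothing about per-operation costs such as cache misses or locking.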


              • #8
                Originally posted by indepe

                I certainly wasn't thinking that the old version would be better.
                From the article, it seems that the less the CPU gets in the way, the higher the throughput. You're expecting the CPU to speed things up, but it seems the CPU is not what determines performance in this case.


                • #9
                  Well...I for one like ANY bit of uplift I can get, because it's synergistic with the other pieces of the kernel that are also getting optimized, and with work outside the kernel, such as the improvements to Mesa that Marik has just sent in.

                  Added together...2-5% here....2-5% there...the synergistic effect is, hopefully, greater than the individual parts.


                  • #10
                    Looks too much like IO urin. Someone picked a winner there.
