
More Optimizations Has Linux Approaching 7M IOPS Per Core

    Phoronix: More Optimizations Has Linux Approaching 7M IOPS Per Core

    Linux block subsystem maintainer and IO_uring lead developer Jens Axboe continues making staggering optimizations to the kernel code to squeeze the maximum performance potential out of his shiny new system...

  • #2
    When he made it to 5.1M, wasn't there a claim he was being held back by the physical medium? What changed?
    Last edited by bug77; 11 October 2021, 11:24 AM.

    • #3
      Originally posted by bug77
      When he made it to 5.1M, wasn't there a claim he was being held back by the physical medium? What changed?
      A good question indeed. I assume he is using a fake, software-based test device instead of real hardware to get past the hardware limitation for now.

      • #4
        Originally posted by bug77
        When he made it to 5.1M, wasn't there a claim he was being held back by the physical medium? What changed?
        He's using two Optane devices now, so a single core is submitting IO to both devices.

        • #5
          We need simple, fully working source code examples. I'm not sure what it does: does it replace poll/epoll, or does it have nothing to do with them?
          I checked about a year ago, and it (io_uring) was so low-level that I lost interest instantly.

          • #6
            Presumably, Facebook has enough tracing in their stack to know how many disk hits are involved in a typical targeted ad delivery.

            I wonder if Jens Axboe has calculated what 7M IOPS is in swindles per second?

            • #7
              Originally posted by cl333r
              We need simple fully working source code examples
              https://git.kernel.dk/cgit/fio/plain/t/io_uring.c
              Originally posted by cl333r
              I'm not sure what it does, does it replace poll/epoll or has nothing to do with them?
              It's a replacement for aio, i.e. it's somewhat related.

              Comment


              • #8
                Originally posted by pal666
                https://git.kernel.dk/cgit/fio/plain/t/io_uring.c
                it's a replacement for aio, i.e. it's somewhat related
                It can also replace poll/epoll though.

                Also, it *is* a somewhat lower-level interface, in that an asynchronous interface will necessarily be more complicated: it can't be just a function call at the C layer (at least, not in a language without async/await syntactic sugar). Blocking would defeat the purpose, since the ability to pack lots of unrelated IO-bound work onto a single OS thread is part of what makes this so fast.

                Edit: also, since this is (or has become) a sort of general-purpose asynchronous I/O framework, it doesn't use the traditional poll/epoll "readiness" model (which waits until a file descriptor is ready for more reads or writes, e.g. when data has arrived or an output buffer has drained), though it can bridge to that model as well. Instead, it's primarily a "completion"-based model: you submit operations and get notified when each individual operation completes (though you can chain them together or batch them or... well, you can do a lot.)

                One final thing to note is that these benchmarks use all of the "go fast" options, which makes the setup even less user-friendly and more use-case-specific. In particular, the devices and the API are being used in polling mode, which means the CPU core stays busy close to 100% of the time talking to the device even when there is not much happening. That's not a problem at all if your server does only one thing and is heavily loaded pretty much all the time, but it may be suboptimal for other uses. It's still plenty fast in more traditional use cases, just not *this* fast.
                Last edited by zcansi; 11 October 2021, 05:50 PM.
