
Readfile System Call Revised For Efficiently Reading Small Files


  • #21
    Originally posted by oiaohm

    Typo: you typed 3 context switches instead of 2. Yes, it's 6 context switches vs. 2, or 3 syscalls vs. 1 syscall (simply multiply by 2 to get the context switches).
    Quite generous of you to call that a typo.
    But yes you're right, you start in user space, switch to kernel space and back, that makes for two context switches.
    Actually, I'm not sure a syscall on Linux necessarily involves a full context switch in all cases; it doesn't seem to be a hard requirement. There is still some cost to switching from user to kernel space, of course, and as far as I know the relative increase in cost due to mitigations is quite similar to the cost of a full context switch, so this is just nitpicking. And I'm not even sure how Linux handles it.
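    To put the syscall count in concrete terms, here is a minimal C sketch (assuming Linux/POSIX) of the open/read/close sequence that a single readfile() call was proposed to replace. The name `readfile_emulated` is hypothetical, and its parameters (a directory fd, a path, a buffer and its size) are only loosely modeled on the readfile() proposal; it also assumes one read() is enough for a small file. Each of the three calls is a separate user/kernel round trip, which is where the 3-vs-1 syscall (6-vs-2 transition) count comes from.

```c
#define _POSIX_C_SOURCE 200809L  /* openat(), O_CLOEXEC, AT_FDCWD */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical user-space stand-in for readfile(): open, read and close a
 * small file relative to dfd (AT_FDCWD means the current directory).
 * Returns the number of bytes read, or -1 with errno set.
 * Assumption: a single read() returns the whole (small) file; short reads
 * are not retried in this sketch. */
static ssize_t readfile_emulated(int dfd, const char *path, char *buf, size_t bufsize)
{
    assert(path != NULL && buf != NULL);

    int fd = openat(dfd, path, O_RDONLY | O_CLOEXEC);  /* syscall 1: openat() */
    if (fd < 0)
        return -1;

    ssize_t n;
    do {
        n = read(fd, buf, bufsize);                     /* syscall 2: read() */
    } while (n < 0 && errno == EINTR);                  /* retry if interrupted by a signal */

    int saved_errno = errno;                            /* keep read()'s errno across close() */
    close(fd);                                          /* syscall 3: close() */
    errno = saved_errno;
    return n;
}
```

    Called as `readfile_emulated(AT_FDCWD, "/sys/class/net/lo/mtu", buf, sizeof buf)`, it costs three syscalls where the proposed readfile() would cost one.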



    • #22
      Originally posted by oiaohm

      Readfile can also be implemented on io_uring once the readfile syscall works and is in the kernel. Yes, same thing: instead of 3 io_uring operations you reduce to 1, once readfile is confirmed as a working syscall.

      You did not read what I wrote carefully enough. Notice that here I was talking about io_uring. The LWN article https://lwn.net/Articles/813827/ also said that readfile will in time be added to io_uring once it is proven to work.

      Yes, each of those individual operations (open, read and close) results in having to acquire locks, and the same locks each time: open acquires and releases them, read acquires and releases them, and close acquires and releases them.

      This bit from my prior post is important. Also, not every program reading thousands of small files is going to set up io_uring.

      archkde, you are still treating readfile and io_uring as two different things. The readfile syscall is the prototype for what will become readfile in io_uring, because io_uring has performance issues doing open, read and close in rapid succession: small files cause the locking problem. With io_uring, you can be attempting to close the handle of a small file while the lock taken when it was opened has not been cleared yet. io_uring bypasses the context-switch cost of syscalls, and as a result small files become a problem due to locking. With the overhead of a context switch on each of open, read and close, there is enough time for the lock taken by open to clear before close gets called.

      Small files are their own unique headache. Readfile and io_uring are not two separate things. Given the locking fun of file access, getting the open/read/close block right as a syscall is most likely safer than attempting to add readfile straight to io_uring, where things move faster and race conditions and other issues become a risk.
      Readfile is not going to land in io_uring per the article you mentioned. What they are planning is to add a functionality to chain the openat with the read (and the close) so that everything can be done in 1 io_uring entry instead of 2.

      I don't know where you get the stuff about locks from. I can believe it, but it's probably not a huge problem, since an uncontended lock should be pretty fast (much faster than all the other stuff the kernel has to do when opening/reading/closing a file). If you have more information here, I'd be happy to hear about this.

      And it's a pity that applications won't use io_uring where it makes sense. It's rather simple to set up for the synchronous batch submission use case, and the speedup is much larger than what you get from readfile.



      • #23
        Originally posted by archkde
        Readfile is not going to land in io_uring per the article you mentioned. What they are planning is to add a functionality to chain the openat with the read (and the close) so that everything can be done in 1 io_uring entry instead of 2.
        That is going to be based on the readfile work.

        Originally posted by archkde
        I don't know where you get the stuff about locks from. I can believe it, but it's probably not a huge problem, since an uncontended lock should be pretty fast (much faster than all the other stuff the kernel has to do when opening/reading/closing a file). If you have more information here, I'd be happy to hear about this.
        Small files turn out to be the case where the lock is not uncontended all of the time. open and close end up hitting the filesystem's directory lock, and in areas like sysfs the activity on this lock can be very high. So open has to take and release the lock, and close has to take and release the lock. For large file reads/writes this is not a problem; in fact, holding the lock while performing a large read/write would itself be a problem, because it would block other directory operations.

        The worst case with small files is on network filesystems or other filesystems with performance lag, where the release of the directory lock taken by open can be delayed. On the open/read/close path, the directory lock release is not processed before close attempts to grab the lock again. This is the small-file problem: at times it is better not to release the directory lock at all, because it will not actually be released before close asks for it again anyway.

        Remember that multiple applications can be hitting sysfs with stacks of small file reads. io_uring is for when one application is going to do a stack of file operations. It is not for an application that needs to perform a single file operation while a stack of other applications happen to be doing the same thing. Two solutions are needed to reduce the lock thrashing.

        openat with io_uring, doing open, read and close in one pass, solves the same locking problem.



        • #24
          Some of my nginx caches handle millions of thumbnails, and they all get read on startup, taking time, CPU and IOPS. It would be good if that could be reduced.
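          A minimal C sketch of that startup pattern, assuming POSIX directory APIs: every regular file in a directory is opened relative to an already-open directory fd with openat(), read, and closed, so each thumbnail costs at least openat(), read() and close(), plus one more read() that returns 0 at end of file. The function name `warm_directory` and the 64 KiB buffer size are illustrative assumptions, not anything from nginx; this per-file overhead is what readfile() or a linked io_uring open/read/close chain would try to cut.

```c
#define _DEFAULT_SOURCE  /* DT_REG/DT_UNKNOWN, dirfd(), openat(), fstatat() */
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical cache-warming helper: reads every regular file in dirpath
 * once. Returns the total number of bytes read, or -1 if the directory
 * cannot be opened; *files_out (if non-NULL) receives the file count. */
static long long warm_directory(const char *dirpath, long *files_out)
{
    assert(dirpath != NULL);

    DIR *d = opendir(dirpath);
    if (d == NULL)
        return -1;

    int dfd = dirfd(d);              /* names below resolve relative to this fd */
    static char buf[64 * 1024];      /* assumed "small file" buffer size */
    long long total = 0;
    long files = 0;
    struct dirent *e;

    while ((e = readdir(d)) != NULL) {
        int is_reg = (e->d_type == DT_REG);   /* skips ".", ".." and subdirs */
        if (e->d_type == DT_UNKNOWN) {        /* some filesystems don't fill d_type */
            struct stat st;
            is_reg = fstatat(dfd, e->d_name, &st, AT_SYMLINK_NOFOLLOW) == 0
                     && S_ISREG(st.st_mode);
        }
        if (!is_reg)
            continue;

        int fd = openat(dfd, e->d_name, O_RDONLY | O_CLOEXEC);  /* openat() */
        if (fd < 0)
            continue;
        ssize_t n;
        while ((n = read(fd, buf, sizeof buf)) > 0)  /* read() until it returns 0 (EOF) */
            total += n;
        close(fd);                                   /* close() */
        files++;
    }
    closedir(d);
    if (files_out != NULL)
        *files_out = files;
    return total;
}
```

          Resolving names relative to `dirfd(d)` avoids walking the full directory path for every file; it does not remove the per-file open/read/close round trips themselves.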
