Readfile System Call Patches Revisited For Efficiently Reading Small Files

  • Readfile System Call Patches Revisited For Efficiently Reading Small Files

    Phoronix: Readfile System Call Patches Revisited For Efficiently Reading Small Files

    A "readfile" system call for efficiently reading small files has been under discussion for over two years. It should be a win when dealing with small files like those exposed via sysfs, but it has taken time to come together and has stalled several times. This week Greg Kroah-Hartman updated the readfile patches, raising hope that the new syscall might finally be on a path to mainlining...

    https://www.phoronix.com/news/readfi...-call-Nov-2022

  • #2
    Besides this being a one-liner to use, how else would it differ (efficiency-wise) from an io_uring implementation?


    • #3
      Originally posted by cl333r
      Besides this being a one-liner to use, how else would it differ (efficiency-wise) from an io_uring implementation?
      io_uring is about efficiency during sustained I/O operations, while readfile would be for one-time operations. For the usage pattern readfile is targeting (top/ps), io_uring would not offer any benefit.


      • #4
        Originally posted by cl333r
        Besides this being a one-liner to use, how else would it differ (efficiency-wise) from an io_uring implementation?
        io_uring targets a completely different use case. With readfile, you can read *one* file in *one* syscall instead of three. With io_uring, you can read thousands of files with two or three syscalls. But yes, if the difference in syscalls actually matters, chances are you're going to read more than one file, so io_uring is going to pay off more.


        • #5
          For those wanting to use io_uring, readfile will surely become available as an io_uring operation, so you can combine the advantages, assuming that is an easy addition to io_uring. It is not either/or.


          • #6
            For those of us too lazy to read the patches, can anyone answer how it defines a "small" file? And what happens if the file is too big -- truncated read or outright failure?

            I'm imagining a user-supplied buffer + size. It'd be nice if the call would return the total length of the file, which you could use to determine if the read was incomplete. That would at least let you retry with a buffer that's big enough, without the further hassle of having to stat() the file.

            Also, shouldn't there be a glibc patch, for a userspace counterpart? That seems like it should go in, ASAP. Then, any software which adopted it will magically benefit from the new syscall, once glibc has been updated to use it. Even if there were little chance of the syscall going in, at least it would be a minor quality-of-life improvement, for those working with small files.


            • #7
              Originally posted by coder
              For those of us too lazy to read the patches, can anyone answer how it defines a "small" file? And what happens if the file is too big -- truncated read or outright failure?
              The patch is meant for ps/top/et al, but according to the man page it can (and will) return up to the biggest single-read operation in GNU/Linux (2 GiB minus 4 KiB), and truncate if the input buffer is too small.

              Originally posted by coder
              I'm imagining a user-supplied buffer + size. It'd be nice if the call would return the total length of the file, which you could use to determine if the read was incomplete. That would at least let you retry with a buffer that's big enough, without the further hassle of having to stat() the file.
              Your imagination is this patch's reality (except for the part where it returns the file size along with the actual read size).

              Originally posted by coder
              Also, shouldn't there be a glibc patch, for a userspace counterpart? That seems like it should go in, ASAP. Then, any software which adopted it will magically benefit from the new syscall, once glibc has been updated to use it. Even if there were little chance of the syscall going in, at least it would be a minor quality-of-life improvement, for those working with small files.
              glibc-Linux syscall coverage is a... rather contentious topic, but there should be a wrapper for this syscall after it goes in with little to no trouble.


              • #8
                Originally posted by coder
                For those of us too lazy to read the patches, can anyone answer how it defines a "small" file? And what happens if the file is too big -- truncated read or outright failure?

                I'm imagining a user-supplied buffer + size. It'd be nice if the call would return the total length of the file, which you could use to determine if the read was incomplete. That would at least let you retry with a buffer that's big enough, without the further hassle of having to stat() the file.

                Also, shouldn't there be a glibc patch, for a userspace counterpart? That seems like it should go in, ASAP. Then, any software which adopted it will magically benefit from the new syscall, once glibc has been updated to use it. Even if there were little chance of the syscall going in, at least it would be a minor quality-of-life improvement, for those working with small files.
                This is the API:
                ssize_t readfile(int dirfd, const char *pathname, void *buf, size_t count, int flags);
                I think "small" just means small enough to fit in a single buffer. It seems you could also use it to read the beginning of large files.
                As far as I can tell, you know it is the complete file if the returned size is smaller than "count". Thus the buffer should be at least 1 byte larger than any file size you wish to determine as "complete".


                • #9
                  readfile could also significantly benefit networked filesystems like Ceph, because it removes network roundtrips for multiple syscalls (open/read/close) and it removes state (the open file description), which in turn might simplify operations and perhaps make things faster.


                  • #10
                    Originally posted by indepe
                    This is the API:
                    ssize_t readfile(int dirfd, const char *pathname, void *buf, size_t count, int flags);
                    Uh, I'm a little triggered by seeing flags. I sure hope there's no need for O_CLOEXEC (i.e. that it's implied or unnecessary), because that'd make it not so atomic.
