Announcement

**cl333r** · 27 November 2022, 01:33 PM

Besides this being a one-liner to use, how else would it be different (efficiency-wise) from an io_uring implementation?

**fitzie** · 27 November 2022, 02:25 PM

> Originally posted by cl333r:
>
> Besides this being a one-liner to use, how else would it be different (efficiency-wise) from an io_uring implementation?

io_uring is about efficiency during sustained I/O operations, while readfile would be for one-time operations. For the usage pattern readfile is targeting (top/ps), io_uring would not offer any benefit.

**archkde** · 27 November 2022, 03:04 PM

> Originally posted by cl333r:
>
> Besides this being a one-liner to use, how else would it be different (efficiency-wise) from an io_uring implementation?

io_uring targets a completely different use case. With readfile, you can read *one* file in *one* syscall instead of three (open, read, close). With io_uring, you can read thousands of files with two or three syscalls. But yes, if the difference in syscalls actually matters, chances are you're going to read more than one file, so io_uring is going to pay off more.
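To make the "three syscalls" concrete, here is a minimal sketch of the classic sequence readfile() is meant to replace. The helper name `read_small_file` is hypothetical, and it assumes a single read() returns the whole file, which is common for small files but not guaranteed.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: the open/read/close sequence that a single readfile()
 * call would collapse into one syscall. Returns the byte count read, or -1
 * with errno set. */
ssize_t read_small_file(const char *path, void *buf, size_t count)
{
    int fd = open(path, O_RDONLY | O_CLOEXEC);   /* syscall 1 */
    if (fd < 0)
        return -1;

    /* syscall 2: assumes one read() returns the whole (small) file */
    ssize_t n = read(fd, buf, count);
    int saved_errno = errno;

    close(fd);                                   /* syscall 3 */
    errno = saved_errno;   /* keep read()'s errno, not close()'s */
    return n;
}
```

Per file that is three syscalls versus one; for a tool like top walking many /proc files, that per-file overhead is what the patch is after.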

**indepe** · 27 November 2022, 05:32 PM

For those wanting to use io_uring, readfile will surely become available as an io_uring operation, so you can combine the advantages, assuming that is an easy addition to io_uring. It is not either/or.

**coder** · 27 November 2022, 05:52 PM

For those of us too lazy to read the patches, can anyone answer how it defines a "small" file? And what happens if the file is too big -- truncated read or outright failure?

I'm imagining a user-supplied buffer + size. It'd be nice if the call would return the total length of the file, which you could use to determine if the read was incomplete. That would at least let you retry with a buffer that's big enough, without the further hassle of having to stat() the file.

Also, shouldn't there be a glibc patch, for a userspace counterpart? That seems like it should go in, ASAP. Then, any software which adopted it will magically benefit from the new syscall, once glibc has been updated to use it. Even if there were little chance of the syscall going in, at least it would be a minor quality-of-life improvement, for those working with small files.

**docontra** · 27 November 2022, 06:31 PM

> Originally posted by coder:
>
> For those of us too lazy to read the patches, can anyone answer how it defines a "small" file? And what happens if the file is too big -- truncated read or outright failure?

The patch is meant for ps/top et al., but according to the man page it can (and will) read up to the largest amount a single read() can transfer on Linux (2 GiB minus 4 KiB), and it truncates the result if the supplied buffer is too small.
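The "2 GiB minus 4 KiB" figure matches the kernel's per-call cap `MAX_RW_COUNT`, defined as `INT_MAX & PAGE_MASK` (INT_MAX rounded down to a page boundary). A small sketch of that arithmetic, where the function name is illustrative and the 4 KiB page size is an assumption (architectures with larger pages end up with a slightly lower cap):

```c
#include <limits.h>

/* Mirrors the kernel's MAX_RW_COUNT = INT_MAX & PAGE_MASK: INT_MAX rounded
 * down to a multiple of the page size. page_size must be a power of two. */
long long max_rw_count(long long page_size)
{
    return (long long)INT_MAX & ~(page_size - 1);
}
```

With 4 KiB pages, `max_rw_count(4096)` is 2147479552 bytes (0x7ffff000), exactly 2 GiB minus 4 KiB.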

> Originally posted by coder:
>
> I'm imagining a user-supplied buffer + size. It'd be nice if the call would return the total length of the file, which you could use to determine if the read was incomplete. That would at least let you retry with a buffer that's big enough, without the further hassle of having to stat() the file.

Your imagination is this patch's reality (except for the part where it returns the total file size alongside the number of bytes actually read).

> Originally posted by coder:
>
> Also, shouldn't there be a glibc patch, for a userspace counterpart? That seems like it should go in, ASAP. Then, any software which adopted it will magically benefit from the new syscall, once glibc has been updated to use it. Even if there were little chance of the syscall going in, at least it would be a minor quality-of-life improvement, for those working with small files.

glibc's coverage of Linux syscalls is a... rather contentious topic, but once this syscall goes in, a wrapper for it should follow with little to no trouble.
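On the glibc question, a userspace counterpart could be little more than the same signature with a fallback. Below is a minimal sketch: `readfile_compat` is a hypothetical name, the `__NR_readfile` check is hypothetical (that syscall number is not assumed to exist in current headers), and rejecting any nonzero `flags` is a simplification of this sketch, not the patch's semantics. Without the syscall, it emulates readfile() with openat(), a read() loop and close(), truncating silently when the buffer is too small, as described above.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical wrapper with the proposed readfile() signature. */
ssize_t readfile_compat(int dirfd, const char *pathname, void *buf,
                        size_t count, int flags)
{
#ifdef __NR_readfile   /* hypothetical: only compiled if headers define it */
    long ret = syscall(__NR_readfile, dirfd, pathname, buf, count, flags);
    if (ret >= 0 || errno != ENOSYS)
        return (ssize_t)ret;
#endif
    if (flags != 0) {            /* sketch assumption: no flags supported */
        errno = EINVAL;
        return -1;
    }

    int fd = openat(dirfd, pathname, O_RDONLY | O_CLOEXEC);
    if (fd < 0)
        return -1;

    /* Read until the buffer is full or EOF; extra data is silently dropped. */
    size_t total = 0;
    while (total < count) {
        ssize_t n = read(fd, (char *)buf + total, count - total);
        if (n < 0) {
            if (errno == EINTR)
                continue;        /* interrupted by a signal: retry */
            int saved_errno = errno;
            close(fd);
            errno = saved_errno;
            return -1;
        }
        if (n == 0)
            break;               /* EOF */
        total += (size_t)n;
    }

    close(fd);
    return (ssize_t)total;
}
```

The read() loop matters for the emulation: a single read() may legally return fewer bytes than requested, so stopping early would make a complete file look truncated.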

**indepe** · 27 November 2022, 06:37 PM

> Originally posted by coder:
>
> For those of us too lazy to read the patches, can anyone answer how it defines a "small" file? And what happens if the file is too big -- truncated read or outright failure?
>
> I'm imagining a user-supplied buffer + size. It'd be nice if the call would return the total length of the file, which you could use to determine if the read was incomplete. That would at least let you retry with a buffer that's big enough, without the further hassle of having to stat() the file.
>
> Also, shouldn't there be a glibc patch, for a userspace counterpart? That seems like it should go in, ASAP. Then, any software which adopted it will magically benefit from the new syscall, once glibc has been updated to use it. Even if there were little chance of the syscall going in, at least it would be a minor quality-of-life improvement, for those working with small files.

This is the API:

`ssize_t readfile(int dirfd, const char *pathname, void *buf, size_t count, int flags);`

I think "small" just means small enough to fit in a single buffer. It seems you could also use it to read the beginning of large files.

As far as I can tell, you know you got the complete file if the returned size is smaller than `count`. Thus the buffer should be at least 1 byte larger than any file size you want to be able to recognize as complete.
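The completeness rule above suggests a simple grow-and-retry loop. A minimal sketch, assuming readfile()-like semantics; since the syscall is not assumed to be available, `readfile_like` is a hypothetical open/read/close stand-in, and `read_whole_file` plus its 64-byte starting size are illustrative choices:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in with readfile()-like semantics: read up to count
 * bytes, silently truncating anything beyond that. */
static ssize_t readfile_like(const char *path, void *buf, size_t count)
{
    int fd = open(path, O_RDONLY | O_CLOEXEC);
    if (fd < 0)
        return -1;
    size_t total = 0;
    while (total < count) {
        ssize_t n = read(fd, (char *)buf + total, count - total);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            int saved_errno = errno;
            close(fd);
            errno = saved_errno;
            return -1;
        }
        if (n == 0)
            break;               /* EOF */
        total += (size_t)n;
    }
    close(fd);
    return (ssize_t)total;
}

/* Reads the whole file into a malloc'd buffer. A result is only known to be
 * complete when it is strictly smaller than the buffer, so the buffer is
 * doubled and the read retried until that holds. Caller frees the result. */
char *read_whole_file(const char *path, size_t *len_out)
{
    size_t cap = 64;             /* arbitrary starting size for this sketch */
    for (;;) {
        char *buf = malloc(cap);
        if (!buf)
            return NULL;
        ssize_t n = readfile_like(path, buf, cap);
        if (n < 0) {
            int saved_errno = errno;
            free(buf);
            errno = saved_errno;
            return NULL;
        }
        if ((size_t)n < cap) {   /* strictly smaller: whole file was read */
            *len_out = (size_t)n;
            return buf;
        }
        free(buf);               /* n == cap: possibly truncated, retry */
        if (cap > SIZE_MAX / 2) {
            errno = EFBIG;
            return NULL;
        }
        cap *= 2;
    }
}
```

A file of exactly 64 bytes fills the first buffer, so it cannot be told apart from a longer, truncated file and triggers one retry at 128 bytes; that is the "at least 1 byte larger" rule in action. Each retry re-reads from the start, which is cheap for the small files readfile targets.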

**nh2_** · 27 November 2022, 10:25 PM

readfile could also significantly benefit networked filesystems like Ceph, because it removes network roundtrips for multiple syscalls (open/read/close) and it removes state (the open file description), which in turn might simplify operations and perhaps make things faster.

**coder** · 28 November 2022, 03:52 AM

> Originally posted by indepe:
>
> This is the API:
>
> `ssize_t readfile(int dirfd, const char *pathname, void *buf, size_t count, int flags);`

Uh, I'm a little triggered by seeing `flags`. I sure hope there's no need to pass O_CLOEXEC (i.e. that it's implied or unnecessary), because needing it would make the call not so atomic.

Readfile System Call Patches Revisited For Efficiently Reading Small Files