Readfile System Call Revised For Efficiently Reading Small Files


  • archkde
    replied
    Originally posted by F.Ultra:

    One use case was linked in TFA. The last time this was brought up, it was claimed that io_uring is not designed for this open->read->close type of sequence of operations, and io_uring is also vastly more complex to set up than a simple call to readfile(). But most importantly, Greg is the dev/inventor of both readfile() and io_uring, so he obviously has good reasons for introducing a new syscall over his io_uring pet project.
    io_uring is not a pet project, and as far as I know, its principal developer is not Greg Kroah-Hartman, but Jens Axboe.



  • discordian
    replied
    Another thing is that this could be implemented more reliably, since you don't have to manage a file descriptor and close it on both the success and error paths.
    Otherwise you always have to expect that you can run out of fds; some attacks focus on exploiting these kinds of bugs.
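    The fd-handling burden described here is visible in the pattern every program uses today. A minimal sketch of the status quo (three syscalls, with the descriptor to be released on every path) — the proposed readfile() would collapse this into one call with no fd exposed to userspace:

    ```c
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Today's pattern: open, read, close — three syscalls, and the
     * descriptor must be released on both the success and error paths,
     * or the process slowly leaks fds. */
    ssize_t read_small_file(const char *path, char *buf, size_t len)
    {
        int fd = open(path, O_RDONLY);
        if (fd < 0)
            return -1;                 /* nothing to clean up yet */
        ssize_t n = read(fd, buf, len);
        close(fd);                     /* must happen even if read failed */
        return n;
    }

    int main(void)
    {
        char buf[128];
        ssize_t n = read_small_file("/proc/sys/kernel/ostype", buf, sizeof(buf) - 1);
        if (n > 0) {
            buf[n] = '\0';
            printf("%s", buf);
        }
        return 0;
    }
    ```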



  • oiaohm
    replied
    Originally posted by AlanTuring69:
    I don't understand why such non-Unix syscalls are even being considered by kernel "maintainers"? Has Micro$oft really infiltrated my kernel?

    p.s. if you even considered what I said to be true, then re-evaluate yourself. It's obvious that this is to help some niche use cases that just need to read a file simply. io_uring is overkill for these use cases, and the read-a-file pattern already exists, so there is no reason not to make it more efficient.
    https://lwn.net/Articles/813827/ The readfile syscall is stage one. Once readfile is worked out as a syscall, it will be added to the io_uring path as well. Turning 3 operations into 1 has advantages for io_uring too. So the plan is to improve direct syscall usage first, then improve indirect usage via io_uring; the improvement is the same thing.

    open, read and close via io_uring are 3 independent entries placed on the ring buffer; readfile implemented on io_uring would reduce that to one entry, saving ring buffer space and processing. readfile as a syscall turns 3 syscalls into 1, saving 2 context switches. The savings from the readfile syscall are bigger than from the io_uring change, but either way there are savings. Of course, the smaller savings of the io_uring change make it the lower priority of the two.

    The reality is that from day 1 the Linux kernel has included syscalls that are not Unix syscalls.
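    The three-ring-entry sequence described above can be sketched with liburing's linked operations. This is a sketch, not anything from the patch series: it assumes liburing >= 2.1 and a kernel >= 5.15 with direct-descriptor support, and on older kernels it simply reports that setup failed:

    ```c
    #include <fcntl.h>
    #include <stdio.h>
    #include <liburing.h>

    /* Sketch: open -> read -> close as three linked SQEs in one submission.
     * A readfile operation on io_uring would shrink this to a single entry. */
    int readfile_via_uring(const char *path, char *buf, unsigned len)
    {
        struct io_uring ring;
        struct io_uring_sqe *sqe;

        if (io_uring_queue_init(8, &ring, 0) < 0)
            return -1;
        if (io_uring_register_files_sparse(&ring, 1) < 0) { /* fixed slot 0 */
            io_uring_queue_exit(&ring);
            return -1;
        }

        sqe = io_uring_get_sqe(&ring);                /* entry 1: open */
        io_uring_prep_openat_direct(sqe, AT_FDCWD, path, O_RDONLY, 0, 0);
        sqe->flags |= IOSQE_IO_LINK;

        sqe = io_uring_get_sqe(&ring);                /* entry 2: read */
        io_uring_prep_read(sqe, 0, buf, len, 0);      /* fd 0 = fixed slot 0 */
        sqe->flags |= IOSQE_FIXED_FILE | IOSQE_IO_LINK;

        sqe = io_uring_get_sqe(&ring);                /* entry 3: close */
        io_uring_prep_close_direct(sqe, 0);

        io_uring_submit(&ring);
        for (int i = 0; i < 3; i++) {                 /* reap all three CQEs */
            struct io_uring_cqe *cqe;
            if (io_uring_wait_cqe(&ring, &cqe) < 0)
                break;
            io_uring_cqe_seen(&ring, cqe);
        }
        io_uring_queue_exit(&ring);
        return 0;
    }

    int main(void)
    {
        char buf[128] = {0};
        if (readfile_via_uring("/proc/sys/kernel/ostype", buf, sizeof(buf) - 1) < 0)
            puts("io_uring setup unavailable on this kernel");
        else
            printf("read: %s", buf);
        return 0;
    }
    ```

    Build with -luring. Note that linking the read to the open only works through fixed (direct) descriptors, because the regular fd returned by open is not yet known when the read SQE is queued — which is part of why a single readfile entry would be simpler.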



  • oiaohm
    replied
    Originally posted by ayumu:
    Thousands of syscalls (now one more!) and millions of LoCs. Of course, all running in supervisor mode. What could possibly go wrong.

    ayumu, what the heck are you talking about? The Linux kernel has a few hundred syscalls. Even counting all the platform-unique syscalls, the Linux kernel has not crossed 400 syscalls yet.

    Micro-kernels try to stay under 100 syscalls. Monolithic kernels have historically been heavier.

    ayumu, "thousands of syscalls" matches MS Windows, with its 2000+ syscalls that have existed since Windows started. But even with Windows there are only really about 1000 syscalls in production releases, because roughly 50% of the Windows syscalls that have ever existed have been deprecated and removed.

    1000 syscalls appears to be the upper limit. I think your figure is off by a factor of 10: monolithic-kernel syscalls you can roughly count in hundreds, and microkernel syscalls in tens.

    ayumu, I don't know of an OS kernel with 1100+ syscalls in a production release. MS Windows is the biggest I know of, and its syscall count is way higher than the next biggest. And no, the Linux kernel is not the next biggest: the FreeBSD, OpenBSD and NetBSD kernels are all heavier in syscalls than the Linux kernel; all of the BSDs passed the 500-syscall mark in production releases over a decade ago.

    It surprises a lot of people that the Linux kernel is not that heavy on syscalls for a monolithic kernel. Yes, the Linux kernel is between 4 and 10 times heavier than the typical micro-kernel in syscalls.

    Running code in userspace does not make it safer either; Minix proved this a long time ago. The hard part is still how to audit everything.



  • jacob
    replied
    Originally posted by AlanTuring69:
    I don't understand why such non-Unix syscalls are even being considered by kernel "maintainers"? Has Micro$oft really infiltrated my kernel?
    Your kernel is Linux, not Unix. Who are you to mandate that it must slavishly imitate Unix for ever and ever?



  • oiaohm
    replied
    Originally posted by archkde:
    I don't really get the use-case here. What kind of application hammers sysfs with reads during its regular operation, but can't queue them in userspace and then execute them in batch using io_uring for even greater speedup?


    readfile and io_uring are not two different things here. The readfile syscall makes it one operation to open, read and close a small file: it takes 3 operations and makes them 1. readfile can also be implemented on io_uring once the readfile syscall works and is in the kernel. The same thing applies there: instead of 3 io_uring operations you reduce it to 1, once readfile is confirmed as a working syscall.

    And yes, each of those individual operations, open, read and close, results in having to acquire locks. The same locks: open acquires and releases them, read acquires and releases them, and close acquires and releases them.



  • F.Ultra
    replied
    Originally posted by ayumu:
    Thousands of syscalls (now one more!) and millions of LoCs. Of course, all running in supervisor mode. What could possibly go wrong.
    There are 341 syscalls in total, not thousands; but pray tell, what could go wrong from having 342? The millions of LoCs are drivers, and only a small subset of those are running on a given system at any one time.



  • markg85
    replied
    Originally posted by jrdoane:
    What kind of speedup are we talking here? I'd assume that the claim that it's efficient for small files should come with proof. Are we really wasting that many cycles otherwise?
    Look back at previous posts on this, or at the mailing lists. There definitely are performance numbers, and IIRC they were double-digit gains for sure.
    But the way you state it makes no sense in reality. Just having this optimization isn't suddenly going to make an application x times faster or more efficient. It would make a very small subset of an application (reading small files) a lot more efficient, but the application as a whole probably won't be much more efficient from a user's point of view.

    So which apps do benefit? Think of something like top or the GUI system monitoring tools. They could become a fair bit more efficient and might use substantially fewer CPU cycles themselves, but to you, as a user, they'd still work the same.
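    For a rough sense of the per-file cost being discussed, here is a trivial micro-benchmark of the current three-syscall cycle on a small /proc file. Numbers vary by machine and kernel, and this measures only the pattern readfile aims to shrink, not any real application:

    ```c
    #include <fcntl.h>
    #include <stdio.h>
    #include <time.h>
    #include <unistd.h>

    /* Average wall-clock cost of one open/read/close cycle on a small file.
     * readfile() would replace each cycle with a single syscall. */
    double bench_cycle_ns(const char *path, int iterations)
    {
        char buf[256];
        struct timespec t0, t1;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int i = 0; i < iterations; i++) {
            int fd = open(path, O_RDONLY);
            if (fd < 0)
                return -1.0;
            (void)read(fd, buf, sizeof(buf));
            close(fd);
        }
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
        return ns / iterations;
    }

    int main(void)
    {
        double per_cycle = bench_cycle_ns("/proc/self/stat", 10000);
        if (per_cycle > 0)
            printf("%.0f ns per open/read/close cycle\n", per_cycle);
        return 0;
    }
    ```

    A monitoring tool that polls hundreds of such files per refresh pays this cost on every one of them, which is where the cumulative savings would show up.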



  • jrdoane
    replied
    What kind of speedup are we talking here? I'd assume that the claim that it's efficient for small files should come with proof. Are we really wasting that many cycles otherwise?



  • ayumu
    replied
    Thousands of syscalls (now one more!) and millions of LoCs. Of course, all running in supervisor mode. What could possibly go wrong.

