Linux Work Culminating On A "READFILE" Syscall For Reading Small Files Efficiently
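
For context: reading a small file conventionally takes at least three kinds of syscall, open(), one or more read() calls, and close(). A readfile-style syscall would fold that sequence into a single kernel entry. The sketch below shows only the conventional sequence it would replace. The helper name `read_small_file` is hypothetical, and the sketch assumes the whole file fits in the caller's buffer.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: the open/read/close sequence that a single
 * readfile-style syscall would collapse into one kernel entry.
 * Returns the number of bytes read, or -1 with errno set. */
ssize_t read_small_file(const char *path, char *buf, size_t bufsize)
{
    int fd = open(path, O_RDONLY | O_CLOEXEC);    /* syscall: open */
    if (fd < 0)
        return -1;

    size_t total = 0;
    while (total < bufsize) {                      /* syscalls: read until EOF or buffer full */
        ssize_t n = read(fd, buf + total, bufsize - total);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            int saved = errno;
            close(fd);
            errno = saved;
            return -1;
        }
        if (n == 0)                                /* end of file */
            break;
        total += (size_t)n;
    }

    close(fd);                                     /* syscall: close */
    return (ssize_t)total;
}
```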

  • Isedonde
    replied
    Originally posted by xpue
    Perhaps they should also consider making system calls work with multiple files/whatever in one call, in batches.
    AFAIK, that's already possible using io_uring. It's really flexible and powerful.

    After initially setting up the io_uring ring buffers at application startup, you'd have to prepare a bunch of "instructions" in the ring buffer to open e.g. 100 files, which happens in user space and is just writing to ring buffer memory. Then you tell the kernel "hey, I've prepared 100 new work items for you in the ring buffer. Please wake me up when you've finished processing all of them/some of them/I'll do other work in the meantime and check back with you later". This is just one syscall. Then, when the kernel has finished opening the files, you receive the FDs on the other ring buffer (the completion queue). So now you can put 100 read "instructions" for those FDs on the ring buffer, and possibly add 100 close "instructions" too if you no longer need the FDs after reading. Then you call the kernel again to say "please work on those new 100/200 IO instructions now", and you can again choose to sleep until all or some of the work is done, or do other work while the kernel is busy reading (and possibly closing) the files.

    io_uring is really great.

    Apparently they want to add a way to do the same task in just one syscall, where the read and close "instructions" wouldn't have to explicitly state the FD they want to read/close; instead you could say "use the FD that was opened in this 'instruction' I've put on the ring buffer previously". That would mean you could do everything in one submit and just wait for the kernel to fill the buffers you've supplied.

  • archsway
    replied
    Originally posted by jacob

    And the reason that we do have dog-slow CPUs is precisely because they have slow-to-decode addressing modes and a CISC instruction set that allows generating bloated code, that only Intel and AMD can implement at a reasonable speed. RISC (one operation = one instruction) has proven useful for anything Intel didn't manage to kill with Itanium.
    FTFY.

  • jacob
    replied
    Originally posted by markg85
    This is a wrapper over splice and doesn't work the way I want. You need to wait on in_fd and trigger the transfer for every batch of data (with an unnecessary user-kernel trampoline every time), in_fd can't be a socket, etc. It works for what it's intended to do, which is serving files, but it doesn't work for something like an on-demand tunnel manager. What I would like is to be able to "attach" an input file descriptor to an output one and have all data simply pass through (for example from socket to socket, or from local socket to inet socket, etc.) with no further intervention.

  • JustinTurdeau
    replied
    Originally posted by xpue
    Perhaps they should also consider making system calls work with multiple files/whatever in one call, in batches.
    People are already talking about using/improving io_uring to do that on the LKML thread.
    Last edited by JustinTurdeau; 24 May 2020, 09:15 PM.

  • markg85
    replied
    Originally posted by jacob

    Here is one I always wanted: be able to pass through a file descriptor. Say you want all input on fd X to be directly forwarded to output fd Y (by the kernel itself, without jumping into and back from user space). The splice() functionality kinda sorta lets you do that but it's clunky and doesn't work in all scenarios.
    http://man7.org/linux/man-pages/man2/sendfile.2.html
    You're welcome

  • JustinTurdeau
    replied
    Originally posted by jacob
    You realise that there is no performance difference whatsoever between ioctl() and a separate syscall, don't you? On second thought, no, you probably don't.
    Shoehorning 50 pseudo-syscalls into a single ioctl() syscall is a shitty hack, and there definitely is a performance difference inside kernel space. That's why actual kernel maintainers opt to do it the proper way.
    Last edited by tildearrow; 25 May 2020, 12:51 AM.

  • jacob
    replied
    Originally posted by xpue
    Perhaps they should also consider making system calls work with multiple files/whatever in one call, in batch mode.
    Here is one I always wanted: be able to pass through a file descriptor. Say you want all input on fd X to be directly forwarded to output fd Y (by the kernel itself, without jumping into and back from user space). The splice() functionality kinda sorta lets you do that but it's clunky and doesn't work in all scenarios.
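
The splice()-based pass-through can be sketched as a relay loop. splice() needs a pipe on one side of each call, so data travels in_fd -> pipe -> out_fd, and user space still issues two splice() calls per chunk, which is part of what makes it clunky for a set-and-forget forwarder. A minimal sketch assuming blocking descriptors; the helper name `relay_with_splice` and the 64 KiB chunk size are hypothetical choices.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#define CHUNK (64 * 1024)   /* arbitrary per-call transfer size */

/* Hypothetical helper: forward everything from in_fd to out_fd until EOF
 * through an intermediate pipe, so the bytes stay in the kernel.
 * Returns the number of bytes forwarded, or -1 on error. */
ssize_t relay_with_splice(int in_fd, int out_fd)
{
    int p[2];
    ssize_t total = 0;

    if (pipe2(p, O_CLOEXEC) < 0)
        return -1;

    for (;;) {
        /* Step 1: move up to CHUNK bytes from the source into the pipe. */
        ssize_t n = splice(in_fd, NULL, p[1], NULL, CHUNK, SPLICE_F_MOVE);
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0) {               /* 0 = EOF on in_fd, < 0 = error */
            if (n < 0)
                total = -1;
            break;
        }
        /* Step 2: drain exactly those bytes from the pipe into the sink. */
        while (n > 0) {
            ssize_t m = splice(p[0], NULL, out_fd, NULL, (size_t)n, SPLICE_F_MOVE);
            if (m < 0 && errno == EINTR)
                continue;
            if (m <= 0) {
                total = -1;
                goto done;
            }
            n -= m;
            total += m;
        }
    }
done:
    {
        int saved = errno;          /* keep the splice() errno across close() */
        close(p[0]);
        close(p[1]);
        errno = saved;
    }
    return total;
}
```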

  • xpue
    replied
    Perhaps they should also consider making system calls work with multiple files/whatever in one call, in batches.
    Last edited by xpue; 24 May 2020, 07:22 PM.

  • jacob
    replied
    Originally posted by JustinTurdeau

    That's nonsense. Adding syscalls for performance reasons is totally legitimate; especially for use cases as common as this one.

    If CPU vendors followed this mentality, we'd only have 10 instructions and everything would run dog slow.
    You realise that there is no performance difference whatsoever between ioctl() and a separate syscall, don't you? On second thought, no, you probably don't.

    And the reason that we don't have dog-slow CPUs is precisely because they have advanced addressing modes and a CISC instruction set that allows generating compact and highly optimised code. RISC (one operation = one instruction) has proven useful for designing small, power-efficient and cheap cores, but performance-wise it has comprehensively lost to CISC.
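
To make the ioctl() side of this argument concrete: ioctl() is one syscall whose request code selects the operation, so many object-specific operations share a single entry point. The sketch below uses the real FIONREAD request (number of bytes ready to read) on a descriptor; the helper name `bytes_available` is hypothetical, and the example illustrates the interface style only, not either side's performance claim.

```c
#define _GNU_SOURCE
#include <sys/ioctl.h>

/* Hypothetical helper: ask the kernel how many bytes are queued on fd.
 * One generic syscall, ioctl(), dispatches on the request code FIONREAD
 * rather than the kernel exposing a dedicated "bytes available" syscall.
 * Returns the byte count, or -1 on error. */
int bytes_available(int fd)
{
    int n = 0;
    if (ioctl(fd, FIONREAD, &n) < 0)
        return -1;
    return n;
}
```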

  • JustinTurdeau
    replied
    Originally posted by jacob
    Ideally only really new concepts should have new syscalls; new operations on existing object types should be implemented using existing generic interfaces.
    That's nonsense. Adding syscalls for performance reasons is totally legitimate; especially for use cases as common as this one. readv(), writev(), io_submit(), etc. also exist mostly for performance reasons.

    If CPU vendors followed this mentality, we'd only have 10 instructions and everything would run dog slow.
    Last edited by JustinTurdeau; 24 May 2020, 07:15 PM.
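
readv() is a good example of a call that exists for efficiency: it fills several separate buffers in one kernel entry instead of one read() per buffer. A minimal sketch, assuming a hypothetical format with a fixed 4-byte header followed by a payload; the helper name `read_header_and_body` is also hypothetical.

```c
#define _GNU_SOURCE
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical helper: scatter one read across two buffers. A single
 * readv() call replaces two read() calls; the kernel fills hdr first,
 * then body, in order. Returns total bytes read, or -1 on error. */
ssize_t read_header_and_body(int fd, char hdr[4], char *body, size_t body_len)
{
    struct iovec iov[2] = {
        { .iov_base = hdr,  .iov_len = 4 },
        { .iov_base = body, .iov_len = body_len },
    };
    return readv(fd, iov, 2);
}
```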
