Linux Work Culminating On A "READFILE" Syscall For Reading Small Files Efficiently


  • #11
    Originally posted by jacob
    You realise that there is no performance difference whatsoever between ioctl() and a separate syscall, don't you? On a second thought, no, you probably don't.
    Shoehorning 50 pseudo-syscalls into a single ioctl() syscall is a shitty hack and there definitely is a performance difference inside kernel space. Hence why actual kernel maintainers opt to do it the proper way.
    Last edited by tildearrow; 05-25-2020, 12:51 AM.


    • #12
      Originally posted by jacob

      Here is one I always wanted: be able to pass through a file descriptor. Say you want all input on fd X to be directly forwarded to output fd Y (by the kernel itself, without jumping into and back from user space). The splice() functionality kinda sorta lets you do that but it's clunky and doesn't work in all scenarios.
      http://man7.org/linux/man-pages/man2/sendfile.2.html
      You're welcome


      • #13
        Originally posted by xpue
        Perhaps they should also consider making system calls work with multiple files/whatever in one call, in batches.
        People are already talking about using/improving io_uring to do that on the LKML thread.
        Last edited by JustinTurdeau; 05-24-2020, 09:15 PM.


        • #14
          Originally posted by markg85
          This is a wrapper over splice and doesn't work the way I want. You need to wait on in_fd and trigger the transfer for every batch of data (with an unnecessary user-kernel round trip every time), in_fd can't be a socket, and so on. It works for what it's intended to do, which is a file server, but it doesn't work for something like an on-demand tunnel manager. What I would like is to be able to "attach" an input file descriptor to an output one and have all data simply pass through (for example from socket to socket, or from a local socket to an inet socket) with no further intervention.
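
To illustrate the per-batch triggering described above, here is a minimal sketch using Python's `os.sendfile`, which wraps the Linux sendfile() call. The helper name `relay_file` and the chunk size are assumptions made for illustration. Note that user space has to re-enter the kernel once per chunk, and that in_fd here is a regular file, not a socket.

```python
import os


def relay_file(in_fd, out_fd, chunk=65536):
    """Copy from in_fd to out_fd with repeated sendfile() calls.

    Hypothetical helper: every loop iteration is one trip into the
    kernel, which is the per-batch overhead the post complains about.
    """
    total = 0
    while True:
        # offset=None (Linux): read from in_fd's current position and advance it.
        sent = os.sendfile(out_fd, in_fd, None, chunk)
        if sent == 0:  # end of input reached
            break
        total += sent
    return total
```

There is no way in this model to say "keep forwarding until I tell you to stop"; the loop has to live in user space.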


          • #15
            Originally posted by jacob

            And the reason that we do have dog-slow CPUs is precisely because they have slow to decode addressing modes and a CISC instruction set that allows generating bloated code, that only Intel and AMD can implement at a reasonable speed. RISC (one operation = one instruction) has proven useful for anything Intel didn't manage to kill with Itanium.
            FTFY.


            • #16
              Originally posted by xpue
              Perhaps they should also consider making system calls work with multiple files/whatever in one call, in batches.
              AFAIK, that's already possible using io_uring. It's really flexible and powerful.

              After initially setting up the io_uring ring buffers at application startup, you'd have to prepare a bunch of "instructions" in the submission ring buffer to open e.g. 100 files, which happens in user space and is just writing to ring buffer memory. Then you tell the kernel "hey, I've prepared 100 new work items for you in the ring buffer. Please wake me up when you've finished processing all of them/some of them/I'll do other work in the meantime and check back with you later". This is just one syscall. Then when the kernel has finished opening the files, you receive the FDs on the other ring buffer (the completion queue). So now you can put 100 read "instructions" for those FDs on the ring buffer, and possibly add 100 close "instructions" too if you no longer need the FDs after reading. Then you call the kernel again to say "please work on those new 100/200 IO instructions now" and you can again choose to sleep until all or some of the work is done, or do other work while the kernel is busy reading (and possibly closing) the files.

              io_uring is really great.

              Apparently they want to add a way so you can do the same task in just one syscall, where the read and close "instructions" wouldn't have to explicitly state the FD they want to read/close, but instead you could say "use the FD that was opened in this 'instruction' I've put on the ring buffer previously". So that would mean you could do everything in one submit and just wait for the kernel to fill the buffers you've supplied.
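
To make the batching idea above concrete, here is a toy user-space model in Python. It is not io_uring and makes no io_uring calls; `submit` and `read_many` are hypothetical names, and every operation still costs an ordinary syscall here. It only illustrates the data flow: a whole batch is queued up front, later entries refer to the descriptor opened by an earlier entry, and there is a single submit.

```python
import os


def submit(batch):
    """Process a whole batch in one call, standing in for one kernel entry.

    Each entry is (op, arg). For "read" and "close", arg is the index of
    the earlier "open" entry whose descriptor should be used.
    Returns one completion result per entry, in order.
    """
    results = []
    for op, arg in batch:
        if op == "open":
            results.append(os.open(arg, os.O_RDONLY))
        elif op == "read":
            results.append(os.read(results[arg], 4096))
        elif op == "close":
            os.close(results[arg])
            results.append(None)
        else:
            raise ValueError(f"unknown op: {op}")
    return results


def read_many(paths):
    """Queue open+read+close for every path, then submit once."""
    batch = []
    read_slots = []
    for path in paths:
        open_slot = len(batch)
        batch.append(("open", path))
        read_slots.append(len(batch))
        batch.append(("read", open_slot))   # "use the FD from that open"
        batch.append(("close", open_slot))
    completions = submit(batch)
    return [completions[i] for i in read_slots]
```

The point of the sketch is the `open_slot` reference: the read and close entries never name a concrete FD, only the earlier entry that will produce one.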


              • #17
                Originally posted by jacob
                No aversion, but they shouldn't be introduced on a whim.
                why? do you always add another switch case to the same function with a void pointer argument instead of introducing a new function?
                Originally posted by jacob
                Ideally only really new concepts should have new syscalls; new operations on existing object types should be implemented using existing generic interfaces.
                this is a really crazy and unfounded idea. it has no benefits, but it has real costs: you are losing type information by sending garbage arguments (...). ioctl was created to allow device drivers, which can't introduce system calls, to still provide some driver-specific functionality, i.e. ioctls differ per fd, while the subject of this thread (readfile) is global
                Last edited by pal666; 05-25-2020, 04:57 AM.


                • #18
                  Originally posted by xpue
                  Perhaps they should also consider making system calls work with multiple files/whatever in one call, in batches.
                  Sounds like something eBPF would be well suited for. You could install a little program like open();read();close();return and call it with one context switch.
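
For reference, the open();read();close() sequence mentioned above looks like this from user space. This is not eBPF, just the plain calls through Python's `os` module; the helper name `read_small_file` and the 4096-byte default are assumptions for illustration. Each of the three steps is a separate trip into the kernel, which is the cost that a combined call or an installed program would aim to avoid.

```python
import os


def read_small_file(path, max_bytes=4096):
    """Read up to max_bytes from path using three separate kernel entries."""
    fd = os.open(path, os.O_RDONLY)      # 1: open
    try:
        return os.read(fd, max_bytes)    # 2: read
    finally:
        os.close(fd)                     # 3: close
```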


                  • #19
                    i wonder what the size limit on the file is. without one, weird things will happen.


                    • #20
                      Originally posted by pal666
                      why? do you always add another switch case to the same function with a void pointer argument instead of introducing a new function?
                      this is a really crazy and unfounded idea. it has no benefits, but it has real costs: you are losing type information by sending garbage arguments (...). ioctl was created to allow device drivers, which can't introduce system calls, to still provide some driver-specific functionality, i.e. ioctls differ per fd, while the subject of this thread (readfile) is global
                      Not really. Type information is lost anyway at the kernel syscall interface (it doesn't know any type other than a CPU register). Also there is an indirection in all cases: a new syscall is only a function from userland's perspective; at the kernel level it's nothing more than a number in a table.
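
The "number in a table" point can be seen by invoking a syscall by number through libc's generic `syscall()` wrapper. This is a sketch under assumptions: `raw_getpid` is a hypothetical helper, and the numbers for getpid (39 on x86_64, 172 on aarch64) are architecture-specific, so on any other platform the helper simply returns None.

```python
import ctypes
import platform
import sys

# getpid's syscall number differs per architecture (assumed table, Linux only).
GETPID_NR = {"x86_64": 39, "aarch64": 172}


def raw_getpid():
    """Call getpid by number; arguments and result are just register-sized integers."""
    if not sys.platform.startswith("linux"):
        return None
    nr = GETPID_NR.get(platform.machine())
    if nr is None:
        return None  # unknown architecture: do not guess a number
    libc = ctypes.CDLL(None, use_errno=True)
    libc.syscall.restype = ctypes.c_long
    return libc.syscall(ctypes.c_long(nr))
```

Nothing in the call carries a type: the kernel receives a number selecting the table entry, plus whatever integers happen to be in the argument registers.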

                      You are right that ioctl semantics depend on the particular fd, but that's the whole point: to implement ad-hoc functionality that only makes sense for certain kernel objects. When using BTRFS, for example, file and directory fds support all sorts of specific ioctls (clone, reflink, etc.). The precise ioctl mechanism is a POSIX relic that I'm not particularly fond of; it could and should be replaced by some better interface management model. But the idea of having all objects referenced by fds, with each fd offering methods that make sense for that particular object, is IMHO sound and better than introducing more and more syscalls that only make sense in some cases and for some objects.
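
A small sketch of the "each fd offers the methods that make sense for it" idea, using Python's `fcntl.ioctl` on a pipe (Linux assumed; the helper names are hypothetical). FIONREAD is meaningful for a pipe, while the terminal-only TIOCGWINSZ request is rejected with ENOTTY on the same descriptor.

```python
import errno
import fcntl
import struct
import termios


def bytes_waiting(fd):
    """Ask the object behind fd how many bytes are ready to read (FIONREAD)."""
    buf = fcntl.ioctl(fd, termios.FIONREAD, struct.pack("i", 0))
    return struct.unpack("i", buf)[0]


def supports_winsize(fd):
    """Return True if fd answers the terminal-only TIOCGWINSZ request."""
    try:
        fcntl.ioctl(fd, termios.TIOCGWINSZ, struct.pack("HHHH", 0, 0, 0, 0))
        return True
    except OSError as exc:
        if exc.errno == errno.ENOTTY:
            return False  # request not supported by this kind of object
        raise
```

The same syscall entry point is used both times; which requests succeed depends entirely on what the fd refers to.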
