Linux Work Culminating On A "READFILE" Syscall For Reading Small Files Efficiently

  • #21
    Originally posted by jacob
    Also there is an indirection in all cases: a new syscall is only a function from userland's perspective; at the kernel level it's nothing more than a number in a table.
    Right, but ioctls are essentially doing double dispatch. Once for the ioctl syscall itself and then again depending on the arguments. Why do double dispatch for no reason? Just to be contrarian?
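The two positions above can be pictured with a toy model. This is only a sketch: every name in it (`toy_syscall`, `toy_ioctl`, the `TOY_*` constants) is made up for illustration and none of it is real kernel code. A dedicated call costs one table lookup, while an ioctl-style call costs the same lookup plus a second branch on the command.

```c
#include <stddef.h>

/* Toy model only: none of these names are real kernel symbols. */
typedef long (*toy_handler)(unsigned long a, unsigned long b);

/* A "dedicated syscall": reached by a single table lookup. */
static long toy_getpid(unsigned long a, unsigned long b)
{
    (void)a;
    (void)b;
    return 1234;
}

enum { TOY_CMD_RESET = 0, TOY_CMD_STATUS = 1 };

/* An "ioctl": after the table lookup it dispatches again on the command. */
static long toy_ioctl(unsigned long cmd, unsigned long arg)
{
    switch (cmd) {
    case TOY_CMD_RESET:
        return 0;
    case TOY_CMD_STATUS:
        return (long)arg + 1;
    default:
        return -1; /* unknown command */
    }
}

enum { TOY_NR_GETPID = 0, TOY_NR_IOCTL = 1, TOY_NR_MAX };

/* The "number in a table" part: the call number indexes a handler. */
static const toy_handler toy_table[TOY_NR_MAX] = {
    [TOY_NR_GETPID] = toy_getpid,
    [TOY_NR_IOCTL]  = toy_ioctl,
};

static long toy_syscall(unsigned int nr, unsigned long a, unsigned long b)
{
    if (nr >= TOY_NR_MAX || toy_table[nr] == NULL)
        return -1; /* unknown call number */
    return toy_table[nr](a, b); /* first (and for getpid, only) dispatch */
}
```

Here `toy_syscall(TOY_NR_IOCTL, TOY_CMD_STATUS, 41)` goes through both the table and the `switch`, while `toy_syscall(TOY_NR_GETPID, 0, 0)` goes through the table only.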
    Last edited by JustinTurdeau; 05-25-2020, 04:24 PM.

    • #22
      Originally posted by yoshi314
      i wonder what is the size limit on the file. otherwise weird things will happen.
      I was actually wondering something similar: What happens if another process changes the size while the first process (system call) tries to read. Can it be used as an exploit? Something to that effect...

      • #23
        Originally posted by jacob
        Why does it need a new syscall? Couldn't an ioctl call do it?
        An ioctl would require opening the file first, which defeats the purpose of putting open, read and close into a single syscall.

        ioctls are also implemented in drivers rather than in the VFS, so even setting aside that this cannot really be done via an ioctl (except possibly through a device node in /dev), doing it that way anyway would be a mess. You really want an ioctl when controlling a device, not when doing a VFS operation.

        • #24
          Originally posted by d4ddi0
          If "small" is defined as under 4k, and it's handled in a backwards compatible way in libc (maybe make open() a noop, only actually "opening" if the file is large or opened for writing, and close() also potentially a noop), this could be an improvement for 90% of all file accesses.
          That is not happening. Software will need to explicitly take advantage of this.

          • #25
            Originally posted by ryao
            That is not happening. Software will need to explicitly take advantage of this.
            I think the idea of having readfile() in libc is key for enabling software to use it. If it had been there all along, then this would just be a transparent, drop-in optimization.

            However, at least adding it now lets you write cross-platform userspace code with it. That's going to really help adoption.

            • #26
              Originally posted by yoshi314
              i wonder what is the size limit on the file. otherwise weird things will happen.
              Originally posted by _Alex_

              I was actually wondering something similar: What happens if another process changes the size while the first process (system call) tries to read. Can it be used as an exploit? Something to that effect...
              How would it be different from a read(2) syscall?
              • The amount of data read is bounded by the "count" parameter of the syscall, which should be the size of the caller's buffer. Of course, read() may legally return less data than requested. The same applies to readfile().
              • Even a classic read() can be affected by the file changing during the read, so the program has to handle that case anyway.
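The two points above can be sketched as a bounded read loop. `read_up_to` is a hypothetical helper name; the idea is that the caller's buffer size is the only limit that matters, so a file that grows or shrinks mid-read yields fewer or different bytes but never overruns the buffer.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: read at most `cap` bytes from `fd` into `buf`,
 * looping because read() may return less than was asked for.
 * Returns the number of bytes stored, or -1 with errno set.
 */
static ssize_t read_up_to(int fd, char *buf, size_t cap)
{
    size_t total = 0;

    while (total < cap) {
        ssize_t n = read(fd, buf + total, cap - total);
        if (n == 0)
            break;              /* end of file, possibly earlier than expected */
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal, retry */
            return -1;
        }
        total += (size_t)n;     /* short read: keep going until cap or EOF */
    }
    return (ssize_t)total;
}
```

Note that the loop never consults a previously obtained file size, so a concurrent truncate or append cannot make it write past `cap`.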

              • #27
                Originally posted by ryao
                ioctls are also implemented in drivers rather than in the VFS, so even setting aside that this cannot really be done via an ioctl (except possibly through a device node in /dev), doing it that way anyway would be a mess. You really want an ioctl when controlling a device, not when doing a VFS operation.
                There are ioctls for filesystems too: for example, look at the fiemap ioctl, or at the XFS/BTRFS ones. The former is generic (even though the filesystem has to provide the fiemap() inode method); the latter are filesystem-specific.
                Even though nobody likes ioctls, there are a lot of cases where an ioctl is the only way to go...
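As an illustration of the generic fiemap ioctl mentioned above, here is a minimal Linux-only sketch that asks how many extents a file has. `count_extents` is a hypothetical helper name; `FS_IOC_FIEMAP`, `struct fiemap` and `FIEMAP_MAX_OFFSET` come from the kernel's userspace headers. The call fails on filesystems that do not implement fiemap, and, as with any ioctl, the file has to be opened first.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>      /* FS_IOC_FIEMAP */
#include <linux/fiemap.h>  /* struct fiemap, FIEMAP_MAX_OFFSET */

/*
 * Hypothetical helper: return the number of extents backing `path`,
 * or -1 if the file cannot be opened or the filesystem does not
 * support the fiemap ioctl.
 */
static long count_extents(const char *path)
{
    int fd = open(path, O_RDONLY);      /* an ioctl always needs an open fd */
    if (fd < 0)
        return -1;

    struct fiemap fm;
    memset(&fm, 0, sizeof fm);
    fm.fm_start = 0;
    fm.fm_length = FIEMAP_MAX_OFFSET;   /* map the whole file */
    fm.fm_extent_count = 0;             /* 0 = only report the extent count */

    int rc = ioctl(fd, FS_IOC_FIEMAP, &fm);
    close(fd);

    return rc < 0 ? -1 : (long)fm.fm_mapped_extents;
}
```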

                • #28
                  Originally posted by JustinTurdeau

                  Right, but ioctls are essentially doing double dispatch. Once for the ioctl syscall itself and then again depending on the arguments. Why do double dispatch for no reason? Just to be contrarian?
                  There are many (really MANY) more indirect dispatches involved during the processing of a single call. VFS, LSM, and namespace resolution are but a few that come to mind. Adding another one makes exactly zero measurable difference in processing performance, but if it means having a cleaner and more coherent kernel interface, I'm all for it.

                  • #29
                    Pretty excited about this; io_uring and readfile combined are going to significantly speed up anything that reads heavily from things like procfs. I'm currently in the early stages of implementing a stripped-down top clone in Rust for fun, and can't wait to use those technologies.

                    • #30
                      Originally posted by jacob
                      There are many (really MANY) more indirect dispatches involved during the processing of a single call. VFS, LSM, and namespace resolution are but a few that come to mind. Adding another one makes exactly zero measurable difference in processing performance
                      This is just the "computers are fast, so who gives a shit" argument.

                      All those other things actually have a purpose and provide some kind of value, whereas pointlessly doing this the slow and ugly way because "meh, some other things are slow too" is not a very engineer-like mindset. I would expect that kind of thing from schmuck JavaScript "developers", not kernel hackers.
                      Last edited by JustinTurdeau; 05-25-2020, 06:10 PM.
