FUSE Read/Write Passthrough Updated For Much Better File-System Performance

  • FUSE Read/Write Passthrough Updated For Much Better File-System Performance

    Phoronix: FUSE Read/Write Passthrough Updated For Much Better File-System Performance

    Of the various criticisms of FUSE for implementing file-systems in user-space, one of the most common is that performance is generally much lower than with a proper file-system kernel driver. But with the FUSE passthrough functionality that continues to be worked on, there is the potential for much better FUSE file-system performance...

    http://www.phoronix.com/scan.php?pag...Passthrough-V6

  • #2
    I guess that'd be good for mergerfs!


    • #3
      Wait so it only works if your FUSE relies on a native underlying filesystem? So no virtual FUSE equivalent where there isn't a file mapping?


      • #4
        Originally posted by boeroboy
        Wait so it only works if your FUSE relies on a native underlying filesystem? So no virtual FUSE equivalent where there isn't a file mapping?
        As I understand it, it's the FUSE equivalent of the sendfile system call that's heavily relied on by HTTP daemons to accelerate serving static files.

        The FUSE filesystem does whatever remapping and permissions checks it wants, and then asks the kernel to patch the resulting opened file descriptor directly to the corresponding file in the underlying kernel-level filesystem.

        (You could also think of it as being like DMA instead of PIO.)
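
To make the sendfile analogy concrete, here is a minimal Python sketch that copies a file twice: once by bouncing every chunk through a user-space buffer, and once with `os.sendfile`, where the kernel moves the bytes itself. It only illustrates the analogy and is not the FUSE passthrough interface. The helper names `copy_via_userspace` and `copy_via_sendfile` are made up for the example, and it assumes Linux, where `os.sendfile` accepts a regular file as the destination.

```python
import os

CHUNK = 1 << 16  # 64 KiB per call; an arbitrary choice for the example


def copy_via_userspace(src_path, dst_path):
    """Copy by reading into a user-space buffer and writing it back out."""
    total = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            buf = src.read(CHUNK)
            if not buf:
                break
            dst.write(buf)
            total += len(buf)
    return total


def copy_via_sendfile(src_path, dst_path):
    """Copy with os.sendfile: the kernel moves the data between the two files."""
    total = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        size = os.fstat(src.fileno()).st_size
        while total < size:
            # Arguments are (out_fd, in_fd, offset, count).
            sent = os.sendfile(dst.fileno(), src.fileno(), total, size - total)
            if sent == 0:
                break  # source ended early
            total += sent
    return total
```

Both functions produce the same bytes; the difference is whether each chunk has to cross into the copying process and back, which is the overhead passthrough is meant to remove for FUSE reads and writes.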


        • #5
          Originally posted by boeroboy
          Wait so it only works if your FUSE relies on a native underlying filesystem?
          So no virtual FUSE equivalent where there isn't a file mapping?
          What kind of equivalent do you expect? What should FUSE do for reads/writes to virtual files?


          • #6
            Originally posted by pal666
            What kind of equivalent do you expect? What should FUSE do for reads/writes to virtual files?
            Anything. Reading memory, fetching REST API calls, fetching any external data source, custom device reading, etc. In my case I created FUSE clients for some of our REST APIs using libcurl.

            https://github.com/jboero/hashifuse


            • #7
              Originally posted by geearf
              I guess that'd be good for mergerfs!
              Yup. I've been following this and another patch for some time and even considered picking it up myself but never got around to it.

              The one downside to this would be that, as I understand the functionality, it would break anything that relies on being able to interrupt the read/write flow should an error occur; i.e., there is no "on error, call back into the FUSE server" hook. This would impact mergerfs' "moveonenospc" feature as well as a possible upcoming "read & open HA" mode that some have asked for. So passthrough would need to be optional, but it could be made to distinguish between files opened only for reading, only for writing, or for read/write (or any other information available at open/create, which is when the decision is made).
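
Since the decision is made at open/create time, a per-access-mode switch could look roughly like the sketch below. Everything in it is hypothetical: `PassthroughPolicy`, its fields, and `wants_passthrough` are names invented for this example, not mergerfs options or kernel interfaces. The defaults assume a server that wants to keep seeing writes so it can still react to errors such as running out of space.

```python
import os
from dataclasses import dataclass

# Mask covering the three access modes; built from the individual flags.
ACCESS_MASK = os.O_RDONLY | os.O_WRONLY | os.O_RDWR


@dataclass(frozen=True)
class PassthroughPolicy:
    """Hypothetical per-mount settings; not real mergerfs options."""
    read_only_opens: bool = True    # reads never need error interception here
    write_only_opens: bool = False  # keep writes in user space by default
    read_write_opens: bool = False


def wants_passthrough(open_flags, policy):
    """Decide at open() time whether to hand the file over to the kernel."""
    access = open_flags & ACCESS_MASK
    if access == os.O_RDONLY:
        return policy.read_only_opens
    if access == os.O_WRONLY:
        return policy.write_only_opens
    return policy.read_write_opens  # O_RDWR
```

With the default policy, a file opened with `os.O_RDONLY` would be passed through, while one opened with `os.O_WRONLY | os.O_CREAT` would keep going through the server.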


              • #8
                Originally posted by boeroboy
                Wait so it only works if your FUSE relies on a native underlying filesystem? So no virtual FUSE equivalent where there isn't a file mapping?
                What interface would or could be used if it weren't a file? There is already a way to interact if it's not a file mapping: all the existing messages in the FUSE protocol (read, write, etc.). This feature is for the special case where you want to orchestrate things around the file but not the reads and writes themselves, given how sensitive clients are to their latency and throughput.
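
For a file with no backing file, the server has to answer each read message itself. A minimal sketch of what that amounts to, written without any FUSE library: `VirtualFile` and its `produce` callable are hypothetical names for this example, and a real server would receive the size and offset from the kernel's request.

```python
class VirtualFile:
    """A file whose bytes come from a callable instead of a backing file."""

    def __init__(self, produce):
        # Hypothetical source of content, e.g. a function calling a REST API.
        self._produce = produce

    def read(self, size, offset):
        """Serve one read request: up to `size` bytes starting at `offset`."""
        data = self._produce()
        return data[offset:offset + size]
```

Every call to `read` here is a round trip into the user-space server, which is exactly the path passthrough skips when a real file exists underneath.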


                • #9
                  Originally posted by boeroboy
                  Anything. Reading memory, fetching REST API calls, fetching any external data source, custom device reading, etc. In my case I created FUSE clients for some of our REST APIs using libcurl.
                  I have used S3FS for a while for a production web app - it is also a FUSE implementation layered over an HTTP API.
                  In the end, we ditched it and moved the app over to a 'real' cloud NFS service, despite the increased cost. The problems we had with it were all related to a) performance and b) bugginess.
                  The interesting thing is that most bugs came from the complexity of mitigating the perf issues by 'caching'.
                  Need to list the contents of a moderately big directory? The S3 API requires you to make one thousand HTTP calls; latency will kill you unless you cache the results. Same for quickly opening one thousand small files: cache them locally or suffer big time.
                  A naive implementation would cache both 'remote file' contents and their metadata in RAM, but this has a nasty tendency to chew up all available memory until the apps running on the same server suddenly get swapped out or OOM-killed.
                  A smarter implementation would end up replicating most of the complex code from Squid.
                  I have a suspicion that caching remote-file-content to local disk and letting the kernel handle it from that point onward might be a worthy improvement...
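
The "cache to local disk" idea can be sketched as below, under stated assumptions: `DiskCache` is a made-up class, `fetch` is a hypothetical callable standing in for the HTTP GET, and eviction is a simple least-recently-used pass over file modification times. It is not how S3FS works. Once the bytes sit in a local file, the kernel's page cache decides how much of them stays in RAM.

```python
import hashlib
import os


class DiskCache:
    """Keep fetched objects as local files, evicting the least recently used."""

    def __init__(self, directory, max_bytes, fetch):
        self.directory = directory
        self.max_bytes = max_bytes
        self.fetch = fetch  # hypothetical: key -> bytes, e.g. an HTTP GET
        os.makedirs(directory, exist_ok=True)

    def _path(self, key):
        name = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return os.path.join(self.directory, name)

    def open(self, key):
        """Return a read-only file object backed by the local copy."""
        path = self._path(key)
        if os.path.exists(path):
            os.utime(path)  # bump mtime to mark the entry as recently used
        else:
            tmp = path + ".tmp"
            with open(tmp, "wb") as f:
                f.write(self.fetch(key))
            os.replace(tmp, path)  # publish the finished file atomically
            self._evict(keep=path)
        return open(path, "rb")

    def _evict(self, keep):
        """Delete the oldest entries until the cache fits in max_bytes."""
        entries = [e for e in os.scandir(self.directory) if e.is_file()]
        total = sum(e.stat().st_size for e in entries)
        for entry in sorted(entries, key=lambda e: e.stat().st_mtime_ns):
            if total <= self.max_bytes:
                break
            if entry.path == keep:
                continue  # never evict the file we are about to return
            total -= entry.stat().st_size
            os.remove(entry.path)
```

The size cap bounds disk usage rather than memory, so a burst of large downloads cannot push other applications into swap the way an unbounded in-RAM cache can.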



                  • #10
                    Hmm, this is only for merge/bind-mount-like things, right? It would be nice to have this sort of functionality for things like ntfs-3g as well.

                    E.g., if the user opens a file, ntfs-3g can check whether a 1:1 passthrough is possible (i.e. not fragmented, not compressed, not encrypted, not sparse, etc.) and then tell the FUSE kernel part "you can read/write directly to /dev/sda1 at offset 10240000 as long as you don't read/write past 10480000" (this way passthrough can be used at least for the first contiguous part of the file). In the next iteration, the user-space part could give the kernel part the runlist of the fragmented parts of the file. The kernel could delay requesting the runlists until after a few accesses to the opened files to save resources.
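
A rough sketch of the mapping such a runlist would allow: translate a file offset into a device offset and clamp the request at the end of the contiguous run. `Run` and `map_offset` are hypothetical names for illustration, not ntfs-3g structures or an existing kernel interface; the numbers in the usage note are the ones from the example above.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    file_offset: int    # where this run starts inside the file
    device_offset: int  # where the same bytes start on the block device
    length: int         # number of bytes in the contiguous run


def map_offset(runs, offset, size):
    """Map a request to (device_offset, usable_length), or None if unmapped.

    `runs` must be sorted by file_offset and must not overlap.
    """
    starts = [r.file_offset for r in runs]
    i = bisect_right(starts, offset) - 1  # last run starting at or before offset
    if i < 0:
        return None
    run = runs[i]
    into = offset - run.file_offset
    if into >= run.length:
        return None  # hole or past the end: fall back to the user-space server
    return run.device_offset + into, min(size, run.length - into)
```

With a single run `Run(0, 10240000, 240000)`, a 4096-byte read at file offset 0 maps to device offset 10240000, and a read starting 1000 bytes before the end of the run is clamped to 1000 bytes so it never goes past 10480000.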
                    Last edited by mifritscher; 08-13-2020, 07:14 PM.
