FUSE Read/Write Passthrough Updated For Much Better File-System Performance

mifritscher replied

17 August 2020, 06:00 PM
Every FUSE driver that has a block device behind it, and where a 1:1 mapping is possible, could use this: ntfs, exfat, etc. btrfs/zfs would be possible as well, at least read-only. (Writes would need a user-space call to allocate space; after that, the data to be written could stay entirely in kernel space.)
Fragmented files could be handled in V2.0, as could sparse files (excluding writes to sparse areas that aren't allocated yet). But at least accessing the first fragment could be done quite easily in V1.0.
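To make the "V2.0" idea concrete, here is a minimal sketch of how a kernel-side helper might split one file-level request across a runlist, including sparse holes. The `Run` record and `map_request` function are hypothetical: runs are expressed in bytes rather than NTFS clusters, must be sorted and contiguous, and this is not ntfs-3g's or the FUSE patch's actual data structure.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Run:
    file_offset: int           # start of this run within the file, in bytes
    length: int                # run length in bytes
    dev_offset: Optional[int]  # start on the block device, or None for a sparse hole


def map_request(runs: List[Run], offset: int, size: int) -> List[Tuple[int, int, Optional[int]]]:
    """Split a file-level request into (file_offset, length, dev_offset) pieces.

    Pieces whose dev_offset is None fall in unallocated (sparse) runs: a read
    could return zeros there, but a write would have to go back to user space
    so the driver can allocate space first.
    """
    starts = [r.file_offset for r in runs]
    pieces = []
    pos, end = offset, offset + size
    while pos < end:
        i = bisect_right(starts, pos) - 1  # last run starting at or before pos
        if i < 0 or pos >= runs[i].file_offset + runs[i].length:
            raise ValueError(f"offset {pos} is not covered by the runlist")
        run = runs[i]
        n = min(end, run.file_offset + run.length) - pos
        dev = None if run.dev_offset is None else run.dev_offset + (pos - run.file_offset)
        pieces.append((pos, n, dev))
        pos += n
    return pieces
```

A request that straddles a hole comes back as several pieces, so the caller can pass through the mapped ones and hand only the hole (or an out-of-range tail) back to the FUSE daemon.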
starshipeleven replied

17 August 2020, 05:37 PM
Originally posted by mifritscher

Hmm, this is only for merge/bind-mount-like things, right? It would be nice to have this sort of functionality for things like ntfs-3g as well.

E.g., when the user opens a file, ntfs-3g could check whether a 1:1 passthrough is possible (i.e. the file is not fragmented, compressed, encrypted, sparse, etc.) and then tell the FUSE kernel part "you can read/write directly to /dev/sda1 at offset 10240000 as long as you don't read/write past 10480000". This way passthrough could be used at least for the first contiguous part of the file. In the next iteration, the user-space part could give the kernel part the runlist of the file's fragments. To save resources, the kernel could delay requesting the runlists until the opened file has been accessed a few times.

That would be an ntfs-3g-specific thing.

Also, let's be honest, how many files in NTFS are "not fragmented, not compressed, not encrypted, not sparse"?
miquels replied

15 August 2020, 02:13 PM
Originally posted by gggeek

Need to list the contents of a moderately big directory? The S3 API requires you to make a thousand HTTP calls; latency will kill you unless you cache the results.

Just wondering... what if you used HTTP/2 and sent all the HTTP calls in parallel instead of sequentially?
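A rough sketch of why concurrency helps with per-request latency: n sequential requests cost about n × latency, while n requests in flight at once cost about latency × ceil(n / limit). Everything here is an assumption for illustration. `fake_request` is a hypothetical stand-in that only sleeps, no HTTP (let alone HTTP/2) is involved, and the win only applies to independent requests (for example one metadata request per file). Paginated listings that need the previous page's continuation token cannot be fetched in parallel this way.

```python
import asyncio
import time

LATENCY = 0.05  # assumed round-trip time per request, in seconds


async def fake_request(key: str) -> str:
    """Hypothetical stand-in for one HTTP call; it only sleeps to model latency."""
    await asyncio.sleep(LATENCY)
    return key


async def sequential(keys):
    # One request at a time: total time is roughly len(keys) * LATENCY.
    return [await fake_request(k) for k in keys]


async def concurrent(keys, limit: int = 64):
    # Many requests in flight at once, capped by a semaphore
    # (similar in spirit to a server's limit on concurrent HTTP/2 streams).
    sem = asyncio.Semaphore(limit)

    async def one(k):
        async with sem:
            return await fake_request(k)

    # gather() returns results in the same order as the input keys.
    return await asyncio.gather(*(one(k) for k in keys))


def timed(fn, keys):
    start = time.perf_counter()
    result = asyncio.run(fn(keys))
    return result, time.perf_counter() - start
```

With 20 keys, `timed(sequential, keys)` takes about a second while `timed(concurrent, keys)` finishes in roughly one latency period, and both return the keys in the same order.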
boeroboy replied

14 August 2020, 04:32 AM
Originally posted by gggeek

The interesting thing is that most of the bugs came from the complications of mitigating the perf issues with 'caching'.
Need to list the contents of a moderately big directory? The S3 API requires you to make a thousand HTTP calls; latency will kill you unless you cache the results. Same for quickly opening a thousand small files: cache them locally or suffer big time.

Exactly. I ended up caching metadata in some of mine as well. Plus you need big reads and big writes; otherwise you only get 4k blocks at a time, which kills performance even on a good day.
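Metadata caching of the kind described here is often just a time-to-live cache in front of the remote "stat" call. The `TTLCache` class below is a hypothetical sketch, not code from any of the projects mentioned; the injectable clock exists only to make it testable. Note that it is unbounded, so on its own it has exactly the memory problem raised later in the thread.

```python
import time
from typing import Callable, Dict, Generic, Hashable, Tuple, TypeVar

V = TypeVar("V")


class TTLCache(Generic[V]):
    """Cache values (e.g. stat-like metadata) for a fixed time-to-live."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._entries: Dict[Hashable, Tuple[float, V]] = {}

    def get(self, key: Hashable, fetch: Callable[[], V]) -> V:
        now = self.clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]  # fresh enough: skip the remote call
        value = fetch()    # miss or expired: go to the backend
        self._entries[key] = (now, value)
        return value

    def invalidate(self, key: Hashable) -> None:
        # Call after local writes so readers don't see stale sizes or mtimes.
        self._entries.pop(key, None)
```

The TTL is the trade-off knob: longer values save more round trips but let changes made by other clients go unnoticed for longer.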
mifritscher replied

13 August 2020, 07:12 PM
Hmm, this is only for merge/bind-mount-like things, right? It would be nice to have this sort of functionality for things like ntfs-3g as well.

E.g., when the user opens a file, ntfs-3g could check whether a 1:1 passthrough is possible (i.e. the file is not fragmented, compressed, encrypted, sparse, etc.) and then tell the FUSE kernel part "you can read/write directly to /dev/sda1 at offset 10240000 as long as you don't read/write past 10480000". This way passthrough could be used at least for the first contiguous part of the file. In the next iteration, the user-space part could give the kernel part the runlist of the file's fragments. To save resources, the kernel could delay requesting the runlists until the opened file has been accessed a few times.

Last edited by mifritscher; 13 August 2020, 07:14 PM.
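The "V1.0" version of this proposal boils down to a bounds check: translate the file offset into a device offset and refuse passthrough if the request would cross the limit. A minimal sketch, assuming the first extent starts at file offset 0 and that the limit (10480000 in the example above) is exclusive; `passthrough_target` is a hypothetical helper, not part of any FUSE API.

```python
from typing import Optional


def passthrough_target(file_offset: int, size: int,
                       dev_start: int, dev_limit: int) -> Optional[int]:
    """Return the device offset for a request, or None if it must go to user space.

    dev_start: where the file's first contiguous extent begins on the device.
    dev_limit: first device byte the kernel must not touch for this file.
    """
    if file_offset < 0 or size < 0:
        raise ValueError("offset and size must be non-negative")
    dev_offset = dev_start + file_offset
    if dev_offset + size > dev_limit:
        return None  # request runs past the first extent: fall back to the FUSE daemon
    return dev_offset
```

With `dev_start=10240000` and `dev_limit=10480000`, a 4096-byte read at file offset 235904 still maps (it ends exactly at the limit), while one at offset 239000 returns `None` and would be served by ntfs-3g as usual.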
gggeek replied

13 August 2020, 02:42 PM
Originally posted by boeroboy

Anything. Reading memory, fetching REST API calls, fetching any external data source, custom device reading, etc. In my case I created FUSE clients for some of our REST APIs using libCurl.

I used S3FS for a while for a production web app; it is also a FUSE implementation layered over an HTTP API.
In the end, we ditched it and moved the app over to a 'real' cloud NFS service, despite the increased cost. The problems we had with it were all related to a) performance and b) bugginess.
The interesting thing is that most of the bugs came from the complications of mitigating the perf issues with 'caching'.
Need to list the contents of a moderately big directory? The S3 API requires you to make a thousand HTTP calls; latency will kill you unless you cache the results. Same for quickly opening a thousand small files: cache them locally or suffer big time.
A naive implementation would cache both 'remote file' contents and their metadata in RAM, but this has a nasty tendency to chew up all available memory until the apps running on the same server suddenly get swapped out or OOM-killed.
A smarter implementation would end up replicating most of the complex code from Squid.
I suspect that caching remote file contents on local disk and letting the kernel handle them from that point onward might be a worthwhile improvement...
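The "chew up all available memory" failure of a naive in-RAM cache is usually addressed by giving the cache a hard byte budget and evicting least-recently-used entries. The `ByteBudgetLRU` class below is a hypothetical sketch of that idea only; it keeps data in memory and does not implement the local-disk approach suggested above.

```python
from collections import OrderedDict
from typing import Optional


class ByteBudgetLRU:
    """LRU cache for file contents that evicts once a total byte budget is exceeded."""

    def __init__(self, max_bytes: int):
        self.max_bytes = max_bytes
        self.used = 0
        self._data: "OrderedDict[str, bytes]" = OrderedDict()

    def get(self, key: str) -> Optional[bytes]:
        blob = self._data.get(key)
        if blob is not None:
            self._data.move_to_end(key)  # mark as most recently used
        return blob

    def put(self, key: str, blob: bytes) -> None:
        old = self._data.pop(key, None)  # drop any previous version first
        if old is not None:
            self.used -= len(old)
        if len(blob) > self.max_bytes:
            return  # never cache something bigger than the whole budget
        self._data[key] = blob
        self.used += len(blob)
        while self.used > self.max_bytes:
            _, evicted = self._data.popitem(last=False)  # drop least recently used
            self.used -= len(evicted)
```

Bounding by bytes rather than by entry count matters here because file sizes vary wildly; a thousand-entry limit says nothing about how much RAM the cache can take.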
trapexit replied

13 August 2020, 12:39 PM
Originally posted by boeroboy

Wait, so it only works if your FUSE filesystem relies on a native underlying filesystem? So there's no virtual FUSE equivalent where there isn't a file mapping?

What interface would or could be used if it weren't a file? There is already a way to interact when there's no file mapping: all the existing messages in the FUSE protocol (read, write, etc.). This feature is for the special case where you want to orchestrate things around the file but not handle the reads and writes themselves, given how sensitive clients are to their latency and throughput.
trapexit replied

13 August 2020, 12:22 PM
Originally posted by geearf

I guess that'd be good for mergerfs!

Yup. I've been following this and another patch for some time and even considered picking it up myself but never got around to it.

The one downside, as I understand the functionality, is that it would break anything that relies on being able to interrupt the read/write flow should an error occur. I.e., there is no "on error, call back into the FUSE server". This would impact mergerfs' "moveonenospc" feature as well as a possible upcoming "read & open HA" mode that some have asked for. So passthrough would need to be optional, but it could distinguish between files opened only for reading, only for writing, or for read/write (or use any other info available at open/create, which is when the decision is made).
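Since the passthrough decision is made at open/create time, a server could base it on the access mode in the open flags. A minimal sketch of such a policy, assuming two hypothetical switches modelled on the features named above; this is not mergerfs code and not the kernel patch's interface.

```python
import os
from dataclasses import dataclass

# Mask for the access-mode bits of open(2) flags.
ACCMODE = os.O_RDONLY | os.O_WRONLY | os.O_RDWR


@dataclass
class Policy:
    # Hypothetical switches, modelled on the features mentioned above.
    moveonenospc: bool = False  # writes must stay in user space to catch ENOSPC
    read_ha: bool = False       # reads must stay in user space to fail over


def use_passthrough(flags: int, policy: Policy) -> bool:
    """Decide at open/create time whether this file handle may bypass the daemon."""
    mode = flags & ACCMODE
    reads = mode in (os.O_RDONLY, os.O_RDWR)
    writes = mode in (os.O_WRONLY, os.O_RDWR)
    if writes and policy.moveonenospc:
        return False  # need to see write errors to move the file
    if reads and policy.read_ha:
        return False  # need to see read errors to retry elsewhere
    return True
```

With `moveonenospc` enabled, read-only opens can still be passed through while anything opened for writing keeps going through the daemon.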
boeroboy replied

13 August 2020, 11:48 AM
Originally posted by pal666

What kind of equivalent do you expect? What should FUSE do for reads/writes to virtual files?

Anything. Reading memory, fetching REST API calls, fetching any external data source, custom device reading, etc. In my case I created FUSE clients for some of our REST APIs using libCurl.

jboero/hashifuse, experimental FUSE clients for Hashicorp REST APIs: https://github.com/jboero/hashifuse
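For virtual files like these, the FUSE daemon answers each read request itself, typically as a (size, offset) chunk, which is exactly the path passthrough cannot shortcut because there is no backing file. A hypothetical sketch of such a read handler, independent of any FUSE binding and not taken from hashifuse; it fetches the body once and never refreshes it, which a real client would need to handle.

```python
from typing import Callable, Optional


class VirtualFile:
    """A file whose bytes come from a callback (e.g. a REST call) instead of a disk."""

    def __init__(self, fetch: Callable[[], bytes]):
        self.fetch = fetch
        self._body: Optional[bytes] = None  # filled on first read, reused for later chunks

    def read(self, size: int, offset: int) -> bytes:
        # The kernel may ask for the file in several (size, offset) chunks;
        # serve them all from one fetched body instead of one remote call per chunk.
        if self._body is None:
            self._body = self.fetch()
        return self._body[offset:offset + size]
```

A real binding would call something like this from its read callback; reading past the end simply returns an empty result, which signals EOF.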
pal666 replied

13 August 2020, 11:33 AM
Originally posted by boeroboy

Wait, so it only works if your FUSE filesystem relies on a native underlying filesystem? So there's no virtual FUSE equivalent where there isn't a file mapping?

What kind of equivalent do you expect? What should FUSE do for reads/writes to virtual files?