Wasmer 3.0 Released As The Latest "Universal WebAssembly Runtime"

  • #21
    Originally posted by NobodyXu View Post

    Turns out that docker is capable of pulling this off

    They simply take the read-only image, create a subvolume of it that is writable, then mount that into the container.
    ...
    So if we want to merge this in a btrfs/zfs subvolume, I guess we only need to handle the whiteouts and opaque directories specially and anything else can be simply copied.
    You missed the bit about merging multiple RO fs images. Unless again you mean to extract all of them as your way of merging them. So then you'd be keeping protected copies of the extracted FS image contents, using snapshots or subvolumes or just putting them in their own restricted-access folders. And then to avoid using OverlayFS/UnionFS/Aufs, in order to present a single unified RW filesystem to the app in its container, you would copy them all into a single combined writeable tree, using reflinks on xfs or F2FS, or hardlinks plus ACL and LD_PRELOAD trickery on ext4, and native dedupe on FS that support it.
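    A rough sketch of that combined-tree build in Python, with made-up paths; `cp --reflink=auto` shares data blocks on filesystems that support reflinks and falls back to a plain copy elsewhere:

```python
import os
import subprocess

# Made-up example paths: three read-only image trees flattened into one
# writable per-app tree; later images overwrite earlier ones on conflict.
IMAGES = ["/images/core", "/images/gnome-runtime", "/images/app"]
TARGET = "/containers/app1/rootfs"

os.makedirs(TARGET, exist_ok=True)
for image in IMAGES:
    # --reflink=auto shares data blocks where the FS supports it (btrfs/xfs)
    # and quietly degrades to an ordinary copy on ext4 and friends.
    subprocess.run(["cp", "-a", "--reflink=auto", image + "/.", TARGET],
                   check=True)
```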

    I wonder what the core usage would be like on compressed, deduped BTRFS when 6 apps that use the 1GB GNOME overlay get that updated... will it decompress it, recompress it, find the duplicate compressed blocks, and avoid the writes, so the only penalty vs the status quo is decompressing 1GB and compressing 7GB for the copy operations at install time?

    Anyhoo... that's doable. The tradeoff at runtime is between the extra disk and RAM usage for the metadata of the per-app combined trees, vs the performance/complexity cost for a Union filesystem.

    It's a bit of a mess when you go to update the app or one of the dependencies though, as you'll need to extract only the RW tree changes, set them aside, tear down the combined tree, build it back up, then re-apply the previously set aside changes. Or you could keep a database of where each file came from, and manage them individually within the writeable tree.

    There are various optimisations that would be possible with each FS, i.e. with ext4 hardlinks you could update the extracted inodes, and all copies would be updated in place like magic... which is normally a problem but in this case it would be awesome lol. Reflinks would end up becoming COW copies if you tried that, but once you replaced all the child reflinks the old COW would get orphaned and garbage collected; and of course btrfs/zfs have native dedupe.
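    A toy demo of that hardlink behaviour (any filesystem, invented paths; note that a rename-style replace, which package managers usually do, would break the link):

```python
import os

os.makedirs("/tmp/demo", exist_ok=True)
with open("/tmp/demo/extracted", "w") as f:  # the shared "extracted" copy
    f.write("v1")
os.link("/tmp/demo/extracted", "/tmp/demo/app-tree-copy")  # combined tree

# Updating the extracted inode in place (truncate + write, no rename)...
with open("/tmp/demo/extracted", "w") as f:
    f.write("v2")

# ...is immediately visible through every hardlinked name.
print(open("/tmp/demo/app-tree-copy").read())  # prints "v2"
```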

    So... just write a tool to manage the app runtime/chroot/container trees in place. Make sure to take advantage of the benefits of each supported filesystem, and I'm sure it would get adopted. There would be meaningful runtime performance benefits for file metadata-intensive applications so long as RAM wasn't in short supply.

    Or just keep using a Union filesystem to do that heavy lifting. Literally job done.
    Last edited by linuxgeex; 28 December 2022, 09:46 AM.



    • #22
      Originally posted by linuxgeex View Post
      You missed the bit about merging multiple RO fs images. Unless again you mean to extract all of them as your way of merging them. So then you'd be keeping protected copies of the extracted FS image contents, using snapshots or subvolumes or just putting them in their own restricted-access folders. And then to avoid using OverlayFS/UnionFS/Aufs, in order to present a single unified RW filesystem to the app in its container, you would copy them all into a single combined writeable tree, using reflinks on xfs or F2FS, or hardlinks plus ACL and LD_PRELOAD trickery on ext4, and native dedupe on FS that support it.
      I'm not so familiar with snap/flatpak, but for docker/overlayfs it's certainly doable, since in that model images are layered in a tree: every layer has a parent (except for the root) and contains only the modifications relative to its parent.
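      For illustration, a sketch of flattening such layers into one tree, including the whiteout handling mentioned earlier (OCI/aufs convention: a `.wh.<name>` entry deletes `<name>`, and `.wh..wh..opq` marks a directory opaque); the layer file names are invented:

```python
import os
import shutil
import tarfile

LAYERS = ["layer0.tar", "layer1.tar", "layer2.tar"]  # root layer first
TARGET = "rootfs"

os.makedirs(TARGET, exist_ok=True)
for layer in LAYERS:
    with tarfile.open(layer) as tar:
        for member in tar.getmembers():
            parent, name = os.path.split(member.name)
            if name == ".wh..wh..opq":
                # Opaque directory: discard everything from lower layers.
                opq = os.path.join(TARGET, parent)
                shutil.rmtree(opq, ignore_errors=True)
                os.makedirs(opq, exist_ok=True)
            elif name.startswith(".wh."):
                # Whiteout: delete the shadowed entry instead of extracting.
                victim = os.path.join(TARGET, parent, name[len(".wh."):])
                if os.path.isdir(victim) and not os.path.islink(victim):
                    shutil.rmtree(victim)
                elif os.path.lexists(victim):
                    os.remove(victim)
            else:
                tar.extract(member, TARGET)
```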

      Originally posted by linuxgeex View Post
      I wonder what the core usage would be like on compressed, deduped BTRFS when 6 apps that use the 1GB GNOME overlay get that updated... will it decompress it, recompress it, find the duplicate compressed blocks, and avoid the writes, so the only penalty vs the status quo is decompressing 1GB and compressing 7GB for the copy operations at install time?
      Well, btrfs only supports out-of-band (offline) deduplication, so dedup has to be run periodically, e.g. when the system is idle.
      ZFS supports online deduplication, but that consumes a lot of memory.

      Compression can indeed help reduce I/O, as it is done before writing to disk.
      I remember reading somewhere on Phoronix that the maximum size for one compressed extent is 128K, so depending on the compression algorithm, it can store quite some data.

      Using zstd:19 with forced compression can work quite effectively (zstd internally checks whether data is worth compressing, and its heuristic is better than btrfs's own), and since zstd's decompression speed is mostly independent of the compression level, you can set the compression level high for a filesystem that reads far more often than it writes.
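      In mount-option terms that looks something like the following (device and mountpoint are placeholders; note that btrfs documents zstd levels 1-15 for this option):

```python
import subprocess

# Placeholder device and mountpoint. compress-force bypasses btrfs's own
# compressibility heuristic and lets zstd decide internally.
subprocess.run(
    ["mount", "-o", "compress-force=zstd:15", "/dev/sdX1", "/mnt/images"],
    check=True,
)
```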

      Originally posted by linuxgeex View Post
      Anyhoo... that's doable. The tradeoff at runtime is between the extra disk and RAM usage for the metadata of the per-app combined trees, vs the performance/complexity cost for a Union filesystem.
      In fact, union filesystems like overlayfs and unionfs can occupy more space than btrfs/zfs.

      When a file is modified, it is first copied up in full to the writable upper layer and then modified there, whereas in btrfs/zfs only the modified blocks are copied.
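      That block-level sharing is what the FICLONE ioctl (the mechanism behind `cp --reflink`) provides; a minimal sketch, with invented paths, that only works on reflink-capable filesystems like btrfs/xfs:

```python
import fcntl

FICLONE = 0x40049409  # from linux/fs.h: _IOW(0x94, 9, int)

def reflink(src_path: str, dst_path: str) -> None:
    """Clone src into dst: both files share all data blocks until one is
    written to, and then only the touched blocks are copied (CoW), unlike
    overlayfs's whole-file copy-up."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        fcntl.ioctl(dst.fileno(), FICLONE, src.fileno())

reflink("/data/base/huge.img", "/data/app1/huge.img")  # invented paths
```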

      Originally posted by linuxgeex View Post
      It's a bit of a mess when you go to update the app or one of the dependencies though, as you'll need to extract only the RW tree changes, set them aside, tear down the combined tree, build it back up, then re-apply the previously set aside changes. Or you could keep a database of where each file came from, and manage them individually within the writeable tree.
      Using btrfs-send, that's doable without having to keep a database, but it does affect performance, since the changed data is essentially rewritten.
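      Roughly like this, with invented subvolume paths:

```python
import subprocess

BASE = "/containers/app1/base"    # read-only snapshot of the merged image
LIVE = "/containers/app1/rootfs"  # writable subvolume the app ran in

# Snapshot the live tree read-only, then stream only the delta vs. BASE.
subprocess.run(["btrfs", "subvolume", "snapshot", "-r", LIVE, LIVE + "@now"],
               check=True)
with open("/tmp/app1-changes.btrfs", "wb") as out:
    subprocess.run(["btrfs", "send", "-p", BASE, LIVE + "@now"],
                   stdout=out, check=True)
```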

      IMHO containers should be disposable, and any data that needs to persist should be put into a volume that is mounted into the container.

      Originally posted by linuxgeex View Post
        There are various optimisations that would be possible with each FS, i.e. with ext4 hardlinks you could update the extracted inodes, and all copies would be updated in place like magic... which is normally a problem but in this case it would be awesome lol. Reflinks would end up becoming COW copies if you tried that, but once you replaced all the child reflinks the old COW would get orphaned and garbage collected; and of course btrfs/zfs have native dedupe.

      So... just write a tool to manage the app runtime/chroot/container trees in place. Make sure to take advantage of the benefits of each supported filesystem, and I'm sure it would get adopted. There would be meaningful runtime performance benefits for file metadata-intensive applications so long as RAM wasn't in short supply.

      Or just keep using a Union filesystem to do that heavy lifting. Literally job done.
      That's what I am trying to say: Docker already supports btrfs.
      While it defaults to overlayfs (a union fs), it supports btrfs as an alternative storage driver.
      That's why I say this is definitely doable.



      • #23
        Originally posted by NobodyXu View Post

        I'm not so familiar with snap/flatpak, but for docker/overlayfs it's certainly doable, since in that model images are layered in a tree: every layer has a parent (except for the root) and contains only the modifications relative to its parent.
        ...
        IMHO containers should be disposable, and any data that needs to persist should be put into a volume that is mounted into the container.
        ...
        While it defaults to overlayfs (a union fs), it supports btrfs as an alternative storage driver.
        That's why I say this is definitely doable.
        I agree it's doable... I've just been concerned with performance and features at install/update/remove time.

        Your "IMHO" objective... that's the very point I've been trying to make. How would you realise it with BTRFS or zfs and the tools they provide, in real time, without a Union FS as the arbiter of segregation?

        i.e. given two source trees /core and /app, and a RW overlay for the app /state, how would you combine them with BTRFS to achieve a single system image "/" to run the app in a chroot/container, whereby all the state changes end up in the /state folder, so that for example /state can be rsynced to another host for live migration without ever touching a single file from /core and /app? Would that need to rely on btrfs-send? ZFS has a way to sync subvolume deltas, but it's a binary representation of FS blocks and very dependent on the parent(s) at both ends of the link being in sync, so it has limited usability... similar to taking an LVM snapshot and syncing that between hosts.

        PS I don't mean to move the goalposts - I'm just following your Docker context because you seem more versed with that. The ability to move the state easily between hosts also exists in Snapd, which I mentioned before in the context of a developer using that capability for debugging / support.

        And BTW thanks this is turning out to be one of the most interesting off-topic convos I've had on Phoronix lol.
        Last edited by linuxgeex; 29 December 2022, 03:12 AM.



        • #24
          Originally posted by linuxgeex View Post

          I agree it's doable... I've just been concerned with performance and features at install/update/remove time.

          Your "IMHO" objective... that's the very point I've been trying to make. How would you realise it with BTRFS or zfs and the tools they provide, in real time, without a Union FS as the arbiter of segregation?
          I just checked the snapd documentation, and it seems the image is built in a way similar to docker: you have a base image, providing different distros with different pre-installed libraries, and the actual image is built on top of it.

          Then merging them is going to work the same way docker does it, and it only needs to be done once, when importing the image.

          Originally posted by linuxgeex View Post
          i.e. given two source trees /core and /app, and a RW overlay for the app /state, how would you combine them with BTRFS to achieve a single system image "/" to run the app in a chroot/container, whereby all the state changes end up in the /state folder, so that for example /state can be rsynced to another host for live migration without ever touching a single file from /core and /app? Would that need to rely on btrfs-send? ZFS has a way to sync subvolume deltas, but it's a binary representation of FS blocks and very dependent on the parent(s) at both ends of the link being in sync, so it has limited usability... similar to taking an LVM snapshot and syncing that between hosts.

          PS I don't mean to move the goalposts - I'm just following your Docker context because you seem more versed with that. The ability to move the state easily between hosts also exists in Snapd, which I mentioned before in the context of a developer using that capability for debugging / support.
          I assume /core and /app refer to parts of the app image, where /core is the base and /app is the layer created by the application.
          Those, I think, can be merged in the same way as docker.

          For the app data /state, you can just use a bind mount, which is also how docker does it.
          In docker, you typically store such data in a volume, which is likewise bind-mounted into the container.
          The volume can then use a filesystem other than btrfs/zfs, since those might not give the best performance for such apps, given that the workload can be write-heavy.
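          Putting the whole arrangement together, roughly (all paths invented, and the parent directories are assumed to exist):

```python
import subprocess

# 1. Instantiate the merged /core + /app image as a cheap CoW snapshot.
subprocess.run(["btrfs", "subvolume", "snapshot",
                "/images/app-merged", "/run/app1/root"], check=True)

# 2. Bind-mount the persistent state directory into it; everything the
#    app writes under /state lands outside the disposable snapshot and
#    can be rsynced to another host on its own.
subprocess.run(["mount", "--bind", "/var/lib/app1/state",
                "/run/app1/root/state"], check=True)

# 3. Run the app inside the assembled tree.
subprocess.run(["chroot", "/run/app1/root", "/usr/bin/app"], check=True)
```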

          Originally posted by linuxgeex View Post
          And BTW thanks this is turning out to be one of the most interesting off-topic convos I've had on Phoronix lol.
          You are welcome.

