Wasmer 3.0 Released As The Latest "Universal WebAssembly Runtime"


  • #11
    Originally posted by NobodyXu View Post
    Not familiar with snapd/flatpak since I am currently using MacOS.
    If you're familiar with chroot-based containers, snapd/flatpak basically roll all the app dependencies into containers, and then use overlay mounts to share the dependencies between installed apps. So there's a chance that 3 apps depending on GNOME will share the GNOME overlay. Sadly, in my experience every app depends on a different GNOME release, for no clear reason, resulting in 7 copies of GNOME installed to support 3 apps: one for the host system, and 2 for each app. Why 2 per app? Because one app wants gnome-18, one wants gnome-20, and one wants gnome-22, and snap regularly installs refreshed versions of the overlays and keeps the most recent 2 by default. To support 6 desktop apps I have more installed in /snap than I do for the entire well-appointed host OS. Good thing storage is so cheap these days.
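
    The retention behaviour can at least be inspected and tuned; an illustrative sketch using stock snap commands (the grep pattern is just an example):

    Code:
    # list every retained revision, including the old ones still on disk
    snap list --all | grep gnome
    # keep at most 2 revisions per snap (snapd's refresh.retain setting)
    sudo snap set system refresh.retain=2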



    • #12
      Originally posted by linuxgeex View Post

      If you're familiar with chroot-based containers, snapd/flatpak basically roll all the app dependencies into containers, and then use overlay mounts to share the dependencies between installed apps. So there's a chance that 3 apps depending on GNOME will share the GNOME overlay. Sadly, in my experience every app depends on a different GNOME release, for no clear reason, resulting in 7 copies of GNOME installed to support 3 apps: one for the host system, and 2 for each app. Why 2 per app? Because one app wants gnome-18, one wants gnome-20, and one wants gnome-22, and snap regularly installs refreshed versions of the overlays and keeps the most recent 2 by default. To support 6 desktop apps I have more installed in /snap than I do for the entire well-appointed host OS. Good thing storage is so cheap these days.
      I'm familiar with docker and roughly know its underlying tech (namespaces, cgroups, overlay2).

      For this particular case where each app uses a different gnome version, perhaps this can be mitigated by using btrfs.
      Btrfs supports offline deduplication using a tool called bees https://github.com/Zygo/bees
      It also supports transparent compression using lzo/zstd, which is very effective at reducing space usage, especially if you have multiple versions of a piece of software.
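
      For illustration only, a minimal sketch of what that would look like, assuming a btrfs root and that bees is already installed and configured for the filesystem:

      Code:
      # enable transparent zstd compression for data written from now on (level 3 shown)
      sudo mount -o remount,compress=zstd:3 /
      # run the bees dedupe daemon against the filesystem's UUID
      sudo beesd $(findmnt -no UUID /)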



      • #13
        Originally posted by NobodyXu View Post

        I'm familiar with docker and roughly know its underlying tech (namespaces, cgroups, overlay2).

        For this particular case where each app uses a different gnome version, perhaps this can be mitigated by using btrfs.
        Btrfs supports offline deduplication using a tool called bees https://github.com/Zygo/bees
        It also supports transparent compression using lzo/zstd, which is very effective at reducing space usage, especially if you have multiple versions of a piece of software.
        That's a kind thought, but BTRFS can't mitigate duplicate blocks in compressed storage overlays, which obfuscate them.

        Storage isn't an issue for me. Snap is using less than 1%. RAM usage, in particular page cache usage, is the real problem. When I see 14GB of page cache to run 4 desktop apps, something is very very wrong.



        • #14
          Originally posted by linuxgeex View Post
          That's a kind thought, but BTRFS can't mitigate duplicate blocks in compressed storage overlays, which obfuscate them.
          I remember that bees can do this, since it basically compares decompressed blocks.

          Originally posted by linuxgeex View Post
          Storage isn't an issue for me. Snap is using less than 1%. RAM usage, in particular page cache usage, is the real problem. When I see 14GB of page cache to run 4 desktop apps, something is very very wrong.
          I think Btrfs dedup will help here.

          Once the blocks are deduped, the block cache for the fs will also be deduped.

          So instead of loading the same data twice in different blocks, it gets loaded only once.
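
          The on-disk side of that is at least easy to check after a dedupe pass; an illustrative sketch (the path is made up):

          Code:
          # report shared vs. exclusive data for the extracted overlays after deduping
          sudo btrfs filesystem du -s /path/to/extracted-overlays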



          • #15
            Originally posted by NobodyXu View Post

            I remember that bees can do this, since it basically compares decompressed blocks.

            I think Btrfs dedup will help here.

            Once the blocks are deduped, the block cache for the fs will also be deduped.

            So instead of loading the same data twice in different blocks, it gets loaded only once.
            You're sadly misunderstanding the layer at which compression is happening, which is why I clarified that the data is obfuscated from the filesystem. It's a lot more productive if you make a good faith effort to see the other side of a conversation.

            Here's a thought exercise for you. Imagine a folder of 100 images. Make a zip of that folder. Now remove the first image from the folder and make a second zip file. There's 99% overlap between the data within the 2 zips. However, not a single block of the data on disk is the same, so BTRFS would not de-dupe it in any way.
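
            Purely illustrative, a quick way to convince yourself of this (folder and file names are made up):

            Code:
            mkdir photos && cp /some/pictures/*.jpg photos/   # the 100-image folder
            zip -qr a.zip photos
            rm photos/img001.jpg                              # drop the first image
            zip -qr b.zip photos
            # count how many byte positions differ between the two archives
            cmp -l a.zip b.zip | wc -l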

            The Snap overlays are compressed filesystems. They are like the zip files. Their compressed data as represented on disk is not identical, so it cannot be de-duplicated. Not unless BTRFS is going to attempt to decompress and match blocks of files within those compressed filesystems. That's not impossible - WinRAR for example does this if you attempt to compress multiple ISOs, zips, and a variety of other compressed file formats. But BTRFS today doesn't support such snooping. The CPU and disk bandwidth costs would be prohibitive. Maybe BTRFS will do that in the year 2222...
            Last edited by linuxgeex; 19 December 2022, 03:00 PM.



            • #16
              linuxgeex Thanks for the explanation.

              I wonder whether snapd/flatpak have special support for Btrfs, where they could simply store the overlays as Btrfs subvolumes and then use a Btrfs snapshot to create a new writeable overlay when launching an application?

              I know that docker supports this for Btrfs and Zfs, essentially replacing the overlay2 driver with a btrfs/zfs driver to take advantage of the CoW filesystem features.
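
              For reference, switching docker over to its btrfs driver is roughly this (illustrative; assumes /var/lib/docker already lives on a btrfs filesystem):

              Code:
              # point docker at the btrfs storage driver and restart it
              echo '{ "storage-driver": "btrfs" }' | sudo tee /etc/docker/daemon.json
              sudo systemctl restart docker
              docker info | grep -i "storage driver"   # should now report btrfs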



              • #17
                Originally posted by NobodyXu View Post
                linuxgeex Thanks for the explanation.

                I wonder whether snapd/flatpak have special support for Btrfs, where they could simply store the overlays as Btrfs subvolumes and then use a Btrfs snapshot to create a new writeable overlay when launching an application?

                I know that docker supports this for Btrfs and Zfs, essentially replacing the overlay2 driver with a btrfs/zfs driver to take advantage of the CoW filesystem features.
                No need for subvolumes - bind mounts are the compatible, KISS solution.

                I suppose one could manually decompress the images. There still wouldn't be many shared blocks, thanks to block alignment issues. One would need to rebuild the overlay filesystems with a block size matching BTRFS's dedupe block size to avoid the alignment problems. SquashFS can be forced to disable tail packing (normally the compressed blocks are written back to back instead of block aligned) and to use a 16KB block size (BTRFS's dedupe default).
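
                A minimal sketch of that rebuild, assuming squashfs-tools is installed (the snap path is hypothetical):

                Code:
                # unpack the existing snap image
                unsquashfs -d /tmp/appdir /var/lib/snapd/snaps/someapp_123.snap
                # repack with 16K blocks and tail packing (fragments) disabled so blocks stay aligned
                mksquashfs /tmp/appdir /tmp/someapp_aligned.squashfs -b 16K -no-fragments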

                Compression... different Zlibs could break binary stream identity. On the plus side, you could also get both compression (SquashFS) and dedupe (reflink) with xfs:

                Code:
                # option 1: find whole-file duplicates with fdupes and feed them to duperemove
                fdupes -r . | duperemove --fdupes
                # option 2: let duperemove hash and dedupe block-by-block on its own
                duperemove -hdr --hashfile=/tmp/test.hash --dedupe-options=same,block .
                I have a hard time seeing Canonical/Snapd adopt this since EXT4 is popular. Anyone aware of a scheduled arrival of reflink for EXT4?

                Flathub is more community-driven so maybe more luck with the concept there.



                • #18
                  Originally posted by linuxgeex View Post
                  No need for subvolumes - bind mounts are the compatible, KISS solution.
                  Aren't the images read-only, while the actual running container is writeable, so that you can re-create the container multiple times?

                  Originally posted by linuxgeex View Post
                  I suppose one could manually decompress the images. There still wouldn't be many shared blocks, thanks to block alignment issues. One would need to rebuild the overlay filesystems with a block size matching BTRFS's dedupe block size to avoid the alignment problems. SquashFS can be forced to disable tail packing (normally the compressed blocks are written back to back instead of block aligned) and to use a 16KB block size (BTRFS's dedupe default).
                  That's indeed a problem, though I am thinking of supporting btrfs/zfs natively instead of using a loop device.

                  Originally posted by linuxgeex View Post
                  Compression... different Zlibs could break binary stream identity. On the plus side, you could also get both compression (SquashFS) and dedupe (reflink) with xfs:

                  Code:
                  # option 1: find whole-file duplicates with fdupes and feed them to duperemove
                  fdupes -r . | duperemove --fdupes
                  # option 2: let duperemove hash and dedupe block-by-block on its own
                  duperemove -hdr --hashfile=/tmp/test.hash --dedupe-options=same,block .
                  I forgot that xfs also supports reflink, which can be used for dedup; that's also a possible solution.
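
                  A tiny sanity check that the target xfs was created with reflink support (paths are made up); duperemove then uses the same reflink ioctl for its block-level dedupe:

                  Code:
                  # write some data and try a reflink copy; this fails if the fs lacks reflink
                  dd if=/dev/urandom of=/mnt/xfs/a bs=1M count=4
                  cp --reflink=always /mnt/xfs/a /mnt/xfs/b
                  # block-level dedupe across the whole mount
                  duperemove -hdr /mnt/xfs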

                  Originally posted by linuxgeex View Post
                  I have a hard time seeing Canonical/Snapd adopt this since EXT4 is popular. Anyone aware of a scheduled arrival of reflink for EXT4?

                  Flathub is more community-driven so maybe more luck with the concept there.
                  I think they can simply re-use runc/crun, the runtimes for docker/Kubernetes, which already support btrfs/zfs...



                  • #19
                    Originally posted by NobodyXu View Post
                    Aren't the images read-only, while the actual running container is writeable, so that you can re-create the container multiple times?
                    Yes. It's achieved with UnionFS. They mount the app, then the dependency overlays, and finally a RW folder on top of it.
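
                    Roughly the shape of that stack with the in-kernel overlayfs, just as an illustration (all paths are made up):

                    Code:
                    # lower layers: the app and its runtime overlay (read-only); upper: the RW folder
                    mount -t overlay overlay \
                      -o lowerdir=/snap/app/current:/snap/gnome-runtime/current,upperdir=/var/lib/app/rw,workdir=/var/lib/app/work \
                      /run/app/merged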

                    You're suggesting extracting the compressed filesystem images into native BTRFS/ZFS folders, and then taking per-app-context writeable snapshots of those folders so they don't affect each other ... two problems...

                    First, those RW snapshots still need to be merged into a single standard POSIX filesystem hierarchy. Neither BTRFS nor ZFS provides a way to do that, at least not that I'm aware of. So you'd still end up using UnionFS to merge them.

                    Second, when you want to make an archive of only the RW state changes (done repeatedly, e.g. on app/overlay update/removal, so there are versioned restore points), that state would be mixed into those multiple RW snapshots. Ultimately you'd use UnionFS to provide a single clean RW overlay as well, just as is done on every other parent FS, and sadly lose the seemingly helpful RW snapshots.

                    While I recognise that snapshotting the RW snapshots recursively appears to provide a similar capability, there are drawbacks. E.g. one of the benefits of Snapd for developers is that if a user has a problem with an app, the user can save state and send it to the developer. How do you extract only the changed files from the app's merged FS image? Not impossible, granted, but neither is it quick and easy. Snapshots also aren't free, and it's easy to ignore that. There are performance and storage costs, and in this context they don't really compare that favourably to a tarball.
                    Last edited by linuxgeex; 27 December 2022, 10:22 AM.



                    • #20
                      Originally posted by linuxgeex View Post
                      Yes. It's achieved with UnionFS. They mount the app, then the dependency overlays, and finally a RW folder on top of it.

                      You're suggesting extracting the compressed filesystem images into native BTRFS/ZFS folders, and then taking per-app-context writeable snapshots of those folders so they don't affect each other ... two problems...

                      First, those RW snapshots still need to be merged into a single standard POSIX filesystem hierarchy. Neither BTRFS nor ZFS provides a way to do that, at least not that I'm aware of. So you'd still end up using UnionFS to merge them.
                      Turns out that docker is capable of pulling this off.

                      They simply take the read-only image, create a subvolume of it that is writable, then mount that into the container.

                      Regarding unionfs, from what I know it is similar to overlayfs, in that you use one or more read-only base images (lower layers) and one writeable upper layer to form a single merged filesystem that is mounted into the container.

                      If you delete a file from the lower layers, it writes a special marker file to the upper layer called a whiteout, and uses opaque directories for removed directories.
                      For changes to files from the lower layers, the file is copied up from the lower layer to the upper layer and then modified.
                      Anything else acts just like regular fs operations.
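
                      For what it's worth, those markers are easy to spot in an upper layer; a hedged sketch (the upper-layer path is made up):

                      Code:
                      # overlayfs whiteouts are 0:0 character devices stored in the upper layer
                      find /path/to/upper -type c
                      # opaque directories carry the trusted.overlay.opaque xattr set to "y"
                      getfattr -R -n trusted.overlay.opaque /path/to/upper 2>/dev/null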

                      So if we want to merge this into a btrfs/zfs subvolume, I guess we only need to handle the whiteouts and opaque directories specially; anything else can simply be copied.

                      This will be more complicated if CONFIG_OVERLAY_FS_REDIRECT_DIR or CONFIG_OVERLAY_FS_METACOPY is enabled, since it will then only partially copy up the metadata, but that can still be dealt with.
                      I don't know how widely used they are though; I suppose docker might not use them, for portability.

                      Originally posted by linuxgeex View Post
                      Second, when you want to make an archive of only the RW state changes (done repeatedly, e.g. on app/overlay update/removal, so there are versioned restore points), that state would be mixed into those multiple RW snapshots. Ultimately you'd use UnionFS to provide a single clean RW overlay as well, just as is done on every other parent FS, and sadly lose the seemingly helpful RW snapshots.

                      While I recognise that snapshotting the RW snapshots recursively appears to provide a similar capability, there are drawbacks. E.g. one of the benefits of Snapd for developers is that if a user has a problem with an app, the user can save state and send it to the developer. How do you extract only the changed files from the app's merged FS image? Not impossible, granted, but neither is it quick and easy. Snapshots also aren't free, and it's easy to ignore that. There are performance and storage costs, and in this context they don't really compare that favourably to a tarball.


                      That's indeed harder; you'd need to use `sudo btrfs send -p /path/to/base_image /path/to/container/image`, which requires root.
                      I wouldn't say it's that hard if snapd had built-in support for this like docker does.

                      With btrfs send stream v2 https://www.phoronix.com/news/Btrfs-...-v2-Linux-5.20 , it can include compressed data in the send stream to reduce stream size and speed up processing time on send and receive.

                      For containers that only modify part of a file instead of the whole file, this can be very beneficial, since the btrfs send stream would not include the whole file but only the parts that changed, unlike overlayfs/unionfs. Though since the compression is applied per block instead of to the whole file, the stream might not be as small as a tar stream.
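
                      A rough sketch of that flow (subvolume paths are made up; send needs read-only snapshots on both sides of -p):

                      Code:
                      # snapshot the container's current state read-only, then send only the delta vs. the base image
                      btrfs subvolume snapshot -r /containers/app /containers/app@state
                      btrfs send -p /images/app-base@ro /containers/app@state | btrfs receive /backup/
                      # with send stream v2 (Linux 5.20+ / btrfs-progs 6.x) compressed extents can be sent as-is
                      btrfs send --compressed-data -p /images/app-base@ro /containers/app@state > app.delta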

