Btrfs Changes For Linux 4.21 Prepped With Swapfile Support, Logging Improvements

  • #11
    Originally posted by man-walking
    Oiaohm, when zswap writes pages out to disk they are decompressed first (read the docs). Anyway, what I wanted to point out is that swapfile compression, active or not, should be managed at the content level by the Linux memory management itself, regardless of the filesystem hosting the swapfile (whether that is EXT4, Btrfs or whatever), since the system knows which bounds/chunk size to choose to balance compression against seek/read/write performance.
    RAM content should usually be highly compressible, since most data sits in memory in a flat format; in the worst case LZ4 should still do the job very well, almost on par with writing raw data to disk.
    I had forgotten this prototype feature from the compcache/zram/zswap experiments; it was dropped, and for a very good reason. When you are tight on memory and a page has been sent to swap, you get quite a delay pulling it back in because of decompression, and decompression itself needs more memory. Add the decompression delay on top and you now have a bigger set of races.

    Zswap works out OK because the data has not gone to disc and the decompression time is less than a disc read.
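
    A rough way to see that trade-off, as a sketch (Python, using zlib as a stand-in for zswap's LZO/LZ4 compressors; the 10 ms disc figure is an assumption, not a measurement):

        import time, zlib

        PAGE = 4096
        # A moderately compressible "page": text-like data rather than pure zeros.
        page = (b"some heap data " * 300)[:PAGE]

        t0 = time.perf_counter()
        blob = zlib.compress(page, 1)      # fast setting, standing in for LZ4/LZO
        t1 = time.perf_counter()
        back = zlib.decompress(blob)
        t2 = time.perf_counter()

        assert back == page
        print("compressed", PAGE, "->", len(blob), "bytes")
        print("compress  : %8.1f us" % ((t1 - t0) * 1e6))
        print("decompress: %8.1f us" % ((t2 - t1) * 1e6))
        print("assumed HDD random read: ~10000.0 us")  # the read zswap avoids while the page stays in RAM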

    https://www.redhat.com/de/blog/look-...pression-layer
    For a long time, we have used userland tools like gzip and rar for compression. Now with Virtual Data Optimizer (VDO), all required pieces for a transparent compression/deduplication layer are available in the just-released Red Hat Enterprise Linux 7.5. With this technology, it is possible to trade CPU/RAM resources for disk space. VDO becoming available is one of the results of Red Hat acquiring Permabit Technology Corporation in 2017. The code is available in the source RPMs, and upstream projects are getting established.


    You can see the overhead of compressing at the block layer here. Compressed swap would have to be implemented at the zswap level or somewhere equivalent.

    A large zswap pool in RAM also removes most of your swap IO, with the result that most of the blocks that do get sent to the on-disc swap are poorly compressible even if you attempt to compress them.

    You are right that RAM content is usually highly compressible, but when zswap is consuming all of that and keeping 90+ percent of it in memory, the disc-based swapfile ends up with something like 90 percent blocks that barely compress, and compressing the swapfile becomes highly problematic.
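
    The selection effect can be shown with a toy simulation (Python sketch; the 80/20 page mix and the 50 percent acceptance threshold are assumptions, not zswap's real policy):

        import os, random, zlib

        PAGE = 4096
        random.seed(0)

        def make_page():
            # Assumed workload: 80% of pages are text-like and compress well,
            # 20% are effectively incompressible (already packed / random data).
            if random.random() < 0.8:
                return (b"heap object %d " % random.randint(0, 9)) * (PAGE // 14)
            return os.urandom(PAGE)

        def ratio(p):
            return len(zlib.compress(p, 1)) / len(p)

        zswap_kept, hit_disk = [], []
        for _ in range(1000):
            p = make_page()
            # Toy policy: zswap keeps a page in RAM if it compresses below half a page.
            (zswap_kept if ratio(p) < 0.5 else hit_disk).append(p)

        print("pages kept compressed in zswap:", len(zswap_kept))
        print("pages written to the swapfile :", len(hit_disk))
        if hit_disk:
            avg = sum(ratio(p) for p in hit_disk) / len(hit_disk)
            print("avg compressed size of what reaches disk: %.0f%% of a page" % (avg * 100))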

    The reason the feature was dropped at the early prototype stage has not changed. You need X blocks of memory for application X to go forwards. When you read a compressed block you need somewhere to store the compressed copy plus somewhere to decompress it to, so two blocks of storage are required instead of one.
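
    A minimal illustration of that two-block cost (Python sketch, sizes are only illustrative):

        import zlib

        PAGE = 4096
        page = (b"swapped out state " * 256)[:PAGE]

        # What a compressed swapfile would hold for this page.
        blob = zlib.compress(page, 1)

        # Faulting the page back in: the compressed copy has to be read somewhere
        # first, and only then can the full page be rebuilt next to it, so for a
        # moment both buffers are live at the same time.
        recovered = zlib.decompress(blob)

        print("compressed copy        :", len(blob), "bytes")
        print("rebuilt page           :", len(recovered), "bytes")
        print("peak need during fault :", len(blob) + len(recovered), "bytes  (vs", PAGE, "for plain swap)")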

    So yes, decompressing zswap RAM before sending it to the swap device does in fact make sense in most cases.

    Yes, if you want to try compressed swap at the filesystem/block level you can attempt a swap partition on VDO; it's not particularly nice.
    https://github.com/dm-vdo is where the VDO source lives. It's a feature that is not merged into mainline Linux yet.

    Compressed storage is slower storage.

    zswap's compressed memory blocks in RAM are slower than uncompressed RAM, but faster than sending those blocks to disc and getting them back.

    Now, compressing swap on disc means extra RAM requirements because of the compression; VDO suffers from this, and so would Btrfs or ZFS compressing a swapfile. There is also extra overhead from the compression itself. Swap thrashing from disc is bad enough without making it slower and more RAM-hungry with compression.



    Yes, those are the system requirement guidelines for compressed VDO. So now you are burning more RAM just so you can compress the volume.



    • #12
      Originally posted by oiaohm
      When you are tight on memory and a page has been sent to swap, you get quite a delay pulling it back in because of decompression, and decompression itself needs more memory. Add the decompression delay on top and you now have a bigger set of races.
      Not with a fast (de)compressor like LZ4 (or ZSTD in tuned cases), especially with rotational disks. Those disks are very slow at reading and writing data, so saving a multiple of the data volume is a win that outweighs the (de)compression time penalty. Furthermore, LZ4 decompression is lightning fast, and consider that ZSTD decompresses at LZO-like speeds regardless of the compression level of the data stream (!).
      https://github.com/facebook/zstd (Zstandard - fast real-time compression algorithm)
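
      Back-of-envelope for that rotational-disk argument (Python sketch; every throughput and ratio figure here is an assumption, plug in your own):

          # Time to move one 4K page, with and without compression (all figures assumed).
          PAGE_KB        = 4
          HDD_SEQ_MBS    = 120.0    # assumed sequential throughput of a rotational disk
          LZ4_COMP_MBS   = 500.0    # assumed LZ4-class compression speed
          LZ4_DECOMP_MBS = 2500.0   # assumed LZ4-class decompression speed
          RATIO          = 0.5      # assumed 2:1 compressibility

          def ms(kb, mbs):
              return kb / 1024.0 / mbs * 1000.0

          raw_write  = ms(PAGE_KB, HDD_SEQ_MBS)
          comp_write = ms(PAGE_KB, LZ4_COMP_MBS) + ms(PAGE_KB * RATIO, HDD_SEQ_MBS)
          raw_read   = ms(PAGE_KB, HDD_SEQ_MBS)
          comp_read  = ms(PAGE_KB * RATIO, HDD_SEQ_MBS) + ms(PAGE_KB, LZ4_DECOMP_MBS)

          print("write  raw %.4f ms   compressed %.4f ms" % (raw_write, comp_write))
          print("read   raw %.4f ms   compressed %.4f ms" % (raw_read, comp_read))
          # Note this only counts transfer time; random 4K swap traffic on a rotational
          # disk is dominated by seek latency, which this model leaves out.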

      Also, (de)compression needs a RAM buffer roughly proportional to the dictionary size multiplied by the number of parallel threads you intend to keep running. Of course, for a realtime RAM/swap task you would not use a dictionary size suited to the usual xz/7zip archive scenario; for example, filesystems usually use blocks of around 64K (squashfs, Btrfs...).
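
      In numbers, the working buffer is roughly window size times streams (Python sketch with assumed figures):

          # Rough upper bound for the (de)compression working memory, per the argument above.
          WINDOW_KB = 64   # assumed: filesystem-style 64K blocks, not xz/7zip-sized dictionaries
          THREADS   = 4    # assumed number of parallel (de)compression streams

          # One input window plus one output window per stream is a reasonable ceiling.
          buffer_kb = WINDOW_KB * 2 * THREADS
          print("reserved buffer: ~%d KiB (%.2f MiB)" % (buffer_kb, buffer_kb / 1024.0))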

      Originally posted by oiaohm
      https://www.redhat.com/de/blog/look-...pression-layer
      You can see the overhead of compressing at the block layer here. Compressed swap would have to be implemented at the zswap level or somewhere equivalent.
      VDO also does deduplication, which is quite intensive additional processing and adds some non-linear complexity.

      Originally posted by oiaohm
      A large zswap pool in RAM also removes most of your swap IO, with the result that most of the blocks that do get sent to the on-disc swap are poorly compressible even if you attempt to compress them.
      You are right that RAM content is usually highly compressible, but when zswap is consuming all of that and keeping 90+ percent of it in memory, the disc-based swapfile ends up with something like 90 percent blocks that barely compress, and compressing the swapfile becomes highly problematic.
      My point is not strictly tied to the zswap implementation (compress up to 50% of RAM or whatever before swapping to disk... no), but rather to not *wasting* the CPU/RAM cycles when some pages are already compressed: write them to disk more or less as they are.
      Unpacking them before writing them out is absurd, come on.

      Originally posted by oiaohm
      The reason the feature was dropped at the early prototype stage has not changed. You need X blocks of memory for application X to go forwards. When you read a compressed block you need somewhere to store the compressed copy plus somewhere to decompress it to, so two blocks of storage are required instead of one.
      Of course you would do this n blocks at a time, not for the entire extent requested, so if implemented well you always need only a small, fixed fraction of (reserved) RAM for buffering.
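
      Roughly the bounded-buffer idea, sketched with zlib's streaming decompressor in Python (the max_length cap plays the role of the reserved buffer; this is not how a kernel implementation would look):

          import zlib

          CHUNK = 16 * 1024                  # fixed, reserved output budget per step
          data  = b"page contents " * 50000  # ~700 KB of compressible "swapped" data
          blob  = zlib.compress(data, 1)

          d = zlib.decompressobj()
          buf = blob
          recovered = 0
          while buf:
              piece = d.decompress(buf, CHUNK)   # never emit more than CHUNK bytes per call
              recovered += len(piece)
              buf = d.unconsumed_tail            # input that did not fit this round
          recovered += len(d.flush())            # any small remainder

          print("compressed input :", len(blob), "bytes")
          print("recovered output :", recovered, "bytes, using a", CHUNK, "byte working buffer")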



      • #13
        Originally posted by man-walking
        My point is not strictly tied to the zswap implementation (compress up to 50% of RAM or whatever before swapping to disk... no), but rather to not *wasting* the CPU/RAM cycles when some pages are already compressed: write them to disk more or less as they are.
        Unpacking them before writing them out is absurd, come on.
        Let's look at the Linux kernel swap logic, and then you will see it is not absurd. You are out of memory and the system needs to swap. How does the system choose what to clear from memory to make space? That's right: items that are not in use and are already in swap are high on the list, because the system can drop them without requiring any IO at that point.
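
        A toy model of that reclaim preference (Python sketch; a caricature of the LRU/swap-cache behaviour, not the real kernel policy):

            # Each page: is it dirty, and does an up-to-date copy already exist in swap?
            pages = [
                {"name": "A", "dirty": False, "in_swap": True},   # clean, swap copy is current
                {"name": "B", "dirty": False, "in_swap": True},
                {"name": "C", "dirty": True,  "in_swap": False},  # must be written out first
                {"name": "D", "dirty": True,  "in_swap": True},   # swap copy is stale
            ]

            def reclaim_cost(p):
                # Clean pages whose swap copy is current can simply be dropped: zero I/O.
                # Everything else needs a write to the swap device before the RAM is free.
                return 0 if (not p["dirty"] and p["in_swap"]) else 1

            for p in sorted(pages, key=reclaim_cost):
                verdict = "free (drop, no I/O)" if reclaim_cost(p) == 0 else "needs writeback I/O"
                print(p["name"], "->", verdict)
            # The cheap-to-drop pages are exactly the ones that fault straight back in
            # from swap if they are touched again: that is the thrash pattern.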

        This results in a thrash problem when you get truly low on memory, where items are being kicked out of memory and then pulled back in repeatedly. So there is a lot of reading going on right when you have bugger all RAM to allocate.

        Also, the IO saving from compression is lost because decompression costs increase memory usage, causing more pages to be swapped in and out. zswap is not free of this overhead either: with zswap enabled, more pages are sent into zswap because it skips disc IO completely, and while zswap is the one storing them it can come out slightly ahead even in the worst case, where a page gets compressed, decompressed again and sent to swap storage uncompressed. Please note: only slightly.

        If you send pages compressed to swap on HDD or SSD storage, then every time a page is kicked out and has to be recovered you pay for decompression plus the IO to the SSD/HDD, and you end up behind. That is just how the numbers work out.
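
        Rough per-fault arithmetic behind that (Python sketch; the latency figures are assumptions for an HDD, and swap faults are mostly random 4K reads):

            SEEK_MS        = 8.0     # assumed average seek + rotational latency
            XFER_MBS       = 120.0   # assumed media transfer rate
            DECOMP_MBS     = 2500.0  # assumed LZ4-class decompression speed
            PAGE_KB, RATIO = 4, 0.5  # page size and assumed 2:1 compression

            def xfer_ms(kb, mbs):
                return kb / 1024.0 / mbs * 1000.0

            plain_fault = SEEK_MS + xfer_ms(PAGE_KB, XFER_MBS)
            comp_fault  = SEEK_MS + xfer_ms(PAGE_KB * RATIO, XFER_MBS) + xfer_ms(PAGE_KB, DECOMP_MBS)

            print("plain swap-in     : %.3f ms per fault" % plain_fault)
            print("compressed swap-in: %.3f ms per fault" % comp_fault)
            # The seek dominates, so the transfer you save is in the noise, while every
            # fault now also pays for a decompress and a second buffer while it runs.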

        Originally posted by man-walking
        Also, (de)compression needs a RAM buffer roughly proportional to the dictionary size multiplied by the number of parallel threads you intend to keep running. Of course, for a realtime RAM/swap task you would not use a dictionary size suited to the usual xz/7zip archive scenario; for example, filesystems usually use blocks of around 64K (squashfs, Btrfs...).
        What is the swap block size? It is the page size, 4 KB, which also happens to be the modern HDD sector size. This is also why swap on SSD is not exactly good for the health of the SSD: there is the SSD erase-block problem. http://codecapsule.com/2014/02/12/co...-benchmarking/
        Yes, the blocks that can be cleared in one hit on an SSD are between 256 KB and 4 MB, so 64K is really not an ideal size for an SSD either.
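
        The mismatch in numbers (Python sketch; the erase-block sizes are just the 256 KB to 4 MB range mentioned above):

            PAGE_KB   = 4
            ERASE_KBS = [256, 1024, 4096]   # erase-block range quoted above (256 KB to 4 MB)

            for erase_kb in ERASE_KBS:
                pages_per_block = erase_kb // PAGE_KB
                print("%5d KB erase block = %4d swap pages; worst-case rewrite amplification ~%dx"
                      % (erase_kb, pages_per_block, pages_per_block))
            # Real FTLs remap writes to soften this, but sustained random 4K swap traffic
            # still forces a lot of garbage collection behind the scenes.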

        Swap is a lot more savage than your normal filesystem. This is why I used to love physical RAM drives for extending memory with swap back when a motherboard's RAM slots were maxed out: they have a block pattern that matches what swapfiles need.

        SSDs are designed for storage, not for swap.

