Does anyone know anything about the state of lz4 transparent (de)compression in btrfs? I think that with its higher (de)compression speed, lz4 should perform even better than lzo/zlib for transparent (de)compression in btrfs.
I have more Btrfs compression and general filesystem questions for the experts out there.
First, for non-compressed files: when an existing file is modified (something in the middle is changed, or data is prepended or appended, as with log files), is the entire file rewritten to disk, or are the changes detected and only those parts written? It seems at least appending to a file won't cause the existing data to be rewritten.
For compressed files, how does this work? When reading a file it is decompressed and nothing on disk changes. But when a file is modified, will the entire file be recompressed, or only the modified parts? For example, when appending to a log file, will the entire file be recompressed, or does Btrfs use independent compressed blocks and just append the new blocks to the log file? If so, will Btrfs try to recompress the entire file once it has been changed "enough"?
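Purely to illustrate the independent-compressed-blocks idea being asked about (this is not btrfs code; btrfs actually compresses data in extents of up to 128 KiB, and the 4 KiB block size below is made up), a minimal Python sketch of what appending would look like if old blocks are never touched:

```python
import zlib

BLOCK = 4096  # assumed block size for this sketch

def compress_blocks(data: bytes) -> list[bytes]:
    """Compress each fixed-size chunk of data independently."""
    return [zlib.compress(data[i:i + BLOCK])
            for i in range(0, len(data), BLOCK)]

def append_blocks(blocks: list[bytes], new_data: bytes) -> list[bytes]:
    """Append new data as fresh compressed blocks; existing blocks
    are left untouched. (A real filesystem would also have to
    rewrite a partially filled last block.)"""
    return blocks + compress_blocks(new_data)

log = compress_blocks(b"line one\n" * 500)
old = list(log)
log = append_blocks(log, b"line two\n" * 500)
assert log[:len(old)] == old  # nothing before the append was recompressed
```

If the filesystem works this way, an append costs only the compression of the new data, regardless of how large the file already is.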
skinny metadata and larger leaf/node sizes
It would be interesting to see the differences between the defaults vs. a 16k leaf size vs. skinny metadata. Though I think 16k has already been made the default.
It depends on how the program is written. Most programs cause full overwrites. I'd be very surprised if btrfs (or any other fs, for that matter) expended the CPU cycles to detect changes - think 30GB video files and the lag it would cause.
Originally Posted by guido12
The data may end up in a new block on an SSD, but the logical address of that data would be in the same place the old data was. So your filesystem would have no way of knowing it was a copy and no way of getting back the old data.
Originally Posted by liam
Thanks, its being software-dependent makes sense. Do you know how Btrfs compression works? The linked wiki says compression is skipped on a file if the first compressed portion doesn't yield a smaller size, but I couldn't find any info on how existing files are processed. So, if a program specifically changes only part of a file (where, on a non-compressed filesystem, only that part would be modified), will Btrfs recompress the entire file and rewrite it? Then again, if most programs rewrite entire files, my worry is probably moot and just a what-if scenario.
Originally Posted by curaga
I ask because, some years back, I read a Microsoft TechNet or blog article about NTFS compression. It works on independent compression blocks. If a program modifies only a part of a file, and that part is within a single compression block, then only that block is modified and recompressed. If it spans multiple blocks, then the entire file, or a chunk larger than the modified amount, is recompressed and rewritten to disk. For the (few) programs that modify only ranges of bytes, there could actually be more write accesses, or possibly more data written to disk, compared to having NTFS compression disabled.
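The single-block case of the NTFS-style scheme described above can be sketched in a few lines of Python (a toy model, not NTFS or btrfs code; the block size and function names are made up):

```python
import zlib

BLOCK = 4096  # assumed compression-block size for this sketch

def modify(blocks, offset, patch):
    """Patch bytes at `offset`, recompressing only the
    compression blocks the patch actually touches."""
    first = offset // BLOCK
    last = (offset + len(patch) - 1) // BLOCK
    out = list(blocks)
    for i in range(first, last + 1):
        raw = bytearray(zlib.decompress(blocks[i]))
        start = max(offset, i * BLOCK)
        end = min(offset + len(patch), (i + 1) * BLOCK)
        raw[start - i * BLOCK:end - i * BLOCK] = patch[start - offset:end - offset]
        out[i] = zlib.compress(bytes(raw))
    return out

data = b"x" * (3 * BLOCK)
blocks = [zlib.compress(data[i:i + BLOCK]) for i in range(0, len(data), BLOCK)]
patched = modify(blocks, 5000, b"HELLO")  # falls entirely inside block 1
assert patched[0] == blocks[0] and patched[2] == blocks[2]  # untouched
```

The cost of a small in-place write is then bounded by the number of blocks touched, not by the file size, which is exactly the trade-off the article described.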
I don't know how btrfs handles this, but the statement above is generally wrong!
Originally Posted by curaga
It is true that if you open a file in a common editor, modify it, and then save it, the whole file is usually rewritten. That is because most editors read the whole file into memory and then write it back.
This is usually not the case, though, for file-modifying apps that are NOT editors (or for editors built to handle big files, like video editors), such as log file writers. These seek within the file, so modifications only affect the block being modified and maybe the following block(s).
If btrfs compresses blockwise (which would make the most sense, imho), then it would only need to recompress the affected blocks.
Regarding performance-relevant scenarios, I would expect that only the modified parts get recompressed, because editing a file in an editor is not something that happens 100 times per second and thus not a performance-relevant case.
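The seek-and-patch access pattern described here is easy to demonstrate (a plain Python sketch using POSIX positioned I/O; it shows the application-side pattern only, not what btrfs does on disk):

```python
import os
import tempfile

# Overwrite 5 bytes in the middle of an 8 KiB file, the way a
# seeking writer (database, log rotator) does - no full rewrite.
fd, path = tempfile.mkstemp()
os.write(fd, b"a" * 8192)
os.pwrite(fd, b"PATCH", 4000)  # positioned write at byte offset 4000
data = os.pread(fd, 8192, 0)
os.close(fd)
os.remove(path)

assert data[4000:4005] == b"PATCH"
assert data[:4000] == b"a" * 4000          # earlier bytes untouched
assert data[4005:] == b"a" * (8192 - 4005)  # later bytes untouched
```

From the filesystem's point of view, such a write dirties only the pages covering offset 4000-4004, so a blockwise-compressing filesystem would only need to recompress those.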
Thanks for giving me the laugh! Good one!
Originally Posted by endman
You're right, DEFAULT_MKFS_LEAF_SIZE is finally 16k.
Originally Posted by renkin
And I agree: nodesize and skinny-metadata are the tunables I actually care about; everything else has sensible defaults or is a no-brainer (compression & noatime).