Originally posted by chilinux
I am interested in the trade-off between gzip and lz4 when the kernel is transferred over TFTP. They seem to be assuming at least 5400 RPM HD transfer speeds, which should be around 100 MB/s (800 Mbps)?
Also, if I remember correctly, TFTP wasn't designed with speed in mind, only simplicity. That wasn't a big problem at low network speeds, where round-trip time isn't a big deal - most of the time is spent sending data packets rather than waiting for ACKs. But at gigabit speeds and above things look different: sending a packet is quite fast, so waiting for the ACK can eat a very sizeable share of the time. The ACK has to cross two network stacks plus the data path, and that can be a lot compared to what the data packet spends "on the wire" at gigabit speed. If I remember correctly, TFTP won't transmit the next data block until it gets the ACK for the previous one.
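To get a feel for how much that lock-step costs, here's a minimal back-of-the-envelope sketch in Python. The block size is the classic 512 bytes from the TFTP spec (the blksize option can raise it); the link speed and round-trip time are just assumed ballpark numbers, not measurements:

```python
# Rough model of TFTP's lock-step behaviour: one data block goes out, then the
# sender waits for the ACK before sending the next block.
# The RTT below is an assumed ballpark figure, not a measurement.
BLOCK_BYTES = 512    # classic TFTP block size; the blksize option can raise it
LINK_BPS = 1e9       # gigabit link
RTT_S = 200e-6       # assumed round trip through both network stacks

serialize_s = BLOCK_BYTES * 8 / LINK_BPS   # time the block spends "on the wire"
per_block_s = serialize_s + RTT_S          # plus waiting for the ACK
throughput = BLOCK_BYTES / per_block_s     # effective bytes per second

print(f"wire time per block: {serialize_s * 1e6:.1f} us")
print(f"effective throughput: {throughput / 1e6:.2f} MB/s "
      f"on a {LINK_BPS / 8 / 1e6:.0f} MB/s link")
```

With these numbers the ~4 us of actual wire time is dwarfed by the ~200 us spent waiting for the ACK, so the gigabit link only delivers a few MB/s - which is exactly why negotiating a larger block size helps so much.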
Most network cards provide a sad implementation of TFTP for PXE booting. Even if both the client and server are connected via gigabit NICs, PXE/TFTP does not get nearly the same transfer rate as a slow hard drive.
At which point gzip can eventually win thanks to the smaller amount of data to transfer, especially if the target has a powerful CPU. However, it depends on the configuration details. Zstd could be an interesting option, since it compresses even better than gzip (thanks to a larger dictionary, etc.) and on most x86 hardware it decompresses faster than gzip (in the worst case, e.g. on a simple non-OoO ARM core, it's roughly at gzip speed, and even there it tends to be somewhat faster). Recent kernels have zstd self-decompression support if I remember correctly, so it looks like a trade-off worth considering, especially if you build your own kernel.
LZ4 is the obvious winner on SSDs and the like. They're so fast that read time can be small compared to decompression time, especially if the CPU is relatively slow and the SSD is fast. However, the exact outcome depends on how those speeds compare in a particular system - it isn't firmly set in stone, it depends on the configuration details. I think I've even seen a compression benchmark that at least tries to calculate the "winning" algorithm by taking both transfer speed and decompression speed into account; maybe it was lzturbo's turbobench (also on GitHub). But benchmarks can only give a coarse synthetic approximation, since they don't really take other things into account (filesystem, fragmentation, resulting overhead, ...).
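For a rough sense of where the crossover sits, here's a small Python sketch of that "transfer + decompress" total. All the ratios and speeds are made-up ballpark assumptions purely to show the shape of the trade-off, not measurements of any real system:

```python
# Toy model of total time = transfer of the compressed image + decompression.
# Every number here is an assumption chosen only to illustrate the trade-off.
IMAGE_MB = 64.0   # uncompressed kernel image size (assumed)

# (compression ratio, decompression speed in MB/s of *output*) - assumptions
ALGOS = {
    "gzip": (2.8, 300.0),
    "zstd": (3.0, 800.0),
    "lz4":  (2.1, 2000.0),
}

def total_time(ratio, decomp_mbps, link_mbps):
    transfer = (IMAGE_MB / ratio) / link_mbps   # smaller image -> less to move
    decomp = IMAGE_MB / decomp_mbps             # but it still has to be unpacked
    return transfer + decomp

for link_mbps, label in [(2.5, "slow PXE/TFTP"), (500.0, "fast SSD")]:
    print(f"--- {label}: {link_mbps} MB/s ---")
    for name, (ratio, speed) in ALGOS.items():
        print(f"{name:5s}: {total_time(ratio, speed, link_mbps):.2f} s")
```

With the slow PXE/TFTP link, the better-compressing gzip/zstd come out ahead because transfer time dominates, while on the fast SSD lz4's decompression speed wins - the same conclusion as above, just in numbers.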
So speaking for myself, when I get curious I just try this, measure, then try that and measure again, and eventually choose what works best in the particular situation. Though measuring small times can get rather complicated.
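As a starting point for that kind of measurement, here's a minimal sketch (the file name and repeat count are just placeholders) that times in-memory gzip decompression and keeps the best of a few runs so a single noisy run doesn't skew the result:

```python
# Minimal sketch: time in-memory gzip decompression of a compressed image.
# PATH is a hypothetical file; swap in whatever you actually want to test.
import gzip
import time

PATH = "vmlinuz.gz"   # placeholder path to a gzip-compressed kernel image
REPEATS = 10          # repeat so one noisy run doesn't dominate

with open(PATH, "rb") as f:
    blob = f.read()

times = []
for _ in range(REPEATS):
    t0 = time.perf_counter()
    gzip.decompress(blob)          # decompress entirely in memory
    times.append(time.perf_counter() - t0)

print(f"best of {REPEATS}: {min(times) * 1000:.1f} ms")
```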