A ZSTD-Compressed Linux Kernel Could Be Up Next
-
Originally posted by sdack: It's going to be another option barely anyone will choose. An XZ-compressed kernel decompresses within milliseconds on modern x86.
They should drop BZIP2, as it's likely the least-used compression algorithm throughout the kernel anyway. LZO could be dropped along with LZ4, because there is barely any difference between the two, and LZ4 is the faster at decompression. LZ4 itself could be dropped in favour of ZSTD, and so could GZIP. Then it makes sense to have ZSTD. It's only bloating the kernel options now. The least they should do is declare some of the options obsolete.
At best they could hide those that no longer make sense on x86 behind the arch Kconfig, so they only show on ARM/MIPS/etc.
Comment
-
Originally posted by microcode: Does anyone compress the kernel with zopfli? Seems like you could get at least a few extra percent with maybe a few hundred iterations.
As for a way to test decompression speed, there are really two numbers that matter, aren't there?
1) How much time is spent reading the kernel from the non-volatile storage?
2) How much time is spent decompressing the kernel?
Add these two together and you get a total score for one machine; use them independently to scale your results to machines with different CPU speeds or different storage speeds.
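The two numbers above can be folded into a simple total-time model. A minimal sketch with hypothetical figures (the function name, the throughput numbers, and the assumption that decompression time scales with compressed size are all illustrative, not measurements):

```python
def boot_cost(compressed_mb, read_mb_s, decomp_mb_s):
    """Total time in seconds: time to read the compressed image from
    storage plus time to decompress it. Decompression speed is taken
    as MB/s of compressed input consumed (a simplification)."""
    return compressed_mb / read_mb_s + compressed_mb / decomp_mb_s

# Hypothetical numbers: slow eMMC storage (40 MB/s).
xz_time = boot_cost(4.4, 40, 80)     # smaller image, slower decompressor
lz4_time = boot_cost(7.0, 40, 2500)  # bigger image, very fast decompressor
print(f"xz:  {xz_time:.3f} s")
print(f"lz4: {lz4_time:.3f} s")
```

With slow storage the smaller image can win overall even though its decompressor is much slower, which is exactly why the two numbers need to be scored together rather than in isolation.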
XZ is still the compressor with the best compression ratio available for the kernel. When you set the XZ_OPT variable with:
export XZ_OPT="-9e --x86 --lzma2=dict=32M,nice=273,depth=512,pb=0,lc=4"
just before you compile and install the kernel, you'll see a smaller kernel size. You may also have to tell initramfs-tools to use xz (see /etc/initramfs-tools/initramfs.conf), because by default it may still use only gzip.
A quick comparison between XZ and ZSTD for a random kernel ("ZSTD advanced" stands for "zstd --ultra -22 --zstd=tlen=990" and "XZ advanced" stands for the options shown above):

XZ default      4606184    79%
XZ advanced     4369132    75%
ZSTD default    5816231   100%
ZSTD advanced   4903743    84%
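A similar ratio comparison is easy to reproduce for the compressors in Python's standard library (zstd and lz4 need third-party modules, so this sketch only covers gzip, bzip2 and xz, and the input is synthetic data standing in for a kernel image, not a real vmlinux):

```python
import bz2
import gzip
import lzma

# Synthetic stand-in for a kernel image: compressible but not trivial.
data = bytes(range(256)) * 4096 + b"vmlinux" * 10000

candidates = [
    ("gzip (zlib)", lambda d: gzip.compress(d, compresslevel=9)),
    ("bzip2",       lambda d: bz2.compress(d, compresslevel=9)),
    ("xz (lzma2)",  lambda d: lzma.compress(d, preset=9 | lzma.PRESET_EXTREME)),
]

for name, compress in candidates:
    out = compress(data)
    # Print size and ratio relative to the uncompressed input.
    print(f"{name:12} {len(out):9} bytes  {100 * len(out) / len(data):5.1f}%")
```

Ratios on synthetic data will not match a real kernel image, so treat this only as a harness for running your own comparison on an actual vmlinux.bin.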
The trend in compression goes towards neural nets, where we can see the biggest gains. Just for fun, I am now trying to compress 'vmlinux.bin' with cmix. If it ever completes (currently at 13%), I will post an update with some numbers.
- Likes 2
Comment
-
Originally posted by sdack: An XZ-compressed kernel decompresses within milliseconds on modern x86.
LZ4 itself could be dropped in favour of ZSTD, and so could GZIP. Then it makes sense to have ZSTD. It's only bloating the kernel options now. The least they should do is declare some of the options obsolete.
I thought a while about the use of LZ4, but the speed gain is really marginal. First of all, the speed becomes irrelevant the faster the CPU is. And for a small embedded device, the size of the kernel image can matter more than its boot-up speed. Such a device doesn't have to boot every time you use it, but can also use hibernate and suspend functions to avoid the entire boot process.
Comment
-
Originally posted by sdack: I thought a while about the use of LZ4, but the speed gain is really marginal. First of all, the speed becomes irrelevant the faster the CPU is. And for a small embedded device, the size of the kernel image can matter more than its boot-up speed. Such a device doesn't have to boot every time you use it, but can also use hibernate and suspend functions to avoid the entire boot process. So this whole argument around LZ4 for the kernel image seems to be more theoretical than practical.
Originally posted by starshipeleven: ZSTD compresses worse than XZ at higher levels, and for kernel images in embedded you want best compression, period. Even a total shit CPU is going to decompress it in a few seconds, and if it gives you more space for fitting the firmware, then it's worth it.
Aircraft interfaces have to be fully back online in 1 second, and you may not be using the most performant CPU either. So a decompression that takes a few seconds could exceed the maximum allowed boot time. If you exceed the maximum allowed boot time in some of these markets, regulation means your product cannot be sold; in those markets, saving firmware space at the cost of boot speed is stupid.
Really you have 4 classes of issues in embedded:
1) Storage-sensitive: the smaller the kernel image, the better.
2) Memory-sensitive: the smaller the memory cost, the better.
3) Time-sensitive: the faster, the better.
4) Power-sensitive: heating, cooling, and power-supply limits can make a design power-sensitive.
Depending on what you are working on, one or all four might apply.
Number of compression entries the kernel could need:
The best when all 4 conditions are required: +1.
The best when 3 conditions are required: +4.
The best when pairs of conditions are required: +6.
The best when only one condition is required: +4.
That gives a magic number of 15, plus uncompressed. The Linux kernel does not have 15 compression options yet, so there is still the possibility of needing to add compressors to cover all usages.
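The arithmetic behind that "magic number" is just the count of non-empty subsets of the four sensitivities, which a quick sketch can verify:

```python
from itertools import combinations

sensitivities = ["storage", "memory", "time", "power"]

# One "best compressor" slot per non-empty subset of conditions:
# 4 singles + 6 pairs + 4 triples + 1 all-four.
slots = [c for r in range(1, 5) for c in combinations(sensitivities, r)]

print(len(slots))  # prints 15, i.e. 2**4 - 1
```

In practice one compressor can of course be the best fit for several of those slots at once, so 15 is an upper bound on distinct niches, not a target for the number of algorithms.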
The thing about each of these sensitivities is that every small gain adds up. It's like the story of the straw that broke the camel's back. With that 1-second limit in aircraft interfaces, 1 ns over is a failure. It's the same with storage- and memory-sensitive designs: just over the limit is a fail. Power-sensitive is worse: going over the limit can brick the hardware for good.
Now, if a compressor cannot fill one of the 15 possible slots better than the other options, it should be removed. LZ4 does fill the time-sensitive slot very well and is practical in those markets. With ZSTD, I don't know which of the 15 possible use cases it fills.
In their white paper https://events.linuxfoundation.org/s...ojp13_klee.pdf LG are clear that they care about boot speed. This is in fact important in household appliances, because people turn things on and expect almost-instant-on; that gives you at most 5 seconds to boot the system and have the interface up. More than aircraft, but time is still your biggest battle. So when you have a true embedded maker doing presentations saying time is important, it was foolish of sdack and starshipeleven to say it is not something to consider.
OK, the LG white paper is biased towards their market. Of course there are other markets where other mixes apply, so there are users out there who are storage-sensitive.
Comment
-
Originally posted by caligula: I'm not really sure any vendor wants to triple their boot time to save 20 cents using a cheaper flash chip. It will also affect the overall development time quite a bit.
Please remember that a lot of SoC chips have hard limits on how much flash can be directly connected or embedded. This is what causes the insanity of "let's compress everything to the max possible". Yes, taking that too far ends up hurting long-term, but the Linux kernel does not exist to judge its end users, just to serve their needs.
- Likes 1
Comment
-
Originally posted by oiaohm: Both of you are making the same mistake of thinking of embedded as one size fits all; it's not.
Aircraft interfaces have to be fully back online in 1 second, and you may not be using the most performant CPU either.
1) Storage-sensitive: the smaller the kernel image, the better.
2) Memory-sensitive: the smaller the memory cost, the better.
3) Time-sensitive: the faster, the better.
4) Power-sensitive: heating, cooling, and power-supply limits can make a design power-sensitive.
So when you have a true embedded maker doing presentations saying time is important, it was foolish of sdack and starshipeleven to say it is not something to consider.
The same device booting in 8-10 seconds boots in about 4 seconds if I replace its firmware with LEDE, and if I replace the bootloader I can shave off another second or two.
So it is not foolish for starshipeleven to say that decompression times are mostly irrelevant. With a better algorithm you can shave at most a second off a boot time of around 8 seconds.
Last edited by starshipeleven; 12 October 2017, 04:20 AM.
Comment
-
Originally posted by caligula: Except that this isn't true. The time it takes to decompress and boot the kernel is like 3 seconds on an RPi Zero. ...
The document you've linked is, by the way, only a sales piece, meant to list every imaginable positive use case, not to list any negatives, and to sway decision makers towards including it in the kernel. And yes, LZ4 does have many good uses, but you then need to stay critical and look at its actual pros and cons. Things don't automatically turn out positive just because you've only read positives about them. They need to apply, too.
LZ4 was written by the same author as ZSTD, and ZSTD is meant to cover LZ4's use case. So it's logical to drop LZ4 from the kernel compression options and instead provide a configurable compression level for each algorithm. That way users can fine-tune it to their needs.
Last edited by sdack; 12 October 2017, 05:48 AM.
Comment
-
Originally posted by starshipeleven: You are making the mistake of thinking Linux runs in all embedded systems when it does not.
Aircraft and critical systems in general don't use Linux, as it isn't real-time nor safety-rated, so Linux boot times are irrelevant. They use VxWorks or some other decently well-known RTOS with proper safety certifications.
Aircraft use a mix of operating systems. Not all of the systems need to be real-time or safety-rated, but they have the requirement to boot in 1 second and be at normal operation, so you can kill the power, restore the power, and they are back. This could be an embedded Linux in the weather-radar screen in the cockpit; they do exist.
So you have just made a very incorrect presumption about the operating systems you find in the cockpit.
Originally posted by starshipeleven: This is admittedly a less important issue in modern embedded, because what drives cost down is mostly the volume of mass production, so nowadays the cheaper chips aren't the 4MB ones anymore.
Originally posted by starshipeleven: Same as above. Also, decompression takes memory at a time when the OS isn't even running at all; there is just a single decompressor binary running on bare metal.
But I have seen custom decompressors for time-sensitive Linux systems that start the OS running while the decompressor is still unpacking blocks; these can still have a block of memory allocated when the OS is fully up. This kind of thing does make sense on a multi-core CPU: have core 1 unpacking while cores 2, 3 and 4 start to init hardware. The idea that decompression of the kernel image only affects the time before the OS is running is not always true.
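The overlap described above can be sketched with two threads: one feeding decompressed blocks into a queue while a consumer starts working on each block as it arrives. This is only an illustration of the pipelining idea, not how any kernel decompressor actually works; the chunk sizes and data are made up:

```python
import queue
import threading
import zlib

# Eight compressed blocks standing in for pieces of an OS image.
chunks = [zlib.compress(bytes([i]) * 65536) for i in range(8)]
out: "queue.Queue[bytes]" = queue.Queue()

def unpack():
    # "Core 1": decompress blocks and hand each over as soon as it's done.
    for c in chunks:
        out.put(zlib.decompress(c))
    out.put(None)  # sentinel: nothing more to unpack

def consume():
    # "Cores 2-4": begin processing blocks before the rest are ready.
    total = 0
    while (block := out.get()) is not None:
        total += len(block)
    print(f"initialised with {total} bytes")  # prints 524288 (8 * 65536)

t = threading.Thread(target=unpack)
t.start()
consume()
t.join()
```

The queue is what lets "init" work proceed concurrently with unpacking, which is the whole point of overlapping decompression with early boot.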
Originally posted by starshipeleven: As already stated elsewhere, the bulk of boot time in embedded is hardware initialisation time; even a shitty Kirkwood CPU can decompress a kernel image in less than a second.
Originally posted by starshipeleven: Heh, this is going to be a specific hardware issue, as even 1-2W systems (total power consumption) can decompress a kernel image in less than a second.
So yes, you are right that this comes down to a specific hardware issue in the final product: a low-thermal-conductivity encapsulation resin that happened to be the best material to protect the board where it was being deployed. It then falls to the embedded designers to work inside those limitations.
Originally posted by starshipeleven: Starshipeleven knows and has observed many embedded systems boot through their debug serial port, and has seen that of 8-10 seconds of boot time the bootloader itself wastes 1-2 seconds (u-boot, for example, can be slimmed down to boot more or less instantaneously, and I know of some open-source projects that do that; did I ever see any OEM do that? Nope, they all blindly ship the hardware manufacturer's bootloader with zero modifications). Decompressing is also a second tops if the kernel image is bare, or much more if the kernel image also has an initramfs embedded into it (a dumb choice, but easier development). The bulk of the time is then taken by the kernel/OS itself starting up (again because it's crap and uses scripts and hacks and is therefore 100% sequential, even when it could just be parallelized like systemd does).
The same device booting in 8-10 seconds boots in about 4 seconds if I replace its firmware with LEDE, and if I replace the bootloader I can shave off another second or two.
So it is not foolish for starshipeleven to say that decompression times are mostly irrelevant. With a better algorithm you can shave at most a second off a boot time of around 8 seconds.
So there is a type 5:
5) Not sensitive to anything but cost. For deciding what features go into the kernel, this group takes whatever they are given. If what the Linux kernel does is slightly bigger or slightly slower, or consumes a little more RAM or storage, they mostly will not care.
The hardware from these makers would work perfectly fine with an uncompressed kernel. They ship the worst bootloaders ever, including cases where the bootloader locks memory away from the OS.
The hardware you are playing with is not classed as time-sensitive. starshipeleven, the big difference here is that I am playing with industrial embedded, and some industrial embedded is done very well: multi-threaded init systems, decompression of the OS image while the OS is already starting to init hardware, all these nice little features.
From your answer I would say you were not playing with any hardware that had any of the 4 sensitivity pressure points. Why are those devices horrible for security and performance? Because when your sensitivity is cost, a few cents of extra storage/memory/CPU in a device passed on to the customer is no problem, while 2000+ man-hours doing the device properly are completely skipped. Yes, this is why the bootloaders in those devices are left on generic defaults that are totally horrible: they are not going to waste the man-hours optimising anything.
Comment