Arch's Switch To Zstd: ~0.8% Increase In Package Size For ~1300% Speedup In Decompression Time


  • #21
    Originally posted by hotaru View Post
    3. if I decompress from memory to memory, lbzip2 is close to memcpy speed (and sometimes even exceeds it), but that's not a realistic scenario for real world use.
    Sorry, I'm probably dumb, but how can decompression perform faster than memcpy? Isn't memcpy supposed to be the fastest way to transfer data from address range X to Y?



    • #22
      Originally posted by caligula View Post
      Sorry, I'm probably dumb, but how can decompression perform faster than memcpy? Isn't Memcpy supposed to be the fastest way to transfer data from address range X to Y?
      1. while decompression and copying have to write the same amount of data to memory, decompression doesn't have to read as much.
      2. this is comparing multithreaded decompression to single-threaded memcpy.
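
      a rough way to see the multithreading point on your own machine (file names here are hypothetical): reading an already-cached file approximates a single-threaded copy, while lbzip2 spreads decompression over every core.

      # warm the page cache so disk speed doesn't enter into it
      cat sample.tar sample.tar.bz2 > /dev/null
      # single-threaded copy of the uncompressed data (roughly memory-bandwidth bound)
      dd if=sample.tar of=/dev/null bs=1M
      # multi-threaded decompression of the same data
      time lbzip2 -d -c sample.tar.bz2 > /dev/null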



      • #23
        Originally posted by AndyChow View Post

        They used a sample size of 545 packages. It's linked right in the article.
        I manually recompressed the entire repo to zstd for this; the results are not extrapolated.
        The >=545 packages in the news post are just the packages currently available as .pkg.tar.zst.

        Originally posted by -MacNuke- View Post


        COMPRESSZST=(zstd -c -T0 --ultra -20 -)
        This is correct.

        Extraction time (& I/O) is one of the most time-consuming parts of a system upgrade, especially when provisioning chroots, VMs, or other virtual and/or ephemeral systems, so personally i'm fairly happy with the results.

        As far as the size increase goes, we're talking about an increase of like, 300 MiB for our entire package repo - with 11265 packages this means that on average every package is ~29 KiB larger if i didn't miscalculate. That's not bad at all.

        What isn't mentioned in the news post is that *packaging* is also faster by a huge factor. Compressing big packages like cuda takes the better part of 20 minutes on our fastest build server with XZ; zstd cut that down to less than 2 minutes.
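
        If anyone wants a rough, single-package version of that comparison (the paths and the xz invocation are only illustrative; the zstd flags are the ones quoted above):

        # single-stream xz compression of one package tarball
        time xz -c -z - < cuda.tar > cuda.tar.xz
        # the zstd flags quoted above: all cores, level 20
        time zstd -c -T0 --ultra -20 - < cuda.tar > cuda.tar.zst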

        (disclosure: i wrote the news post on archlinux.org and ran the numbers)



        • #24
          Originally posted by hotaru View Post

          1. while decompression and copying have to write the same amount of data to memory, decompression doesn't have to read as much.
          2. this is comparing multithreaded decompression to single-threaded memcpy.
          Ok, makes sense now. I still think 1) applies more to RLE-style algorithms, where you can fetch the data straight from registers. Dictionary-based compression still requires reading from memory, but it can utilize the caches better.



          • #25
            But why is Zstd so good? I mean, it has to use more RAM or something?



            • #26
              Is XZ even used in a parallelized fashion in such a use case? Single-threaded XZ is horrible; parallelized XZ is OK. When updating Debian packages, I don't see CPU use go to 100%, so most likely package managers aren't using parallelized XZ.



              • #27
                Originally posted by Misel View Post
                What I was getting at - or rather, trying to - was that a 1300% speed-up could mean 1 ms instead of 13 ms. As far as I can tell, they only mention the actual decompression - but not the download and disk access time. (Again, I may be wrong on that one.)
                And the 0.8% size gain is negligible too. A 1 GB update adds what, 8 MB more? The network takes a marginal amount of additional time relative to the total transfer. Even if the transfer took as long as 100 minutes, that's only about 48 seconds more, which is insignificant, especially since for most people their update won't take anywhere near that long to retrieve.
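
                Back-of-the-envelope, for anyone who wants to check those numbers:

                # 0.8% of a 1 GB update, in MB
                echo "1000 * 0.008" | bc       # -> 8.000
                # 0.8% of a 100-minute transfer, in seconds
                echo "100 * 60 * 0.008" | bc   # -> 48.000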

                Your other point was disk I/O. The data decompresses ~1300% faster... I'm not sure how an extra 8 MB out of 1 GB is supposed to hurt I/O when the data can be decompressed from disk at that rate.



                • #28
                  I went back to being CPU-bottlenecked the day I got NVMe.

                  This decompression speedup SEEMS like a boon for HDDs, but with decompression so vastly improved, would it be so fast that it actually flogs the HDD, making the benefit moot? That's my hypothetical question for the day!



                  • #29
                    Originally posted by shmerl View Post
                    Is XZ even used in a parallelized fashion in such a use case? Single-threaded XZ is horrible; parallelized XZ is OK. When updating Debian packages, I don't see CPU use go to 100%, so most likely package managers aren't using parallelized XZ.
                    xz (the implementation, not the format) doesn't support parallel decompression. also, parallel xz compression causes problems for reproducible builds.
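
                    easy to confirm for yourself (file name hypothetical; -T only affects compression in xz here):

                    # single-threaded compression
                    time xz -9 -T1 -c big.tar > big.tar.xz
                    # same compression spread over all cores
                    time xz -9 -T0 -c big.tar > /dev/null
                    # decompression runs on one core regardless of -T
                    time xz -d -c big.tar.xz > /dev/null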



                    • #30
                      Originally posted by Misel View Post
                      Are there any absolute numbers or benchmarks available?

                      I didn't find any after a short search. 1300% sounds impressive but, personally, I'd rather save space (or rather disk and net I/O) than compute time. IMHO network speed and latencies add more time than decompression time.
                      You have to consider that it is a rolling release distro; packages get updated often, so they optimized for their use case.
                      A 0.8% size increase is nothing in exchange for ~1300% faster decompression.

                      I would like to see some Brotli comparisons though, since it's optimized for exactly that use case: slow compression with a great compression ratio, small output, and very fast decompression.
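
                      A minimal way to run that comparison yourself (the package path is hypothetical; brotli's -q 11 is its maximum quality setting):

                      time brotli -q 11 -c package.tar > package.tar.br
                      time zstd -19 -T0 -c package.tar > package.tar.zst
                      time brotli -d -c package.tar.br > /dev/null
                      time zstd -d -c package.tar.zst > /dev/null
                      ls -l package.tar package.tar.br package.tar.zst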

