Arch's Switch To Zstd: ~0.8% Increase In Package Size For ~1300% Speedup In Decompression Time


  • #21
    Originally posted by hotaru View Post
    3. if I decompress from memory to memory, lbzip2 is close to memcpy speed (and sometimes even exceeds it), but that's not a realistic scenario for real world use.
    Sorry, I'm probably dumb, but how can decompression perform faster than memcpy? Isn't memcpy supposed to be the fastest way to transfer data from address range X to Y?



    • #22
      Originally posted by caligula View Post
      Sorry, I'm probably dumb, but how can decompression perform faster than memcpy? Isn't Memcpy supposed to be the fastest way to transfer data from address range X to Y?
      1. while decompression and copying have to write the same amount of data to memory, decompression doesn't have to read as much.
      2. this is comparing multithreaded decompression to single-threaded memcpy.
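
      a rough way to see the multithreading point on your own machine (file names here are hypothetical): reading an already-cached file approximates a single-threaded copy, while lbzip2 spreads decompression over every core.

      # warm the page cache so disk speed doesn't enter into it
      cat sample.tar sample.tar.bz2 > /dev/null
      # single-threaded copy of the uncompressed data (roughly memory-bandwidth bound)
      dd if=sample.tar of=/dev/null bs=1M
      # multi-threaded decompression of the same data
      time lbzip2 -d -c sample.tar.bz2 > /dev/null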



      • #23
        Originally posted by AndyChow View Post

        They used a sample size of 545 packages. It's linked right in the article.
        I manually recompressed the entire repo to zstd for this; the results are not extrapolated.
        The >=545 packages in the news post are just the packages currently available as .pkg.tar.zst.

        Originally posted by -MacNuke- View Post


        COMPRESSZST=(zstd -c -T0 --ultra -20 -)
        This is correct.

        Extraction time (& I/O) is one of the most time-consuming parts of a system upgrade, especially when provisioning chroots, VMs, or other virtual and/or ephemeral systems, so personally i'm fairly happy with the results.

        As far as the size increase goes, we're talking about an increase of like, 300 MiB for our entire package repo - with 11265 packages this means that on average every package is ~29 KiB larger if i didn't miscalculate. That's not bad at all.

        What isn't mentioned in the news post is that *packaging* is also faster by a huge factor. Compressing big packages like cuda takes the better part of 20 minutes on our fastest build server with XZ; zstd cut that down to less than 2 minutes.
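
        If anyone wants a rough, single-package version of that comparison (the paths and the xz invocation are only illustrative; the zstd flags are the ones quoted above):

        # single-stream xz compression of one package tarball
        time xz -c -z - < cuda.tar > cuda.tar.xz
        # the zstd flags quoted above: all cores, level 20
        time zstd -c -T0 --ultra -20 - < cuda.tar > cuda.tar.zst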

        (disclosure: i wrote the news post on archlinux.org and ran the numbers)



        • #24
          Originally posted by hotaru View Post

          1. while decompression and copying have to write the same amount of data to memory, decompression doesn't have to read as much.
          2. this is comparing multithreaded decompression to single-threaded memcpy.
          Ok, makes sense now. I still think 1) applies more to RLE-style algorithms, where you can fetch the data straight from registers. Dictionary-based compression still requires reading from memory, but it can utilize the caches better.



          • #25
            But why is Zstd so good? I mean, it has to use more RAM or something?



            • #26
              Is XZ even used in a parallelized fashion in such a use case? Single-threaded XZ is horrible; parallelized XZ is OK. When updating Debian packages, I don't see CPU use go to 100%, so most likely package managers aren't using parallelized XZ.



              • #27
                Originally posted by Misel View Post
                What I was getting at - or rather, trying to - was that a 1300% speed-up could mean 1 ms instead of 13 ms. As far as I can tell, they only mention the actual decompression - but not the download and disk access time. (Again, I may be wrong on that one.)
                And the 0.8% size gain is negligible too. A 1 GB update adds what, 8 MB more? The network takes a marginal amount of additional time relative to the total transfer. Even if the transfer took as long as 100 minutes, that's only about 48 seconds more, which is insignificant, especially since for most people their update won't take anywhere near that long to retrieve.
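
                Back-of-the-envelope, for anyone who wants to check those numbers:

                # 0.8% of a 1 GB update, in MB
                echo "1000 * 0.008" | bc       # -> 8.000
                # 0.8% of a 100-minute transfer, in seconds
                echo "100 * 60 * 0.008" | bc   # -> 48.000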

                Your other point was disk I/O. The data decompresses ~1300% faster... I'm not sure how an extra 8 MB out of 1 GB is supposed to hurt I/O when the data can be decompressed from disk at that rate.



                • #28
                  I went back to being CPU-bottlenecked the day I got NVMe.

                  This decompression speedup SEEMS like a boon for HDDs, but with decompression so vastly improved, would it be so fast that it actually flogs the HDD, making the benefit moot? That's my hypothetical question for the day!



                  • #29
                    Originally posted by shmerl View Post
                    Is XZ even used in a parallelized fashion in such a use case? Single-threaded XZ is horrible; parallelized XZ is OK. When updating Debian packages, I don't see CPU use go to 100%, so most likely package managers aren't using parallelized XZ.
                    xz (the implementation, not the format) doesn't support parallel decompression. also, parallel xz compression causes problems for reproducible builds.
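
                    easy to confirm for yourself (file name hypothetical; -T only affects compression in xz here):

                    # single-threaded compression
                    time xz -9 -T1 -c big.tar > big.tar.xz
                    # same compression spread over all cores
                    time xz -9 -T0 -c big.tar > /dev/null
                    # decompression runs on one core regardless of -T
                    time xz -d -c big.tar.xz > /dev/null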



                    • #30
                      Originally posted by Misel View Post
                      Are there any absolute numbers or benchmarks available?

                      I didn't find any after a short search. 1300% sounds impressive but, personally, I'd rather save space (or rather disk and net I/O) than compute time. IMHO network speed and latencies add more time than decompression time.
                      You have to consider that it is a rolling release distro; packages get updated often, so they optimized for their use case.
                      A 0.8% size increase is nothing in exchange for ~1300% faster decompression.

                      I would like to see some Brotli comparisons though, since it's optimized for exactly that use case: slow compression with a great compression ratio, small output, and very fast decompression.
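
                      A minimal way to run that comparison yourself (the package path is hypothetical; brotli's -q 11 is its maximum quality setting):

                      time brotli -q 11 -c package.tar > package.tar.br
                      time zstd -19 -T0 -c package.tar > package.tar.zst
                      time brotli -d -c package.tar.br > /dev/null
                      time zstd -d -c package.tar.zst > /dev/null
                      ls -l package.tar package.tar.br package.tar.zst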

