Arch's Switch To Zstd: ~0.8% Increase In Package Size For ~1300% Speedup In Decompression Time


  • #21
    Originally posted by Misel View Post
    I guess I should have put more emphasis on disk I/O and network speed rather than the actual required space. Space itself is not an issue these days; most storage devices are more than large enough for the extra 0.8%.

    What I was getting at - or rather, trying to - was that a 1300% speedup could mean 1 ms instead of 13 ms. As far as I can tell, they only mention the actual decompression, but not the download and disk access time. (Again, I may be wrong on that one.)

    So are there any benchmarks against other algorithms? E.g. bz2, which is much slower but has much better compression in comparison, so download times and disk reads should be faster. The question is, though, how do those factor in?

    So again, are there any benchmarks with absolute numbers?
    Your whole post gives me the feeling you have not updated your system for at least 5 years (or maybe 10 in areas with fast Internet like Japan/South Korea).

    The network has not been the bottleneck for a long time. Currently the drpm rebuild takes the majority of the time during my updates.
    Asking for absolute numbers makes zero sense when every user clearly understands that these numbers are NOT on the critical path.
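
    That said, anyone who wants absolute numbers for their own machine can get a rough idea in a few lines. A crude sketch (the package name and directories are placeholders; this only measures decompression and extraction, not download):

    # recompress one cached .pkg.tar.xz to zstd, then time full extraction of each
    pkg=some-package-1.0-1-x86_64.pkg.tar          # placeholder name
    xz -dc "$pkg.xz" | zstd -c -T0 --ultra -20 - > "$pkg.zst"
    mkdir -p extract-xz extract-zst
    time xz -dc "$pkg.xz"    | tar -xf - -C extract-xz
    time zstd -dc "$pkg.zst" | tar -xf - -C extract-zst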

    • #22
      Originally posted by hotaru View Post
      3. If I decompress from memory to memory, lbzip2 is close to memcpy speed (and sometimes even exceeds it), but that's not a realistic scenario for real-world use.
      Sorry, I'm probably dumb, but how can decompression be faster than memcpy? Isn't memcpy supposed to be the fastest way to transfer data from address range X to Y?

      • #23
        Originally posted by caligula View Post
        Sorry, I'm probably dumb, but how can decompression be faster than memcpy? Isn't memcpy supposed to be the fastest way to transfer data from address range X to Y?
        1. While decompression and copying both have to write the same amount of data to memory, decompression doesn't have to read as much.
        2. This is comparing multithreaded decompression to single-threaded memcpy.
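
        A back-of-the-envelope illustration of point 1: if the archive compresses roughly 3:1, producing N bytes of output means reading about N/3 and writing N (~1.33N of memory traffic), while memcpy reads N and writes N (2N). A crude command-line sketch (paths are placeholders, /tmp is assumed to be tmpfs, and dd is only a rough stand-in for a single-threaded copy):

        f=/tmp/sample.tar                            # placeholder input, ideally a few hundred MB
        lbzip2 -k "$f"                               # produces /tmp/sample.tar.bz2
        time dd if="$f" of=/tmp/copy.tar bs=1M       # single-threaded copy: reads N, writes N
        time lbzip2 -dc "$f.bz2" > /tmp/out.tar      # threaded decompress: reads ~N/3, writes N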

        • #24
          Originally posted by AndyChow View Post

          They used a sample size of 545 packages. It's linked right in the article.
          I manually recompressed the entire repo to zstd for this; the results are not extrapolated.
          The >=545 packages in the news post are just the packages currently available as .pkg.tar.zst.

          Originally posted by -MacNuke- View Post

          https://git.archlinux.org/devtools.g...kg-x86_64.conf
          COMPRESSZST=(zstd -c -T0 --ultra -20 -)
          This is correct.

          Extraction time (& I/O) is one of the most time-consuming parts of a system upgrade, especially when provisioning chroots, VMs, or other virtual and/or ephemeral systems, so personally I'm fairly happy with the results.

          As far as the size increase goes, we're talking about an increase of roughly 300 MiB for our entire package repo - with 11265 packages this means that on average every package is ~29 KiB larger, if I didn't miscalculate. That's not bad at all.

          What isn't mentioned in the news post is that *packaging* is also faster by a huge factor. Compressing big packages like cuda takes the better part of 20 minutes on our fastest build server with xz; zstd cut that down to less than 2.

          (Disclosure: I wrote the news post on archlinux.org and ran the numbers.)
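
          For anyone who wants to see that packaging-side difference for themselves, a rough sketch using the zstd flags from the config quoted above (the tarball name is a placeholder, and plain single-threaded xz is assumed as the baseline):

          # compress the same uncompressed package tarball both ways and compare
          time xz -c -z big-package.tar > big-package.tar.xz
          time zstd -c -T0 --ultra -20 big-package.tar > big-package.tar.zst
          ls -l big-package.tar.xz big-package.tar.zst   # size difference alongside the timings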

          • #25
            Originally posted by hotaru View Post

            1. While decompression and copying both have to write the same amount of data to memory, decompression doesn't have to read as much.
            2. This is comparing multithreaded decompression to single-threaded memcpy.
            OK, that makes sense now. I still think 1) applies more to RLE-style algorithms, where you can fetch the data straight from the registers. Dictionary-based compression still requires reading from memory, but it can utilize the caches better.

            • #26
              But why is Zstd so good? I mean, does it have to use more RAM or something?

              • #27
                Is XZ even used in a parallelized fashion in this use case? Single-threaded XZ is horrible; parallelized XZ is OK. When updating Debian packages, I don't see CPU use go to 100%, so most likely package managers aren't using parallelized XZ.

                • #28
                  Originally posted by Misel View Post
                  What I was getting at - or rather, trying to - was that a 1300% speedup could mean 1 ms instead of 13 ms. As far as I can tell, they only mention the actual decompression, but not the download and disk access time. (Again, I may be wrong on that one.)
                  And the 0.8% size increase is negligible too. A 1 GB update adds what, 8 MB more? The network takes a marginal amount of additional time over the total transfer: if the download took as long as 100 minutes, the extra data adds 48 seconds, which is rather insignificant, especially since for most people the update won't take anywhere near that long to retrieve.

                  Your other point was disk I/O. The data decompresses ~1300% faster... I'm not sure how that additional 8 MB out of 1 GB is supposed to hurt I/O when the data can be decompressed from disk at that rate.

                  • #29
                    I went back to being CPU-bottlenecked the day I got an NVMe drive.

                    This decompression speedup SEEMS like a win for HDDs, but with decompression so vastly improved, would it be so fast that it actually flogs the HDD and makes the benefit moot? That's my hypothetical question for the day!

                    • #30
                      Originally posted by shmerl View Post
                      Is XZ even used in a parallelized fashion in this use case? Single-threaded XZ is horrible; parallelized XZ is OK. When updating Debian packages, I don't see CPU use go to 100%, so most likely package managers aren't using parallelized XZ.
                      xz (the implementation, not the format) doesn't support parallel decompression. Also, parallel xz compression causes problems for reproducible builds.
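
                      A quick way to see the reproducibility problem, as a sketch (the file name is a placeholder, and the input needs to be big enough that the threaded encoder splits it into multiple blocks):

                      xz -9 -c -T1 big.tar | sha256sum            # single-threaded: one block
                      xz -9 -c -T4 big.tar | sha256sum            # threaded: different bytes, different checksum
                      xz -9 -c -T4 big.tar | xz -dc | sha256sum   # decompressed data still matches sha256sum of big.tar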
