Arch's Switch To Zstd: ~0.8% Increase In Package Size For ~1300% Speedup In Decompression Time

  • #31
    Really don't care. Facebrick can keep their standard. xz is good enough for me, even gzip in most places.

    I fully recommend NOT using Zstd until it's no longer controlled by Facebrick. Just fuck them.

    • #32
      Originally posted by hotaru View Post

      xz (the implementation, not the format) doesn't support parallel decompression. also, parallel xz compression causes problems for reproducible builds.
      You can use the implementation that does support it: https://github.com/vasi/pixz

      Also, you can use single-threaded compression if parallelized compression somehow causes an issue with reproducible builds. For parallel decompression, you don't need to compress in parallel.
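
      A rough sketch of both suggestions, with placeholder file names (exact pixz flags may differ between versions):

      Code:
      # reproducible, single-threaded compression with plain xz
      xz -6 -T1 -k package.tar

      # parallel decompression with pixz (-p sets the thread count); note this
      # only helps when the archive contains multiple blocks, so a single-block
      # xz file still decompresses on one thread
      pixz -d -p 8 package.tar.xz package.tar
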
      Last edited by shmerl; 05 January 2020, 03:22 AM.

      • #33
        Originally posted by shmerl View Post
        So use the implementation that does support it: https://github.com/vasi/pixz
        I just did a quick test with one of the files I use for compression benchmarks. I'm using 64 threads and the uncompressed file is 1.7GB.

        The results:

        decompressor   decompression time
        lbzip2         7.76s
        zstd           11.6s
        pixz           50.3s
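
        One way such a timing comparison could be run (file names are placeholders; thread flags are from memory and may vary by version):

        Code:
        # time each decompressor on its own copy of the same ~1.7 GB archive,
        # allowing up to 64 worker threads where the tool supports them
        time lbzip2 -d -k -n 64 testfile.tar.bz2
        time zstd   -d -k -T64  testfile.tar.zst
        time pixz   -d -p 64    testfile.tar.xz testfile.tar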

        • #34
          Originally posted by hotaru View Post

          I just did a quick test with one of the files I use for compression benchmarks. I'm using 64 threads and the uncompressed file is 1.7GB.

          The results:

          decompressor   decompression time
          lbzip2         7.76s
          zstd           11.6s
          pixz           50.3s
          So that's well below the ~1300% speedup cited above: 50.3s / 11.6s ≈ 4.34, i.e. around 434%. zstd is of course still faster, but I'd say parallel xz is already a huge benefit when the size stays as low as it is with current xz.
          Last edited by shmerl; 05 January 2020, 03:58 AM.

          • #35
            I think the numbers are quite telling and hope other distributions will notice this and evaluate a move to zstd, too. And I cannot understand the people who either question the benefits or don't want Facebook to be in control (why? As a matter of principle? As I see it, Facebook has an intrinsic motivation for further advancements, and I cannot spot a drawback as long as they keep pushing things forward).

            • #36
              > Is XZ even used in parallellized fashion in such use case? Single threaded XZ is horrible. Parallelized XZ is OK.

              A little-known fact about parallel implementations of lzma/xz is that they work by breaking the input into independent slices. This not only produces a different output from `xz`, hurting reproducibility; it also badly impacts the compression ratio.

              As an example, a small test using the enwik8 file:

              command   size         difference
              cp        100,000,000  n/a
              xz -6     26,375,764   -
              pxz -6    26,665,164   +1.09%
              pixz -6   26,859,732   +1.83%

              As one can see, using parallel xz implementations, one loses more than the ~0.8% mentioned by Arch for zstd, making them less attractive options.

              For reference, on this file, zstd using the Arch default settings gets 25,989,372 bytes.
              It's not even the best, as zstd could compress it down to 25,340,456 bytes by increasing its compression level (if it ever matters).
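
              A rough sketch of how this comparison could be reproduced (the zstd level here is an illustrative guess, not necessarily what Arch ships):

              Code:
              # enwik8 is the first 100,000,000 bytes of an English Wikipedia dump
              wget http://mattmahoney.net/dc/enwik8.zip && unzip enwik8.zip

              # single-threaded xz as the reference point
              xz -6 -k enwik8 && stat -c '%s %n' enwik8.xz

              # zstd at a high level with all cores; -k keeps the input file
              zstd -19 -T0 -k enwik8 && stat -c '%s %n' enwik8.zst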

              _edit_:
              pbzip2 -9 (max level) gets this file down to 29,014,637 bytes.
              bzip2 is an old format that used to be competitive 20 years ago, when it faced only gzip (36,489,593 bytes on the same file), but it no longer offers great compression ratios compared to more modern alternatives (zstd / xz).
              Last edited by evergreen; 05 January 2020, 05:11 AM.

              • #37
                Do note that Zstandard is not a product of some opaque corporation or whatever. Its principal author is Yann Collet, the same guy who developed LZ4 several years earlier and has been active in the compression community for years; both of his compressors are BSD-licensed. The second most active contributor (by number of commits) is Przemysław Skibiński, the creator of the Silesia corpus amongst many other things.

                • #38
                  I have always said that the problem with the Linux operating system is efficiency. They tend to make complex what should be easy. The memory management is the worst by far. This progress proves the aforementioned point. So Linux developers are not competent.

                  • #39
                    For those complaining about dpkg and assuming that changing the compression will do anything: dpkg is slow as hell because it syncs the filesystem (I believe after each file).
                    Run apt through eatmydata to see the difference (see the sketch below).

                    Speed was never a design criterion. If it were, I would focus on decompressing/preprocessing multiple packages, possibly while they are being downloaded. Network speed should be the only limiting factor then.
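
                    A minimal sketch of the eatmydata suggestion (the package name is a placeholder):

                    Code:
                    # eatmydata (from libeatmydata) preloads a shim that turns fsync()
                    # and friends into no-ops, so dpkg stops waiting on the filesystem
                    sudo eatmydata apt-get install some-package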

                    • #40
                      Originally posted by atomsymbol

                      Code:
                      $ man xz (5.2.4)
                      -T threads, --threads=threads
                      ... Threaded decompression hasn't been implemented yet. ...
                      Check pixz.
