Fedora 31 Considers Compressing Their RPM Packages With Zstd Rather Than XZ


  • Fedora 31 Considers Compressing Their RPM Packages With Zstd Rather Than XZ

    Phoronix: Fedora 31 Considers Compressing Their RPM Packages With Zstd Rather Than XZ

    Fedora has been using XZ-compressed RPMs for the past decade but with the Fedora 31 release due out later this year they are currently evaluating a switch over to Zstd compression...


  • #2
Give me xz -9 compressed rpms, and parallel decompression. I've got 16 threads, bring it on. It sounds to me like the problem is that the rpm process is a serializing bottleneck. It could be decompressing rpms in parallel ahead of time, using available memory.
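Even a dumb userspace sketch gets the idea across (made-up paths; real rpms would need their payload pulled out with rpm2cpio first):

Code:
# decompress a batch of cached .xz payloads in parallel, one xz process per file
find ./cache -name '*.xz' -print0 | xargs -0 -P "$(nproc)" -n1 xz -dk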
    Last edited by xorbe; 30 May 2019, 01:28 PM.



    • #3
xz is fine if used with indexed, chunked compression like pixz, which allows parallelized decompression using all cores. Regular unindexed xz is bad, since its decompression isn't parallelizable.
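For example, with pixz installed (it takes input and output as positional arguments):

Code:
# build an indexed, chunked .xz that regular xz can still read
pixz linux-5.1.tar linux-5.1.tar.xz
# decompress in parallel; -p caps the number of threads used
pixz -d -p 16 linux-5.1.tar.xz linux-5.1.tar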



      • #4
        After looking at the chart...why aren't they running it with "-T0"?

I might not be smart enough to be a Fedora engineer, but I'm smart enough to use the damn multithreading-with-autodetect flag.

        From "xz --help"
        -T, --threads=NUM use at most NUM threads; the default is 1; set to 0
        to use as many threads as there are processor cores
Did a quick experiment with the Linux 5.1 sources:

        Code:
        tar cvf linux-5.1.tar linux-5.1
        
        ls -lh linux-5.1.tar
            832M May 30 12:24 linux-5.1.tar
        
        time xz -c -z -2 -T0 linux-5.1.tar > linux-5.1.T0.xz
            real    0m9.998s
            user    1m45.338s
            sys     0m0.870s
        
        ls -lh linux-5.1.T0.xz
            127M May 30 12:27 linux-5.1.T0.xz
        
        time xz -c -z -2 -T1 linux-5.1.tar > linux-5.1.T1.xz
            real    1m1.694s
            user    1m1.258s
            sys     0m0.295s
        
        ls -lh linux-5.1.T1.xz
            126M May 30 12:32 linux-5.1.T1.xz
        
        #for shits and giggles
        time xz -c -z -9 -e -T0 linux-5.1.tar > linux-5.1.T0e.xz
            real    2m31.824s
            user    10m25.718s
            sys     0m3.111s
        
        ls -lh linux-5.1.T0e.xz
            100M May 30 12:36 linux-5.1.T0e.xz
        Edit: Should add that -T0 is 16 threads on my system.

        Edit 2: Had to go to the bathroom, finished testing:

        Code:
        time xz -c -z -9 -e -T1 linux-5.1.tar > linux-5.1.T1e.xz
            real    10m22.610s
            user    10m16.728s
            sys     0m1.776s
        
        ls -lh linux-5.1.T1e.xz
            100M May 30 12:53 linux-5.1.T1e.xz
        
        time xz -c -d -T0 linux-5.1.T1e.xz > linux-5.1-T1e
            real    0m7.588s
            user    0m7.052s
            sys     0m0.519s
        
        time xz -c -d -T1 linux-5.1.T1e.xz > linux-5.1-T1e
            real    0m7.774s
            user    0m7.210s
            sys     0m0.537s
        
        time xz -c -d -T0 linux-5.1.T0.xz > linux-5.1-T0
            real    0m8.596s
            user    0m8.116s
            sys     0m0.464s
        
        time xz -c -d -T1 linux-5.1.T1.xz > linux-5.1-T1
            real    0m8.595s
            user    0m8.071s
            sys     0m0.505s
        
Anecdotal, but it shows that multithreaded xz decompression isn't very useful, while multithreaded compression helps a lot overall: ~6x faster when compressing the Linux 5.1 sources.
        Last edited by skeevy420; 30 May 2019, 02:09 PM.



        • #5
          In my experience, parallel xz only really kicks into high gear with larger data sets. That's why I was suggesting decompressing future rpms ahead of time during installation.
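Right, -T0 only splits the input into blocks when it's big enough to fill more than one, so a small rpm ends up single-threaded anyway. Forcing a smaller block size should widen the window where threading helps, at some cost in ratio, e.g.:

Code:
# 8 MiB blocks: more chances to parallelize mid-sized inputs, slightly worse ratio
xz -z -T0 --block-size=8MiB some-archive.tar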



          • #6
            Originally posted by skeevy420 View Post
            After looking at the chart...why aren't they running it with "-T0"?

I might not be smart enough to be a Fedora engineer, but I'm smart enough to use the damn multithreading-with-autodetect flag.

            From "xz --help"


            Did a quick experiment with the LInux 5.1 sources:

            Code:
            tar cvf linux-5.1.tar linux-5.1
            
            ls -lh linux-5.1.tar
            832M May 30 12:24 linux-5.1.tar
            
            time xz -c -z -2 -T0 linux-5.1.tar > linux-5.1.T0.xz
            real 0m9.998s
            user 1m45.338s
            sys 0m0.870s
            
            ls -lh linux-5.1.T0.xz
            127M May 30 12:27 linux-5.1.T0.xz
            
            time xz -c -z -2 -T1 linux-5.1.tar > linux-5.1.T1.xz
            real 1m1.694s
            user 1m1.258s
            sys 0m0.295s
            
            ls -lh linux-5.1.T1.xz
            126M May 30 12:32 linux-5.1.T1.xz
            
            #for shits and giggles
            time xz -c -z -9 -e -T0 linux-5.1.tar > linux-5.1.T0e.xz
            real 2m31.824s
            user 10m25.718s
            sys 0m3.111s
            
            ls -lh linux-5.1.T0e.xz
            100M May 30 12:36 linux-5.1.T0e.xz
            Edit: Should add that -T0 is 16 threads on my system.
Just making a quick guess here, but I think they don't do this so that an update won't drain resources from the normal day-to-day operation of a server or workstation. And even if you lower the priority of xz, the threaded version takes quite a lot of RAM.
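If they did want to push threads by default, it's at least easy to rein in, something like:

Code:
# low CPU and I/O priority, capped threads, hard memory ceiling for the encoder
nice -n 19 ionice -c3 xz -z -T2 --memlimit-compress=1GiB big-input.tar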



            • #7
              Originally posted by xorbe View Post
              In my experience, parallel xz only really kicks into high gear with larger data sets. That's why I was suggesting decompressing future rpms ahead of time during installation.
I suppose if a person had the RAM for it, or was using an SSD or better, that would work fine. On slower storage media it would lead to I/O issues if they couldn't extract to a temporary ramdisk, due to all of the extract, move, delete, etc. operations going on damn-near all at once.
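Something like this is what I'd picture (just a sketch, made-up paths):

Code:
# stage the decompressed payload in a tmpfs so the extract/move/delete churn hits RAM
sudo mount -t tmpfs -o size=2G tmpfs /tmp/rpm-stage
xz -dc some-package-payload.xz > /tmp/rpm-stage/payload.cpio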



              • #8
                Originally posted by F.Ultra View Post

                Just making a quick guess here but I think that they don't do this in order to not have any update drain resources from the normal day to day operation of a server or workstation. And even if you lower the priority of xz the threaded version takes quite a lot of RAM.
Fedora isn't Arch where you update 45 times a day...and even if it were, it would only affect stuff made locally with makepkg.

                From my makepkg.conf
                Code:
                COMPRESSXZ=(xz -c -z -9 -e -T0 -)
All of that is on Fedora's end, where the compression will actually occur, and not on the end-user's machine (unless they're making their own packages). As my values with the kernel sources above show, it all decompresses at around the same speed regardless of what compression level Fedora would set, or whether end users are preconfigured to use multithreaded xz.

What I'm saying is that your issue really isn't an issue...especially considering their conservative value of -2 when the xz default is -6...
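Easy enough to sanity-check:

Code:
# encode the same input at several levels, then time a decode of each
for lvl in 2 6 9; do xz -T0 -"$lvl" -c linux-5.1.tar > test-"$lvl".xz; done
for lvl in 2 6 9; do time xz -dc test-"$lvl".xz > /dev/null; done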



                • #9
                  Originally posted by xorbe View Post
                  Give me xz -9 compressed rpms, and parallel decompression. I've got 16 threads, bring it on. Sounds like to me that the problem is that the rpm process is a serializing bottleneck. It could be decompressing rpms in parallel ahead of time and using available memory.
If it's supposed to decompress into RAM, it would bring memory usage up significantly, so it would have to be optional, since people make assumptions about how many resources upgrading will take. Trying to install multiple packages at once would be I/O bound, and could result in nasty race conditions. In general, I support using state-of-the-art compression whenever possible, so I think they should go ahead and break compatibility with ancient releases.
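For comparison, zstd's CLI has the same knobs (assuming a build with multithreading support):

Code:
# high-ratio multithreaded compression; -T0 autodetects the core count
zstd -19 -T0 linux-5.1.tar -o linux-5.1.tar.zst
# decompression is fast regardless of the level used to encode
zstd -d linux-5.1.tar.zst -o linux-5.1.tar.out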



                  • #10
                    Originally posted by skeevy420 View Post

Fedora isn't Arch where you update 45 times a day...and even if it were, it would only affect stuff made locally with makepkg.

...
My bad, I thought that you were talking about parallel decompression and not compression. Well, perhaps they are using a shared build server and don't want the system bogged down by one single maintainer?

