GCC Is Looking At Zstd For Compressing Its LTO Data

  • GCC Is Looking At Zstd For Compressing Its LTO Data

    Phoronix: GCC Is Looking At Zstd For Compressing Its LTO Data

    The latest use-case for the increasingly popular Zstd compression algorithm could be employment by the GNU Compiler Collection (GCC) for compressing its link-time optimization (LTO) data...

  • #2
    There is a lot of fuss around Zstd nowadays.
    I would really like to see a comparison between Zstd and lz4.

    It seems to me that lz4 is the winner.
    Even though Facebook claims that reusing the dictionary as a cache brings powerful improvements,
    I can't find tests that back that up..

    Oracle goes with lz4 by default for its databases,
    and I have noticed it works very nicely even on databases with > 16TB of data.

    Ubuntu is also choosing lz4,
    and for me it's a fast on-the-fly compressor/decompressor.. maybe even for swap partitions, where Zstd is today..

    • #3
      Speed-wise (compression and decompression) lz4 wins, but zstd has a better compression ratio, which can be fine-tuned from -1 to -19 (ultra up to -22).

      There is also a new and not well-known algorithm, lizard (formerly lz5), which is even faster than lz4.

      • #4
        > difficulties with Zstd using CMake by default

        If it helps, there is a Meson patch: https://wrapdb.mesonbuild.com/zstd

        Meson should autogenerate a pkgconfig file, which should be usable by autotools and whatnot.
        Last edited by andreano; 21 June 2019, 02:51 PM.

        • #5
          Originally posted by trubicoid2
          Speed-wise (compression and decompression) lz4 wins, but zstd has a better compression ratio, which can be fine-tuned from -1 to -19 (ultra up to -22).

          There is also a new and not well-known algorithm, lizard (formerly lz5), which is even faster than lz4.
          Maybe the fine-tuning could make a difference..
          The claim about reusing the dictionary is a little vague, since lz4 can also use an 'external dictionary' to compress.
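
For anyone unsure what a shared or external dictionary buys you, here is a minimal sketch of the idea. It does not use zstd or lz4 (their Python bindings are third-party), so zlib's preset-dictionary support from the standard library stands in for both. The records and the dictionary contents are made up for illustration.

```python
import zlib

# Hypothetical sample data: many short records that share the same structure.
records = [
    f'{{"user_id": {i}, "status": "active", "country": "PT"}}'.encode()
    for i in range(1000)
]

# The dictionary holds byte strings expected to recur in every record.
# zlib (DEFLATE) is only a stand-in here because it ships with Python.
shared_dict = b'{"user_id": , "status": "active", "country": "PT"}'


def compress_plain(record: bytes) -> bytes:
    """Compress one record with no prior knowledge of its structure."""
    return zlib.compress(record, 6)


def compress_with_dict(record: bytes) -> bytes:
    """Compress one record, letting matches point into the shared dictionary."""
    compressor = zlib.compressobj(level=6, zdict=shared_dict)
    return compressor.compress(record) + compressor.flush()


def decompress_with_dict(blob: bytes) -> bytes:
    """The decompressor must be given the very same dictionary."""
    decompressor = zlib.decompressobj(zdict=shared_dict)
    return decompressor.decompress(blob) + decompressor.flush()


raw_total = sum(len(r) for r in records)
plain_total = sum(len(compress_plain(r)) for r in records)
dict_total = sum(len(compress_with_dict(r)) for r in records)
print(f"raw={raw_total} plain={plain_total} with_dict={dict_total}")
```

The gain shows up mainly on small inputs compressed one at a time, because each record is too short to build up useful history on its own. On one large stream the dictionary matters far less.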

          • #6
            Originally posted by andreano
            > difficulties with Zstd using CMake by default

            If it helps, there is a Meson patch: https://wrapdb.mesonbuild.com/zstd

            Meson should autogenerate a pkgconfig file, which should be usable by autotools and whatnot.
            The difficulty lies in building from source (potentially 2-3 times for Canadian cross builds). Meson only makes that issue way worse and adds ninja as another dependency.

            • #7
              Originally posted by tuxd3v
              There is a lot of fuss around Zstd nowadays.
              I would really like to see a comparison between Zstd and lz4.

              It seems to me that lz4 is the winner.
              Even though Facebook claims that reusing the dictionary as a cache brings powerful improvements,
              I can't find tests that back that up..
              zstd beats lz4 hands down as a general-purpose compressor, as it covers a wide range of compression ratios,
              and its multithreaded compression has one notable feat I haven't seen anywhere else: it creates the same file as a single-threaded run.
              (xz and the pXY compressors just compress independent blocks, reducing the compression ratio and affecting the output.)
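
A rough sketch of why independent blocks cost compression ratio: a block compressed on its own cannot refer back to data in earlier blocks. This is not how xz or any particular parallel tool actually splits its input. It uses zlib from the standard library as a stand-in, and the input is deliberately contrived (one 16 KiB pseudo-random pattern repeated) so the effect is easy to see.

```python
import random
import zlib

BLOCK_SIZE = 16 * 1024  # hypothetical block size, chosen to fit zlib's 32 KiB window

# Contrived input: a pattern that is incompressible by itself but repeats,
# so every block after the first is redundant with the one before it.
rng = random.Random(0)
pattern = bytes(rng.getrandbits(8) for _ in range(BLOCK_SIZE))
data = pattern * 8


def compress_whole(payload: bytes) -> bytes:
    """One stream: later data can reference earlier data."""
    return zlib.compress(payload, 6)


def compress_independent_blocks(payload: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Each block is compressed alone, as a naive parallel compressor might do."""
    return [
        zlib.compress(payload[offset:offset + block_size], 6)
        for offset in range(0, len(payload), block_size)
    ]


whole = compress_whole(data)
blocks = compress_independent_blocks(data)
blocks_total = sum(len(block) for block in blocks)
print(f"input={len(data)} whole_stream={len(whole)} independent_blocks={blocks_total}")
```

Real inputs are less extreme, but the direction is the same: the smaller the independent blocks, the more cross-block redundancy is lost, and the output changes with the block layout.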

              zstd should be the default, unless you have very specific needs. lz4 would be faster, xz would compress better.

              Originally posted by tuxd3v
              Oracle goes with lz4 by default for its databases,
              and I have noticed it works very nicely even on databases with > 16TB of data.
              Special purpose.

              Originally posted by tuxd3v
              Ubuntu is also choosing lz4,
              and for me it's a fast on-the-fly compressor/decompressor.. maybe even for swap partitions, where Zstd is today..
              Ubuntu is choosing lz4 primarily because zstd support for the kernel and initramfs is not upstream.

              One downside of zstd is that it's likely really slow on CPUs that can't do unaligned access. But that's a deficit shared with lz4.
              I would like to see some benchmarks on an in-order or ARMv5 CPU; they would likely change the field dramatically.
              (My guess is that some of the old formats like lzo would be way faster there.)

              • #8
                Originally posted by discordian
                zstd beats lz4 hands down as a general-purpose compressor, as it covers a wide range of compression ratios,
                and its multithreaded compression has one notable feat I haven't seen anywhere else: it creates the same file as a single-threaded run.
                (xz and the pXY compressors just compress independent blocks, reducing the compression ratio and affecting the output.)
                Maybe zstd needs many threads and is still slower than lz4..
                If you look at lz4, they are gaining 12-18+% performance with each iteration..
                They don't feel the need to waste more CPU.. they don't have to...

                On the fine-tuning part,
                having a bigger range of numerical values doesn't mean it's also the better option.. the levels lz4 provides (12) are sufficient, but here I think lz4 at the upper limit will feel the burden of compression, since it's not heavily threaded..

                If you look at on-the-fly compression, you for sure don't want high compression levels, because they kill your performance when the system starts swapping, for example.. and for sure you want on-the-fly compression that uses fewer CPU resources, which is exactly what lz4 provides.

                Take it as an example... it's not something new; even AIX, which is a very lean OS, compresses RAM at around the default level lz4 provides..
                That is not because you can't configure a higher value (when you compress RAM..) but because higher values mean the system slows down badly, so we usually don't risk going above roughly what lz4 does; if needed, we advise the client to buy more RAM..

                On arm32 or aarch32 I don't know; on aarch64, lz4 is well ahead of Zstd..

                But I would love to see a comparison between the two that makes sense, if possible,
                on all levels: CPU, RAM, compression/decompression times, and I/O, with 2 sets of different files, one binary and one text file, with a dictionary and without..

                That way we would have a better picture of them..
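
A comparison along those lines could be structured roughly like the sketch below. It only uses codecs from the Python standard library (zlib at two levels and lzma) as stand-ins for a fast and a strong compressor, since zstd and lz4 bindings are third-party. The two datasets are synthetic and hypothetical, and single-run timings like these are indicative at best. RAM usage and dictionary runs are left out to keep it short.

```python
import lzma
import random
import struct
import time
import zlib

rng = random.Random(42)

# Hypothetical text-like sample: random words from a small vocabulary.
WORDS = ["compiler", "linker", "object", "symbol", "section", "inline", "optimize"]
text_sample = " ".join(rng.choice(WORDS) for _ in range(50_000)).encode()

# Hypothetical binary-like sample: packed little-endian (uint32, uint16) records.
binary_sample = b"".join(
    struct.pack("<IH", rng.randrange(1 << 20), rng.randrange(512))
    for _ in range(50_000)
)

DATASETS = {"text": text_sample, "binary": binary_sample}

# name -> (compress function, decompress function); stand-ins, not zstd/lz4.
CODECS = {
    "zlib-1": (lambda d: zlib.compress(d, 1), zlib.decompress),
    "zlib-9": (lambda d: zlib.compress(d, 9), zlib.decompress),
    "lzma-6": (lambda d: lzma.compress(d, preset=6), lzma.decompress),
}


def measure(name, compress, decompress, payload):
    """Time one compress/decompress round trip and report the ratio."""
    start = time.perf_counter()
    packed = compress(payload)
    compress_s = time.perf_counter() - start

    start = time.perf_counter()
    restored = decompress(packed)
    decompress_s = time.perf_counter() - start

    if restored != payload:
        raise ValueError(f"{name} failed to round-trip")
    return {
        "codec": name,
        "ratio": len(payload) / len(packed),
        "compress_s": compress_s,
        "decompress_s": decompress_s,
    }


results = []
for dataset_name, payload in DATASETS.items():
    for codec_name, (compress, decompress) in CODECS.items():
        row = measure(codec_name, compress, decompress, payload)
        row["dataset"] = dataset_name
        results.append(row)

for row in results:
    print(
        f'{row["dataset"]:>6} {row["codec"]:>7} ratio={row["ratio"]:.2f} '
        f'compress={row["compress_s"] * 1000:.1f}ms '
        f'decompress={row["decompress_s"] * 1000:.1f}ms'
    )
```

Any other codec can be dropped into `CODECS` as a pair of functions, which keeps the measurement identical across compressors. For numbers worth quoting you would want real files, several repetitions, and a fixed CPU frequency.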

                • #9
                  Originally posted by tuxd3v
                  ... and still is slower than lz4..
                  Of course it is slower. ZSTD is LZ4 + entropy coding. Entropy coding does not come for free, but it allows higher compression ratios than LZ4 alone.

                  • #10
                    Originally posted by tuxd3v
                    Maybe zstd needs many threads and is still slower than lz4..
                    If you look at lz4, they are gaining 12-18+% performance with each iteration..
                    They don't feel the need to waste more CPU.. they don't have to...
                    memcpy is still faster than lz4. Speed is not the only thing you want; compression ratio matters as well.

                    Originally posted by tuxd3v
                    On the fine-tuning part,
                    having a bigger range of numerical values doesn't mean it's also the better option.. the levels lz4 provides (12) are sufficient, but here I think lz4 at the upper limit will feel the burden of compression, since it's not heavily threaded..
                    I am not talking about the numerical values passed to the compressor, but the resulting range they represent. zstd covers almost the complete range of the remaining compressors,
                    mostly compressing better and faster, with the outliers at best compression (alone) and speed (alone) still going to xz/lzma and lz4.
                    And it's not "heavily threaded", as most measurements you find around the net are not using threads. The nice thing is that you can use threads and the compression speed will scale up nicely without negatively affecting the ratio.

                    Originally posted by tuxd3v
                    If you look at on-the-fly compression, you for sure don't want high compression levels, because they kill your performance when the system starts swapping, for example.. and for sure you want on-the-fly compression that uses fewer CPU resources, which is exactly what lz4 provides.
                    The topic is compressing LTO information. If you keep the static libraries around, then you would want the option of a good compression ratio, which lz4 lacks. Ideally you have either a fast mode (single compile) or a good mode (static libs you consume multiple times). zstd covers a lot more range with a single compressor/decompressor.

                    Originally posted by tuxd3v
                    Take it as an example... it's not something new; even AIX, which is a very lean OS, compresses RAM at around the default level lz4 provides..
                    That is not because you can't configure a higher value (when you compress RAM..) but because higher values mean the system slows down badly, so we usually don't risk going above roughly what lz4 does; if needed, we advise the client to buy more RAM..
                    lz4 makes sense if you can't compromise on speed; I am not saying otherwise.

                    Originally posted by tuxd3v
                    On arm32 or aarch32 I don't know; on aarch64, lz4 is well ahead of Zstd..
                    If you introduce it in a pretty ubiquitous project, then maybe every arch is relevant. What I do know is that zstd and lz4 are both designed for "fat" out-of-order CPUs executing multiple operations in parallel; on small CPUs their performance might not look as favourable.

                    Originally posted by tuxd3v
                    But I would love to see a comparison between the two that makes sense, if possible,
                    on all levels: CPU, RAM, compression/decompression times, and I/O, with 2 sets of different files, one binary and one text file, with a dictionary and without..

                    That way we would have a better picture of them..
                    It's not like those don't exist; people are switching to zstd because of them, not because it's "the new thing".
                    https://gregoryszorc.com/blog/2017/0...ith-zstandard/
                    Last edited by discordian; 22 June 2019, 08:30 AM.
