Bzip2 To See Revival Under New Maintainership, Experimental Porting To Rust


  • #51
    Originally posted by hotaru View Post
    in my experience, anything without multi threaded decompression is obsolete outside of uses where I'm stuck with single threaded decompressors. most of the compressed data I have is xz or lz4,
    Depends on the compressor. In the specific case of LZ4, it decompresses pretty close to RAM bandwidth. Throwing lots of cores at it would simply congest the bandwidth between your CPU and your RAM sticks.

    It also depends on the framework in which you're compressing.
    - Compressing a long linear stream of data? Yes indeed, very poor parallelizing opportunities, unless you use a specifically block-based algorithm (the approach Bzip2 takes) or a block-modified variant of the others (rsyncable gzip, etc.) - but usually that's exactly the trade-off you made to get better compression (thanks to a larger dictionary).
    - Compressing a file system? Most implementations compress data in chunks (e.g. BTRFS compresses independent chunks of 128k) no matter what the algorithm, so it is inherently parallelizable since each chunk is independent.

    (Also, Zstd has a separate table of settings for chunked mode. Asking for level 19 on 128 KiB chunks selects different parameters - the dictionary size is still limited to 128 KiB, obviously, but the other knobs (entropy, search exhaustiveness) are tuned up.)
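
    To make the chunked scheme concrete, here's a minimal sketch in C against libzstd (the 128 KiB chunk size is just borrowed from the BTRFS example above; error handling kept short). The point is that every chunk gets its own independent compression call, so each loop iteration could just as well run on its own thread:

    #include <stdio.h>
    #include <stdlib.h>
    #include <zstd.h>

    #define CHUNK_SIZE (128 * 1024)   /* 128 KiB chunks, as in the BTRFS example */

    /* Compress `src` chunk by chunk. No state is shared between chunks,
     * which is exactly what makes the scheme trivially parallelizable. */
    static void compress_chunked(const unsigned char *src, size_t src_len, int level)
    {
        size_t bound = ZSTD_compressBound(CHUNK_SIZE);
        unsigned char *dst = malloc(bound);
        if (!dst)
            return;

        for (size_t off = 0; off < src_len; off += CHUNK_SIZE) {
            size_t n = src_len - off < CHUNK_SIZE ? src_len - off : CHUNK_SIZE;
            size_t written = ZSTD_compress(dst, bound, src + off, n, level);

            if (ZSTD_isError(written)) {
                fprintf(stderr, "chunk at %zu: %s\n", off, ZSTD_getErrorName(written));
                break;
            }
            printf("chunk at %zu: %zu -> %zu bytes\n", off, n, written);
        }
        free(dst);
    }

    (A real file system additionally records where each compressed chunk starts, so any chunk can be decompressed independently - which is also what keeps random access cheap.)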

    Originally posted by atomsymbol View Post
    In other words, the number of cases where any of bzip2 / lbzip2 / pbzip2 is able to surpass all three of xz / zstd / brotli is extremely small.
    Extremely small, indeed:
    basically the "long strings of almost-but-not-quite-identical log entries" corner case that oleid is hitting with his logs.
    And genome read sequences (obviously).

    Originally posted by birdie View Post
    bzip2 decompression speed is simply atrocious
    bzip2 relies on a BWT - a reversible sort. That means you must run the same core operation (the BWT) in both the compressor and the decompressor, just in the reverse direction: each side runs a sort.

    As opposed to most LZ-based dictionary approaches, which must manage a whole dictionary and an associated look-up system at compression time, but only need to look back into the already-decompressed stream at decompression time.
    They are highly asymmetric compression schemes.
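
    To make the "reversible sort" concrete, here's a naive, textbook forward BWT in C - nothing like the clever suffix sorting real implementations (bzip2 included) use, just an illustration of the principle: sort all rotations of the block, keep the last column, plus the index of where the original row landed. That index is the small leftover of information that makes the sort reversible:

    #include <stdio.h>
    #include <stdlib.h>

    /* Naive forward BWT, for illustration only. */
    static const unsigned char *g_buf;
    static size_t g_len;

    static int cmp_rotation(const void *a, const void *b)
    {
        size_t i = *(const size_t *)a, j = *(const size_t *)b;
        for (size_t k = 0; k < g_len; k++) {
            unsigned char ci = g_buf[(i + k) % g_len];
            unsigned char cj = g_buf[(j + k) % g_len];
            if (ci != cj)
                return ci < cj ? -1 : 1;
        }
        return 0;
    }

    /* Writes the last column of the sorted rotation matrix into `out` and
     * returns the index of the original rotation (needed by the inverse BWT). */
    static size_t bwt_forward(const unsigned char *in, size_t len, unsigned char *out)
    {
        size_t *rot = malloc(len * sizeof *rot);
        size_t primary = 0;

        for (size_t i = 0; i < len; i++)
            rot[i] = i;

        g_buf = in;
        g_len = len;
        qsort(rot, len, sizeof *rot, cmp_rotation);

        for (size_t i = 0; i < len; i++) {
            out[i] = in[(rot[i] + len - 1) % len];   /* last column */
            if (rot[i] == 0)
                primary = i;                         /* original row's position */
        }
        free(rot);
        return primary;
    }

    int main(void)
    {
        const unsigned char text[] = "banana_banana_banana";
        unsigned char out[sizeof text] = { 0 };
        size_t primary = bwt_forward(text, sizeof text - 1, out);
        printf("BWT: %s (primary index %zu)\n", (const char *)out, primary);
        return 0;
    }

    Both directions have to chew on the whole block, which is the symmetry the post is talking about.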

    Originally posted by birdie View Post
    vs. LZMA2 based compressors.
    Careful what you're comparing with: the LZ part (the dictionary) is the *only* asymmetric part in modern compressors. The entropy part usually is NOT.
    (Usually you have the compressor and decompressor sides running their entropy engines in sync.)
    LZMA is about the limit up to which decompression is still noticeably faster than compression.
    Once you move deeper into machine-learning territory (PAQ compressors, etc.), running the ML entropy engine at decompression is nearly as expensive as at compression, and it's the part that dominates the speed.

    Originally posted by birdie View Post
    And it's single-threaded.
    The classic bzip2 implementation happens to be single-threaded. As bzip2 is block-based, there is no technical constraint forcing it to be (except for memory constraints - you need to hold the blocks somewhere). There are parallel implementations (lbzip2, pbzip2) in vogue in the specific workloads that the bzip2 algorithm caters to (e.g. they're very popular in bioinformatics).

    Originally posted by oleid View Post
    You nailed it. This logfile format with CAN data has similar patterns all over the place.
    It might be interesting, just for fun, to see how Zstd would fare if you pre-trained a dictionary over a large corpus of log files.
    It might not be useful in your specific use-case (the embedded device might be too memory-constrained), but it would be interesting to see how it reacts.
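
    For reference, the programmatic side of that experiment would look roughly like this with libzstd's dictionary builder (a sketch only: loading the sample logs into one concatenated buffer is omitted, and the 110 KiB dictionary size is just the usual default target):

    #include <stdio.h>
    #include <stdlib.h>
    #include <zstd.h>
    #include <zdict.h>

    /* `samples` holds the training log files concatenated back to back,
     * `sample_sizes[i]` the length of each one. Train a dictionary, then
     * compress one more log file with it. */
    static void train_and_compress(const void *samples, const size_t *sample_sizes,
                                   unsigned nb_samples,
                                   const void *log, size_t log_size)
    {
        static char dict[110 * 1024];   /* typical target size for a zstd dictionary */
        size_t dict_size = ZDICT_trainFromBuffer(dict, sizeof dict,
                                                 samples, sample_sizes, nb_samples);
        if (ZDICT_isError(dict_size)) {
            fprintf(stderr, "training failed: %s\n", ZDICT_getErrorName(dict_size));
            return;
        }

        size_t bound = ZSTD_compressBound(log_size);
        char *dst = malloc(bound);
        ZSTD_CCtx *cctx = ZSTD_createCCtx();

        size_t n = ZSTD_compress_usingDict(cctx, dst, bound, log, log_size,
                                           dict, dict_size, 19);
        if (!ZSTD_isError(n))
            printf("with dictionary: %zu -> %zu bytes\n", log_size, n);

        ZSTD_freeCCtx(cctx);
        free(dst);
    }

    (The zstd command-line tool exposes the same thing through its --train option, which is the easier way to just try it on a directory of logs.)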

    Originally posted by oleid View Post
    Once took the original bwt code and did a transformation of a whole file. Basically used it as a preprocessor for other compression algos. The resulting compressed files were some 10% smaller than the un-preprocessed files for XZ.
    The thing is that the output of the BWT is very peculiar (long swaths of almost-sorted runs of symbols - the "almost" part being just enough leftover information to revert the transform... it's a *reversible* sort, after all). XZ isn't necessarily best geared toward it. (Maybe if you also keep the "move-to-front" step? Then XZ basically works as a glorified RLE, which might do better than the old "modified RLE" used in Bzip2.)
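
    For context, that move-to-front step is tiny - it just replaces each symbol with its rank in a "most recently seen" list, so the clustered BWT output turns into long runs of zeros that an RLE or entropy stage can eat cheaply. A minimal sketch:

    #include <stddef.h>
    #include <string.h>

    /* Move-to-front: output each symbol's current position in a recency
     * list, then move that symbol to the front of the list. */
    static void mtf_encode(const unsigned char *in, size_t len, unsigned char *out)
    {
        unsigned char table[256];
        for (int i = 0; i < 256; i++)
            table[i] = (unsigned char)i;        /* initial alphabet order */

        for (size_t i = 0; i < len; i++) {
            unsigned char c = in[i];
            unsigned char rank = 0;
            while (table[rank] != c)            /* find the symbol's current rank */
                rank++;
            out[i] = rank;
            memmove(table + 1, table, rank);    /* shift the prefix down... */
            table[0] = c;                       /* ...and move the symbol to the front */
        }
    }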

    Originally posted by oleid View Post
    But the whole file bwt was unbearable slow xD
    Because you need to manipulate the whole block at once (as opposed to LZ, where you can accelerate things by using a hash look-up table that fits as much as possible in cache and then only look up the relevant part (or best candidates) of your dictionary).

    You need a special kind of architecture to handle this kind of workload (some bioinformatics cluster? :-P)

    Originally posted by hotaru View Post
    all the tests I've done have been with the highest level for Zstandard.
    That is weird. Normally that's restricted to the above-mentioned cases (log files, genomic sequencing data, etc.).

    Originally posted by hotaru View Post
    a compiled Linux kernel, an initrd image, a few dozen compiled Arch packages,
    Okay, those are already-compressed artifacts, and recompressing them won't change much (except that very aggressive compressors such as XZ, and even more so PAQ, can squeeze a few extra bits of entropy out, especially if those binaries were gzip-compressed instead of the standard XZ or the fast Zstd).

    Originally posted by hotaru View Post
    while it is possible that the Linux kernel source code,
    kernel.org switched from bz2 to xz for their "highly compressed" alternative because xz both compresses better and decompresses faster.
    That's literally one of the textbook example use cases of xz over bzip2.
    There's not enough of the "almost but not quite" similarity where bzip2 excels (unless Linus got kidnapped by a bunch of SJWs and some idiot took over in the meantime).


    Originally posted by hotaru View Post
    and a bunch of large text files in various languages all happen to be specific cases where BW excels or where Zstandard performs poorly,
    So, basically, it all boils down to your large collection of text files and where you got them from.

    If they are mostly varied text (like the Wikipedia corpus used in compression competitions), then no way: those are already proven to compress better with LZMA-based compressors (or even PAQ, if you're willing to leave normal everyday compressors behind and go for more exotic ones).
    Something is wrong with your test if you see better performance from bzip2.

    If they are more-or-less-alike-but-not-quite (logs, long lists of translations, etc.), they might be the corner cases that compress better than with XZ, and if that part of your collection is large enough, it could compensate for the dozens of MB lost on the kernel-source part and dominate the final result of compressing your collection.
    Then Bzip2 performs as expected. Which is an argument to always test compression on what you're most likely to compress, and don't forget to test solutions on known corner cases - see oleid's log examples.

    Originally posted by hotaru View Post
    AFAIK, neither of those do multi threaded decompression, and decompression speed is a lot more important than compression speed for most use cases.
    LZ4 beats the crap out of everything in terms of decompression speed, as it works close to the memory bandwidth limit.
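
    For what it's worth, the whole LZ4 round trip fits in a couple of calls, and the decode side is just a single pass of literal copies and back-references, which is why it runs at close to memory speed (a sketch with a toy buffer; error handling kept minimal):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <lz4.h>

    int main(void)
    {
        const char src[] = "an example buffer with an example repeated example pattern";
        const int src_size = (int)sizeof src;

        /* Compress into a worst-case sized buffer. */
        int bound = LZ4_compressBound(src_size);
        char *compressed = malloc(bound);
        int csize = LZ4_compress_default(src, compressed, src_size, bound);

        /* Decompress back; LZ4_decompress_safe refuses to overrun the output buffer. */
        char *restored = malloc(src_size);
        int dsize = LZ4_decompress_safe(compressed, restored, csize, src_size);

        printf("%d -> %d -> %d bytes, round trip %s\n", src_size, csize, dsize,
               (dsize == src_size && memcmp(src, restored, src_size) == 0) ? "ok" : "FAILED");

        free(compressed);
        free(restored);
        return 0;
    }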

    Zstd is the next best thing, taking a lot of the optimisations that make LZ4 fast and adding a special type of entropy coding (tANS - table-based asymmetric numeral system, a weird kind of range coding turned on its head that works as a sort of "Huffman table plus remainder" and can approach the Shannon limit while retaining Huffman-like speed).

    It completely avoids the "slow entropy stage must run on both ends" problem.

    Zstd is both the fastest and the best you can reach before entering machine-learning territory (such as LZMA's (relatively) fast Markov model - that's where the "MA" comes from - or PAQ's horrendously slow neural nets).

    Zstd just beats everything else, unless you need the better (and slower) compression of XZ's LZMA, or PAQ and co.

    Yann Collet has single-handedly revolutionized the compression landscape (well, him and Jarek Duda, the insane guy who wrote the ANS papers - plural, because it's such revolutionary stuff that it took two papers to explain).

    Originally posted by birdie View Post
    but given bzip2 pathetic dictionary (900KB) it only applies to small documents (less than ~20MB).
    There's no such thing as a "dictionary" in Bzip2.
    It's not LZ-based; it's BWT-based.
    900 KB refers to the size of the block on which the BWT is performed. The result is then move-to-front transformed and very efficiently run-length and Huffman encoded - no dictionary involved at all.



    • #52
      Originally posted by DrYak View Post
      Still, in my book, there's a difference in software that requires maybe 6-7 other "libBLAH-devel" package, and software that needs to download half of github (no matter if from microsoft azure or from local cache) just to barely function.
      The real fun begins when those libs have no proper symbol versioning and thus cannot be installed alongside each other - and software A needs one version while software B needs the other. At least in Rust this is no problem, as the required parts are statically linked, AND this even works when indirect dependencies require different versions of the same lib. You can even generate distribution binary packages using cargo (the build system). Rustc and cargo can easily be packaged, so packaging Rust stuff should be easy.


      Originally posted by DrYak View Post
      It's not necessarily the *compiler's* job to handle dependencies. It might be the *packaging system's* job. both RPM and DEB do a sufficiently decent job at it given the number of distro that have been successful.
      They do, and they continue to do so. But then you don't necessarily target only Linux distributions. And even if you do, it's good to be able to specify upstream which library versions to use rather than downstream, since those are the versions you tested against.




      • #53
        Originally posted by DrYak View Post
        But then you can't just build the thing from scratch because you need to have support for that language.
        Relying on stuff that is commonly distributed in binary form does not make a system able to build things "from scratch".

        It's still better to rely on a binary toolchain to jumpstart from than to have to deal with stupid build systems.



        • #54
          Originally posted by DrYak View Post
          In the specific case of LZ4, it decompresses pretty close to RAM bandwidth.
          multithreaded bzip2 does too, and achieves much better compression ratios. on a NUMA machine with 4 nodes and quad-channel DDR3 on each node, multithreaded bzip2 is even faster at in-memory compression and decompression than LZ4, because it can use more of the available memory bandwidth.

          Originally posted by DrYak View Post
          okay, these are already compressed stuff
          no, they're not.

          Originally posted by DrYak View Post
          xz both compresses better and decompresses faster.
          multithreaded bzip2 decompresses a lot faster.

          Originally posted by DrYak View Post
          LZ4 beast the crap out of everything in terms of decompression speeds as it works close to memory bandwidth limit.
          see above.



          • #55
            Originally posted by starshipeleven View Post
            Relying on stuff that is commonly distributed in binary form does not make a system able to build things "from scratch".

            It's still better to rely on a binary toolchain to jumpstart instead than having to deal with stupid build systems.
            In all your trolling efforts you are speaking from the point of view of an end user. And the people arguing here are trying to say that packages do not appear in your favorite distro by magic. Someone actually goes through varying degrees of pain to prepare them.

            Rust people sometimes seem to me like plain noobs who know only their beloved toy and believe that the world ends there. All they do is repeat their mantras - "reliable", "secure", "productive" and so on - without displaying the tiniest bit of understanding of those terms or giving any meaningful example.

            Rust is just a tool. It does not give any magical advantage over C or C++ to a person who does not understand what he is doing and what he is trying to achieve. So why not play nice and understand that someone has to deal with the unnecessary hipster toy? That might actually draw more attention from sane people. Right now it seems they are just forcing it furiously.



            • #56
              Originally posted by jjkk View Post

              In all your trolling efforts you are speaking from the point of view of an end-user. And people arguing here are trying to say that the packages are not appearing in your favorite distro magically. Someone actually goes through varying grades of pain to prepare them.
              That's the same for autotools' direct or indirect dependencies - bash, for example.

              Even LFS contains Python, so e.g. Meson would be no problem. http://www.linuxfromscratch.org/lfs/...chapter05.html


              Originally posted by jjkk View Post
              Rust is just a tool. It does not give any magical advantages against C or C++ to a person who does not understand what is he doing and trying to achieve.
              Such a person shouldn't be programming anyway. But if they do, they'd come to the conclusion that rustc is a good tutor. I've learnt that if my approach to realizing something in Rust results in overly complex code, there is probably something wrong with it. The same approach would have been easier in C++, however it would still have been bad design.



              • #57
                Originally posted by jjkk View Post
                And no, you can not reproduce Rust without extending this bare minimum by a binary build of itself.
                You cannot build a modern gcc8 with an ancient gcc. I've tried. And failed.

                As for Rust, a GCC backend is planned. So maybe a future bootstrap environment of yours will contain 'grustc', or whatever it will be called.

                By the way, nobody is stopping you from using rustc from a plain makefile. No need for cargo.




                • #58
                  Originally posted by oleid View Post

                  You cannot build a modern gcc8 with an ancient gcc. I've tried. And failed.
                  Oh, well. I can. I have tried. And succeeded. Many times. And since you did not specify "ancient", it is just your word against mine.

                  If that is really the case, you still have the possibility of building some version between the ancient one and the target and using it as a bootstrap. It is not hard to automate, and the outcome is a reproducible, deterministic build: you get the same binary for the same source code every time, plus a manageable means of tracking down problematic changes. To do the same with Rust you have to make extra moves, and I am still not convinced those are unavoidable - especially caching specific branches of git repos locally.

                  Originally posted by oleid View Post

                  Considering rust, a gcc backend is planned. So maybe a future bootstrap environment of yours will contain 'grustc', or however it will be called.
                  That would be great. Who knows - only time will show which tools end up usable and popular. Once again: I am not against any tools or new languages. But that attitude of pushing and forcing "the best new thing in the whole world" just repels people from using it.



                  • #59
                    Originally posted by hotaru View Post
                    multithreaded bzip2 decompresses a lot faster.
                    Care to show proof?

                    On my 4-core PC, LZMA2 decompresses up to 10 times faster than pbzip2.



                    • #60
                      Originally posted by birdie View Post

                      Care to show proofs?

                      On my 4 cores PC LZMA2 decompresses up to 10 times faster than pbzip2.
                      no, I don't really care enough to bother running the benchmarks again, but on my 16 and 32 core Xeon machines, multithreaded bzip2 (with lbzip2, I haven't tested pbzip2) is much faster than LZ4, which is itself much faster than LZMA2.

