Linux 5.3 Crypto Updates Jitter RNG, Adds xxHash

  • Linux 5.3 Crypto Updates Jitter RNG, Adds xxHash

    Phoronix: Linux 5.3 Crypto Updates Jitter RNG, Adds xxHash

    Herbert Xu sent out the crypto subsystem updates on Monday for the in-development Linux 5.3 kernel...

    http://www.phoronix.com/scan.php?pag...nux-5.3-Crypto

  • #2
    For the record:

    xxHash (available in both 32-bit and 64-bit variants) was written by Yann Collet, the same genius behind the LZ4 and Zstd compression algorithms.

    And it's blazing fast (Yann came up with it because at some point in development LZ4 had become so fast that most of the time was spent computing the checksum used to verify the output, not on the decompression itself. So he needed a fast algorithm for LZ4 containers, and that's what he came up with).

    It also scores well when subjected to hash-quality test suites (a small change in the input generates a completely different hash, no easy collisions, etc.)

    If BTRFS picks it up, this could tremendously help the speed of checksum verification, alleviating some of the slowdown inherent to this type of modern file system.

    Being in the kernel's crypto framework means it can also be picked up by others: BCacheFS has a pluggable architecture (including for checksums), and like BTRFS it uses checksums for everything, including data, so it too could benefit from a faster algorithm.

    Other users of checksums could benefit as well (building Bloom filters, etc.)
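
For readers who want to see what the 32-bit variant actually computes, here is a minimal, unoptimized pure-Python sketch of XXH32 as I understand the published algorithm. It is meant for illustration only and is nowhere near the speed of the C or kernel implementations; the helper `differing_bits` is a hypothetical name added just to poke at the "small change, very different hash" property.

```python
import struct

# Constants of the 32-bit variant (XXH32).
PRIME1 = 0x9E3779B1
PRIME2 = 0x85EBCA77
PRIME3 = 0xC2B2AE3D
PRIME4 = 0x27D4EB2F
PRIME5 = 0x165667B1
MASK = 0xFFFFFFFF


def _rotl(x, r):
    """Rotate a 32-bit value left by r bits."""
    return ((x << r) | (x >> (32 - r))) & MASK


def _round(acc, lane):
    """Mix one 32-bit little-endian lane into an accumulator."""
    acc = (acc + lane * PRIME2) & MASK
    return (_rotl(acc, 13) * PRIME1) & MASK


def xxh32(data, seed=0):
    """Unoptimized XXH32 of a bytes object (illustration only)."""
    n = len(data)
    pos = 0
    if n >= 16:
        # Four independent accumulators consume 16-byte stripes.
        v = [(seed + PRIME1 + PRIME2) & MASK, (seed + PRIME2) & MASK,
             seed & MASK, (seed - PRIME1) & MASK]
        while pos + 16 <= n:
            lanes = struct.unpack_from("<4I", data, pos)
            v = [_round(acc, lane) for acc, lane in zip(v, lanes)]
            pos += 16
        h = (_rotl(v[0], 1) + _rotl(v[1], 7)
             + _rotl(v[2], 12) + _rotl(v[3], 18)) & MASK
    else:
        h = (seed + PRIME5) & MASK
    h = (h + n) & MASK
    # Remaining 4-byte words, then remaining single bytes.
    while pos + 4 <= n:
        (lane,) = struct.unpack_from("<I", data, pos)
        h = (_rotl((h + lane * PRIME3) & MASK, 17) * PRIME4) & MASK
        pos += 4
    while pos < n:
        h = (_rotl((h + data[pos] * PRIME5) & MASK, 11) * PRIME1) & MASK
        pos += 1
    # Final avalanche so every input bit influences every output bit.
    h ^= h >> 15
    h = (h * PRIME2) & MASK
    h ^= h >> 13
    h = (h * PRIME3) & MASK
    h ^= h >> 16
    return h


def differing_bits(a, b):
    """Number of bit positions in which two hashes differ (hypothetical helper)."""
    return bin(a ^ b).count("1")
```

With seed 0 the empty input hashes to 0x02CC5D05. Calling `differing_bits(xxh32(b"some block"), xxh32(b"some blocl"))` gives a feel for how far apart the hashes of two near-identical inputs land.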

    • #3
      This seems to be a nice improvement for x86-like systems, and maybe others that don't have CRC32C crypto extensions..

      The majority of ARMv8-A hardware out there has the CRC32C hardware extension...
      So it would be nice to compare xxHash with CRC32C on ARMv8-A, precisely because it has the CRC32C extension; maybe getting rid of CRC32C on ARMv8-A would slow it down..

      The only test I saw was on the project page, but on an x86 processor, not on other archs that implement CRC32C in hardware..
      Last edited by tuxd3v; 07-10-2019, 11:34 AM.

      • #4
        Originally posted by tuxd3v
        So it would be nice to compare xxHash with CRC32C on ARMv8-A, precisely because it has the CRC32C extension; maybe getting rid of CRC32C on ARMv8-A would slow it down..
        Do you have some code samples ?
        I've got a few raspberry pis around and /proc/cpuinfo reports features: crc32.
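
As a rough sketch of the check described above, the snippet below scans `/proc/cpuinfo`-style text for the `crc32` feature (the ARM `Features` line) or the `sse4_2` flag (the x86 `flags` line). The function names are hypothetical, and the set of keys it looks at is an assumption that may not cover every kernel or architecture.

```python
def cpu_feature_flags(cpuinfo_text):
    """Collect the tokens listed on 'Features' (ARM) or 'flags' (x86) lines."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() in ("features", "flags"):
            flags.update(value.split())
    return flags


def has_hw_crc32c(cpuinfo_text):
    """True if the text advertises a CRC32C-capable instruction set.

    Assumes 'crc32' (ARMv8 CRC extension) or 'sse4_2' (x86) as the markers.
    """
    flags = cpu_feature_flags(cpuinfo_text)
    return "crc32" in flags or "sse4_2" in flags
```

On a Linux box this could be used as `has_hw_crc32c(open("/proc/cpuinfo").read())`.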

        (Oh, and speaking of hardware extensions: Yann has put his crazy genius to work again and is developing XXH3, which is vectorisable (or, in your case, NEON-isable). The algorithm is still being tuned and tweaked.)

        • #5
          Originally posted by DrYak
          Do you have some code samples ?
          I've got a few raspberry pis around and /proc/cpuinfo reports features: crc32.
          That code has been in the Linux kernel for a while..
          It was submitted on Nov. 19, 2014.

          See this
          This module registers a crc32 algorithm and a crc32c algorithm that use the optional CRC32 and CRC32C instructions in ARMv8. Tested on AMD Seattle. Improvement compared to crc32c-generic algorithm: TCRYPT CRC32C speed test shows ~450% speedup. Simple dd write tests to btrfs filesystem show ~30% speedup.
          It would be nice to compare
          Last edited by tuxd3v; 07-10-2019, 08:12 PM.

          • #6
            Originally posted by tuxd3v
            This seems to be a nice improvement for x86-like systems, and maybe others that don't have CRC32C crypto extensions..
            x86 hardware with SSE4.2 and later (hardware from 10 years ago or newer) has an instruction for CRC32C
            see here for code samples for it
            https://stackoverflow.com/questions/...c-instructions
            https://stackoverflow.com/questions/...tware/17646775
            Last edited by starshipeleven; 07-11-2019, 07:35 AM.

            • #7
              Originally posted by starshipeleven
              x86 hardware with SSE4.2 and later (hardware from 10 years ago or newer) has an instruction for CRC32C
              see here for code samples for it
              Unfortunately, 'amd64' only has SSE2 as a prerequisite.

              To date, 'amd64' doesn't have crc32 (the IEEE polynomial) or crc32c (the Castagnoli polynomial) implemented in hardware (ARMv8 has hardware implementations of them.. BUT it's an optional feature..),
              although there are versions that implement crc32 and crc32c using SIMD instructions..
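
To make the crc32 versus crc32c distinction concrete: both are 32-bit CRCs computed the same way, and they differ only in the generator polynomial. The sketch below is a plain table-driven software version (no SIMD, no hardware instructions) parameterized by the reflected polynomial; the function and constant names are my own.

```python
import zlib

CRC32_IEEE_POLY = 0xEDB88320         # reflected form of 0x04C11DB7 (zlib, Ethernet)
CRC32C_CASTAGNOLI_POLY = 0x82F63B78  # reflected form of 0x1EDC6F41 (iSCSI, btrfs)


def make_table(poly):
    """Build the 256-entry lookup table for a reflected 32-bit CRC."""
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            crc = (crc >> 1) ^ poly if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLES = {
    CRC32_IEEE_POLY: make_table(CRC32_IEEE_POLY),
    CRC32C_CASTAGNOLI_POLY: make_table(CRC32C_CASTAGNOLI_POLY),
}


def crc32_reflected(data, poly):
    """Byte-at-a-time CRC with initial value and final XOR of 0xFFFFFFFF."""
    table = _TABLES[poly]
    crc = 0xFFFFFFFF
    for b in data:
        crc = table[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def crc32(data):
    return crc32_reflected(data, CRC32_IEEE_POLY)


def crc32c(data):
    return crc32_reflected(data, CRC32C_CASTAGNOLI_POLY)
```

The IEEE variant should agree with `zlib.crc32` on any input, which makes a handy sanity check; for the standard check string `b"123456789"` the two variants give 0xCBF43926 (IEEE) and 0xE3069283 (Castagnoli).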

              For instance, from what I read yesterday in the valuable post by @DrYak, here,
              an implementation of crc32 and crc32c using SSE2 should bump the speed of crc32 calculations by 14x (read the comments below on that link..)

              We are talking about 1400% performance, with SSE2 on x86_64..
              'Nigel Tao'
              Your chart shows that the xxh family has much greater (10x or more) throughput than crc32. Which crc32 implementation are you using? SIMD implementations can be around 14x faster than the crc32 implementation in zlib (the C library).
              Long story short,
              zlib compiled for amd64 should already have been implemented using SSE2, because of the HUGE gains (14x) in speed that could be achieved..

              Since the prerequisite for 'amd64' is SSE2, it is present on any 'amd64' (x86_64) CPU...
              It's a pity that zlib is *not* optimized for x86_64..

              As far as I know, only ARM has crc32 and crc32c (Castagnoli version) implemented in hardware..

              I will provide some sample code (it's not ready yet, and is based on the AMD Seattle implementation), because I am also interested in comparing both on ARMv8:
              CPU usage versus power consumption, on huge sets of data..
              Last edited by tuxd3v; 07-11-2019, 10:00 AM.

              • #8
                Originally posted by tuxd3v
                Unfortunately, 'amd64' only has SSE2 as a prerequisite.
                amd64 is the arch name for all x86 64bit processors.

                AMD processors supporting SSE4.2 (and thus the CRC instruction) start from Bulldozer (2011) onwards, a few years after Intel.
                http://www.cpu-world.com/Glossary/S/SSE4.2.html
                https://community.amd.com/thread/208670

                As far as I know, only ARM has crc32 and crc32c (Castagnoli version) implemented in hardware..
                Any amd64 processor supporting SSE4.2 will have hardware acceleration for crc32c. http://lkml.iu.edu/hypermail/linux/k...08.0/1217.html

                • #9
                  Originally posted by starshipeleven
                  amd64 is the arch name for all x86 64bit processors.

                  AMD processors supporting SSE4.2 (and thus the CRC instruction) start from Bulldozer (2011) onwards, a few years after Intel.
                  I didn't know SSE4.2 provided them, but that is only on some 'amd64' CPUs (the earlier ones don't have it, I believe?!)
                  2 application-targeted accelerator (ATA) instructions:
                  • CRC32 - calculates cyclic redundancy check of a block of data
                  • POPCNT - improves searching of bit patterns
                  Also I don't know how powerful SSE4.2 CRC32 is..
                  Going by the link provided, SSE2 should bump speeds like crazy and reduce CPU usage (SSE4.2 is a lot newer than the amd64 arch, so I understand that zlib doesn't use it by default, because CPUs older than SSE4.2 would be left out.. but SSE2 is the minimum baseline, so an SSE2 implementation should be a requirement..)

                  Originally posted by starshipeleven
                  Any amd64 processor supporting SSE4.2 will have hardware acceleration for crc32c. http://lkml.iu.edu/hypermail/linux/k...08.0/1217.html
                  It seems it's already in the kernel crypto facility, but from what was discussed above it's not delivered in zlib.
                  And I believe zlib should target the minimum requirement for 'amd64', which means an SSE2 implementation.
                  That is what seems to be lacking there..

                  The tests Y.C. did, *I believe*, compared his algorithm (xxHash) with zlib (which doesn't even have SSE2 optimisations..)

                  • #10
                    Originally posted by tuxd3v
                    I didn't know SSE4.2 provided them, but that is only on some 'amd64' CPUs (the earlier ones don't have it, I believe?!)
                    If it does not support SSE4.2 it does not have it.

                    Also I don't know how powerful SSE4.2 CRC32 is..
                    I provided code examples above, so you can (should) make benchmarks if you want. I don't know either, I just know that this instruction exists and is used.
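
A very rough way to start on such a benchmark, sketched here with only the Python standard library: time each candidate over the same buffer and report the best-of-N throughput. The function names are hypothetical. Note that `zlib.crc32` is the IEEE CRC-32, not CRC32C, and that an xxHash or hardware CRC32C binding would have to be plugged in as an extra callable from a third-party package; none is assumed here.

```python
import time
import zlib


def throughput_mb_s(func, payload, repeats=5):
    """Best-of-N throughput of func(payload) in MB/s (1 MB = 10**6 bytes)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(payload)
        best = min(best, time.perf_counter() - start)
    best = max(best, 1e-9)  # guard against a zero reading on coarse timers
    return len(payload) / best / 1e6


def compare(candidates, size=1 << 20):
    """Run every candidate over the same pseudo-data buffer of `size` bytes."""
    payload = bytes(range(256)) * (size // 256)
    return {name: throughput_mb_s(func, payload)
            for name, func in candidates.items()}


if __name__ == "__main__":
    results = compare({"zlib.crc32": zlib.crc32, "zlib.adler32": zlib.adler32})
    for name, speed in sorted(results.items(), key=lambda kv: -kv[1]):
        print(f"{name:>14}: {speed:10.1f} MB/s")
```

Numbers from a harness like this depend heavily on how the underlying library was built, so they say little about the kernel implementations; for those, the tcrypt speed tests mentioned earlier in the thread are the more relevant tool.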

                    I understand that zlib doesn't use it by default,
                    What's wrong with having different code paths for different hardware?

                    And I believe zlib should target the minimum requirement for 'amd64', which means an SSE2 implementation,
                    It would be a different code path anyway
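
The "different code paths" idea can be sketched as a small preference table: probe the CPU features once, then pick the best path that the hardware supports, falling back to a generic one. Everything below is illustrative; the path labels are made up, and the SSE2 entry only mirrors what was proposed in this thread rather than any existing zlib code.

```python
# (required CPU flag, label) pairs, most preferred first. Labels are hypothetical.
CODE_PATHS = [
    ("sse4_2", "sse4.2-crc32"),  # x86 CRC32 instruction
    ("crc32", "armv8-crc32"),    # ARMv8 CRC extension
    ("sse2", "sse2-simd"),       # baseline amd64 SIMD path, as proposed above
]
GENERIC_PATH = "generic"


def pick_path(cpu_flags):
    """Return the label of the first code path whose required flag is present."""
    for flag, label in CODE_PATHS:
        if flag in cpu_flags:
            return label
    return GENERIC_PATH
```

An older amd64 CPU reporting only `sse2` would end up on the SSE2 path, while anything reporting `sse4_2` would take the instruction-based path; either way the selection happens once at start-up, not per call.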
