Mesa Turns To BLAKE3 For Faster Vulkan Shader Hashing


  • #11
    Originally posted by markg85

    It's simple, really.
    blake3 uses another algorithm that is just much more efficient for the CPU, and thus beats the SHA family even with their CPU-instruction-optimized versions.

    The benchmark numbers above are an apples-to-oranges comparison between a SIMD-optimized BLAKE3 implementation and a mostly unoptimized and unaccelerated SHA-1 implementation.

    Specifically, the BLAKE3 code imported in the linked merge request is SIMD-optimized assembly, while the Mesa SHA-1 implementation does not use the Intel SHA Extensions (SHA-1 and SHA-256 acceleration) or the ARMv8 sha1 extension (SHA-1 acceleration). The Mesa SHA-1 implementation is written in C and is loop-unrolled but otherwise unoptimized. Compare that to the OpenSSL x86-64 and ARMv8 SHA-1 implementations: both use the accelerated SHA-1 instructions and are optimized in several other ways (keeping block data in registers, minimizing L1 cache use, etc.).

    (SHA-1 is broken and shouldn't be used as a cryptographic hash, so I'm not recommending it; I'm just pointing out that the benchmark is nonsense).
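To illustrate what a like-for-like comparison looks like, here is a minimal Python sketch (not from the thread and not Mesa's benchmark) that times several algorithms through the same library, the same harness, and the same input buffer. It uses only the standard `hashlib` module, which has no BLAKE3. Whether `sha1` and `sha256` end up using the Intel SHA extensions or the ARMv8 crypto instructions depends on the OpenSSL build Python is linked against, so the numbers are machine-specific. The helper names `throughput_mib_s` and `compare` are hypothetical.

```python
import hashlib
import time


def throughput_mib_s(algorithm: str, data: bytes, rounds: int = 3) -> float:
    """Best-of-N throughput, in MiB/s, for one hashlib algorithm on `data`."""
    best = float("inf")
    for _ in range(rounds):
        start = time.perf_counter()
        hashlib.new(algorithm, data).digest()
        best = min(best, time.perf_counter() - start)
    best = max(best, 1e-9)  # avoid dividing by a zero timer delta
    return len(data) / best / (1024 * 1024)


def compare(algorithms, size: int = 16 * 1024 * 1024) -> dict:
    # Same zero-filled buffer, same library, same timing loop for every algorithm.
    data = bytes(size)
    return {name: throughput_mib_s(name, data) for name in algorithms}


if __name__ == "__main__":
    results = compare(["sha1", "sha256", "sha512", "blake2b", "blake2s"])
    for name, speed in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name:>8}: {speed:8.1f} MiB/s")
```

Running the same script on a CPU with SHA acceleration and on one without (such as the Raspberry Pi 4B discussed below) is one way to see how much the hardware instructions, rather than the algorithm alone, shift the ranking.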



    • #12
      Originally posted by AndyChow

      SHA-256 is the first that comes to mind because it's accelerated by virtually every CPU out there. Not sure what the logic here is. BLAKE3 is faster when there are no AES instructions, but that never happens. Even ARM chips have AES acceleration at this point (well, most do).

      Shader hashing happens on the CPU, right? I don't get the logic here at all.

      The Cortex-A72 in the Raspberry Pi 4B does not have the ARMv8 aes, sha1, sha2, sha512, or sha3 crypto extensions.

      Example (cherry is a Raspberry Pi 4B, and pizza is an Odroid N2L):

      Code:
      > for i in cherry pizza; do echo "$i:"; ssh $i lscpu | grep -i 'model name\|flags' | sed 's/.*: *//'; done
      cherry:
      Cortex-A72
      fp asimd evtstrm crc32 cpuid
      pizza:
      Cortex-A53
      fp asimd evtstrm aes pmull sha1 sha2 crc32
      Cortex-A73
      fp asimd evtstrm aes pmull sha1 sha2 crc32
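The flags lines in that output are what to look for. A minimal Python sketch that picks the crypto-related flags out of such a line; note that AES and SHA acceleration are reported as separate flags. The ARMv8 names (aes, pmull, sha1, sha2, sha512, sha3) are the Linux arm64 hwcap names; on x86-64 Linux, AES-NI appears as `aes` and the Intel SHA extensions as `sha_ni` in /proc/cpuinfo. The `CRYPTO_FLAGS` set and `crypto_features` helper are hypothetical names for illustration.

```python
# Linux CPU flag names for hardware crypto extensions.
# ARMv8 (lscpu "Flags"): aes, pmull, sha1, sha2, sha512, sha3.
# x86-64 (/proc/cpuinfo "flags"): aes (AES-NI) and sha_ni (Intel SHA extensions).
CRYPTO_FLAGS = {"aes", "pmull", "sha1", "sha2", "sha512", "sha3", "sha_ni"}


def crypto_features(flags_line: str) -> set:
    """Return the crypto-related flags present in one lscpu/cpuinfo flags line."""
    return CRYPTO_FLAGS.intersection(flags_line.split())


# Flags lines copied from the lscpu output above.
cherry_a72 = "fp asimd evtstrm crc32 cpuid"
pizza_a53 = "fp asimd evtstrm aes pmull sha1 sha2 crc32"

print(sorted(crypto_features(cherry_a72)))  # []
print(sorted(crypto_features(pizza_a53)))   # ['aes', 'pmull', 'sha1', 'sha2']
```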



      • #13
        Using BLAKE: couldn't this also improve DXVK and vkd3d-proton? DXVK uses SHA-1 for hashing and vkd3d-proton uses MD5 (AFAIK).



        • #14
          Originally posted by CochainComplex
          Using BLAKE: couldn't this also improve DXVK and vkd3d-proton? DXVK uses SHA-1 for hashing and vkd3d-proton uses MD5 (AFAIK).

          MD5 shouldn't be used for anything at this point: it is cryptographically broken, it is slower than SHA-256 on CPUs with accelerated SHA-256, and it is typically slower than BLAKE2b on CPUs without accelerated SHA-256.

          (OpenSSL doesn't include BLAKE3 yet; that's why I didn't include it in the charts.)
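For a cache key that currently uses MD5, BLAKE2b can be configured to produce a digest of the same 128-bit width, so keys keep their size. A minimal Python sketch with hypothetical helper names; it is not DXVK or vkd3d-proton code, and the input bytes are a placeholder. BLAKE2's `digest_size` is a parameter of the hash itself, so the 16-byte output is not simply the first 16 bytes of the default 64-byte digest.

```python
import hashlib


def cache_key_md5(blob: bytes) -> str:
    # 128-bit MD5 digest (cryptographically broken).
    return hashlib.md5(blob).hexdigest()


def cache_key_blake2b(blob: bytes) -> str:
    # BLAKE2b configured for a 16-byte (128-bit) digest, matching MD5's key width.
    # digest_size is a BLAKE2 parameter, not a truncation of the 64-byte digest.
    return hashlib.blake2b(blob, digest_size=16).hexdigest()


shader = b"example shader bytes"  # placeholder input
print(cache_key_md5(shader))
print(cache_key_blake2b(shader))
```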



          • #15
            Originally posted by markg85

            It's simple, really.
            blake3 uses another algorithm that is just much more efficient for the CPU, and thus beats the SHA family even with their CPU-instruction-optimized versions.

            Do you have a benchmark for that? I've heard the same about BLAKE2b, but in my direct test with AES-NI it absolutely wasn't the case.



            • #16
              Originally posted by pabs



              • #17
                Originally posted by AndyChow

                Do you have a benchmark for that? I've heard the same about BLAKE2b, but in my direct test with AES-NI it absolutely wasn't the case.

                On Intel CPUs, SHA-1 and SHA-256 acceleration comes from the SHA extensions, not AES-NI.

                I benchmarked the OpenSSL 3.0.3 implementations of BLAKE2s, BLAKE2b, SHA-1, SHA-2, and SHA-3 on my laptop and a Raspberry Pi 4B a year ago.

                On Intel CPUs, the SHA extensions provide accelerated SHA-1 and SHA-256 (but not SHA-512) instructions. On ARM CPUs without the sha1 or sha2 crypto extensions (e.g., the Raspberry Pi 4B), BLAKE2b is faster than SHA-1 and SHA-256, and BLAKE2s is faster than SHA-256 but slower than SHA-512.

                (Ignoring accelerated instructions, BLAKE2b and SHA-512 generally perform better than BLAKE2s and SHA-256 on 64-bit CPUs.)
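A commonly cited reason for that 64-bit advantage: BLAKE2b and SHA-512 operate on 64-bit words and 128-byte blocks, whereas BLAKE2s and SHA-256 use 32-bit words and 64-byte blocks, so on a 64-bit CPU each add, XOR, or rotate covers twice as many bits. A minimal Python sketch that prints those parameters from `hashlib`; the word sizes are written in by hand from the BLAKE2 (RFC 7693) and SHA-2 (FIPS 180-4) specifications, since `hashlib` does not expose them.

```python
import hashlib

# Internal word size in bits, from RFC 7693 (BLAKE2) and FIPS 180-4 (SHA-2).
WORD_BITS = {"blake2s": 32, "sha256": 32, "blake2b": 64, "sha512": 64}

for name, bits in WORD_BITS.items():
    h = hashlib.new(name)
    # block_size and digest_size are reported by hashlib in bytes.
    print(f"{name}: {bits}-bit words, {h.block_size}-byte blocks, {h.digest_size}-byte digest")
```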



                • #18
                  Interesting. What exactly is Mesa hashing: the compiled binary code of the shader?



                  • #19
                    Originally posted by AndyChow

                    Do you have a benchmark for that? I've heard the same about BLAKE2b, but in my direct test with AES-NI it absolutely wasn't the case.

                    Stop saying AES-NI. It has nothing to do with SHA.
                    AES-NI is for optimized AES encryption.
                    The Intel SHA extensions (https://en.m.wikipedia.org/wiki/Intel_SHA_extensions) are for optimized SHA hashing.

                    So what you claim is like comparing an apple (the fruit) with a pine tree. It just doesn't make sense; you're comparing two completely different things.

                    As for benchmarks: I don't have fancy graphs or numbers, as this is from memory and from about a year ago. On the Raspberry Pi (I tried a Pi 4), BLAKE3 was stupidly fast at calculating a file checksum, roughly 3x faster. You can try that yourself with the "b3sum" tool.

                    On desktop hardware the difference wasn't that extreme, but it was still about 1.5x.

                    In my book, that means it's just a vastly superior hashing algorithm compared to SHA (I used sha256sum). If you want benchmarks, run them yourself. You likely already have sha256sum on your PC; just install b3sum and compare to your heart's content 😉
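What sha256sum-style tools do underneath is stream the file through an incremental hasher in fixed-size chunks. A minimal Python sketch of that loop; the helper name and the 1 MiB chunk size are arbitrary choices for illustration. Timing this function with different hashers gives a like-for-like comparison, whereas two separate CLI tools can also differ in I/O strategy and threading (b3sum can hash large files on multiple threads; sha256sum is single-threaded).

```python
import hashlib
from typing import Callable


def file_digest_hex(path: str, make_hasher: Callable = hashlib.sha256,
                    chunk_size: int = 1 << 20) -> str:
    """Stream a file through a hasher in fixed-size chunks and return the hex digest."""
    hasher = make_hasher()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            hasher.update(chunk)
    return hasher.hexdigest()


if __name__ == "__main__":
    import sys
    for p in sys.argv[1:]:
        # Same "<digest>  <path>" line format that sha256sum prints.
        print(f"{file_digest_hex(p)}  {p}")
```

For BLAKE3 from Python, the third-party `blake3` package (not in the standard library) provides a hasher with the same `update`/`hexdigest` methods, so passing its constructor as `make_hasher` should work; this is an assumption about an optional dependency, and b3sum remains the simpler route.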



                    • #20
                      Originally posted by AndyChow

                      SHA-256 is the first that comes to mind because it's accelerated by virtually every CPU out there. Not sure what the logic here is. BLAKE3 is faster when there are no AES instructions, but that never happens. Even ARM chips have AES acceleration at this point (well, most do).

                      Shader hashing happens on the CPU, right? I don't get the logic here at all.

                      To my knowledge, there is no hardware instruction for any of the SHA family on x86-64 CPUs.
                      There is AES-NI, but as the name implies, it is for AES, not SHA-256 or anything else.

                      Edit: I stand corrected; it turns out that there *is*: https://en.m.wikipedia.org/wiki/Intel_SHA_extensions
                      Last edited by aviallon; 23 June 2023, 05:33 PM.

