BLAKE3 Cryptographic Hashing Function Sees Experimental Vulkan Implementation

  • BLAKE3 Cryptographic Hashing Function Sees Experimental Vulkan Implementation

    Phoronix: BLAKE3 Cryptographic Hashing Function Sees Experimental Vulkan Implementation

    BLAKE3, the cryptographic hash function that advertises itself as being "much faster" than the likes of SHA-1, MD5, and its predecessor BLAKE2, while being more secure and highly parallelizable, has seen an experimental implementation of GPU-based acceleration using the Vulkan API...

    http://www.phoronix.com/scan.php?pag...imental-Vulkan
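    For context, the "highly parallelizable" part comes from BLAKE3's tree structure: input is split into 1 KiB chunks that can be hashed independently (on CPU threads, SIMD lanes, or GPU invocations) and then merged pairwise up a binary tree. A rough Python sketch of that chunk-and-merge layout, using hashlib.blake2s as a stand-in compression function — this is NOT the real BLAKE3 algorithm, just an illustration of why the work parallelizes:

    ```python
    import hashlib

    CHUNK_SIZE = 1024  # BLAKE3 uses 1 KiB chunks at the leaves

    def toy_tree_hash(data: bytes) -> str:
        """Illustrative chunk-and-merge hash (NOT real BLAKE3)."""
        # Leaf level: each chunk is hashed independently -- this is the
        # embarrassingly parallel part that GPUs and SIMD can exploit.
        chunks = [data[i:i + CHUNK_SIZE]
                  for i in range(0, len(data), CHUNK_SIZE)] or [b""]
        level = [hashlib.blake2s(c).digest() for c in chunks]
        # Merge pairwise up the binary tree until a single root remains.
        while len(level) > 1:
            nxt = []
            for i in range(0, len(level), 2):
                pair = level[i] + (level[i + 1] if i + 1 < len(level) else b"")
                nxt.append(hashlib.blake2s(pair).digest())
            level = nxt
        return level[0].hex()

    print(toy_tree_hash(b"hello world" * 1000))
    ```

    Real BLAKE3 additionally tags each node with its position and a root flag to prevent extension and collision tricks; this sketch only shows the data flow.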

  • #2
    They should really make Vulkan the standard API for compute / general-purpose computation on graphics processing units (GPGPU).

    But according to Bridgman: "Just to be clear, we are not thinking that OpenCL is the future of datacenter computing. Our investment is going into HIP and higher level libraries/frameworks. Vulkan is even less well suited than OpenCL, except for consumer applications where Vulkan's pervasiveness (is that a word ?) is more important than its limited computing capabilities." https://www.phoronix.com/forums/foru...e4#post1175456

    Other people get super-fast results with ACO+Vulkan: https://translate.google.com/transla...edup-2x-aco%2F

    I really do not understand what the problem with Vulkan for compute is... all the test results show it is super fast.
    Phantom circuit Sequence Reducer Dyslexia

    Comment


    • #3
      Qaridarium I generally agree that Vulkan seems like the most promising API right now, if you need portability between GPU vendors. But you do lose a number of important things with respect to OpenCL.

      I want to eventually do a more in-depth comparison, but right now, if I just do a quick comparison from a quick check of the two specs and what I learned from writing stuff using both...
      • Vulkan is even more verbose/manual than OpenCL, and at least for my field of scientific computing (where the bulk of the workforce is made up of research staff with little training in software development to begin with) you're really going to want a higher-level layer. In particular, the Vulkan command synchronization model is basically unusable by non-experts.
      • Partly as a result of this, and partly due to graphics features, the Vulkan spec is much larger and harder to navigate, especially for non-experts. For example, the fact that it is possible to map a device-side buffer to host virtual memory is not even visible in the table of contents; you need to dig deep to figure this out.
      • The lack of a CPU implementation and poor support for CUDA-style single-source programming mean that if the bulk of your compute infrastructure is CPU-based and you have a few new GPU nodes, you need to write and maintain what is basically a copy of your codebase's core logic just to run on those occasional GPU nodes. To say that people are not thrilled about this would be an understatement.
      • OpenCL is quite IEEE-754 compliant by default, and allows you to optionally use more relaxed semantics with special functions or compiler flags, whereas with Vulkan compute shaders you're getting something more fast-math-ish by default and need to remember to add something like a "highp" qualifier for maximal floating-point reproducibility. For codebases that need FP reproducibility between the CPU and GPU side (which is convenient for testing for example), this is pretty annoying.
      • The device-side synchronization model of Vulkan is a bit less convenient than that of OpenCL in some respects, for example you cannot synchronize global memory accesses as easily as shared memory ones (need an extra fence) and that complicates some algorithms which try to reduce the number of costly kernel invocations during reductions via lock-free techniques.
      • There is no device-side printf in Vulkan, and that hurts debugging agility quite a bit in practice.
      Last edited by HadrienG; 04-29-2020, 02:18 AM.
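      The floating-point reproducibility point above is easy to demonstrate even without a GPU: IEEE-754 addition is not associative, so a compiler (or shader compiler) operating under relaxed/fast-math semantics can legitimately reorder a sum and produce different bits than the CPU build. A minimal Python illustration of the underlying effect:

      ```python
      # IEEE-754 addition is not associative: reordering changes the rounded result.
      a, b, c = 0.1, 0.2, 0.3

      left = (a + b) + c   # one evaluation order
      right = a + (b + c)  # the reassociated order a fast-math compiler might pick

      print(left)           # 0.6000000000000001
      print(right)          # 0.6
      print(left == right)  # False: same math, different bits
      ```

      This is why "fast-math by default" is a real problem for codebases that compare CPU and GPU results bit-for-bit in their test suites.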

      Comment


      • #4
        If this is so fast, this could be used for git. 🤔

        Comment


        • #5
          Originally posted by Steffo View Post
          If this is so fast, this could be used for git. 🤔
          No, only in the specific case of large files. There's some minimum latency involved in splitting the data into different pieces, sending it over to the GPU, compiling the compute kernels, etc., and then getting the result back.

          It's only worth doing this for large files (say a few hundred MB or more).
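          That break-even point can be sketched with a simple cost model: offloading pays off once the GPU's lower per-byte cost amortizes its fixed dispatch and transfer overhead. All the constants below are made-up illustrative values, not measurements of any real hardware:

          ```python
          # Toy amortization model for GPU-offloaded hashing (all numbers hypothetical).
          FIXED_OVERHEAD_S = 5e-3   # assumed: kernel setup + dispatch + readback latency
          PCIE_BPS = 12e9           # assumed: effective host->GPU transfer, bytes/s
          GPU_HASH_BPS = 40e9       # assumed: GPU hashing throughput, bytes/s
          CPU_HASH_BPS = 7e9        # assumed: multithreaded CPU hashing throughput

          def gpu_time(n: int) -> float:
              return FIXED_OVERHEAD_S + n / PCIE_BPS + n / GPU_HASH_BPS

          def cpu_time(n: int) -> float:
              return n / CPU_HASH_BPS

          # The GPU wins once its fixed overhead is amortized; solve for the size
          # at which the two cost lines cross.
          break_even = FIXED_OVERHEAD_S / (
              1 / CPU_HASH_BPS - (1 / PCIE_BPS + 1 / GPU_HASH_BPS))
          print(f"break-even input size ~ {break_even / 1e6:.0f} MB")
          ```

          Under these (assumed) numbers the crossover lands in the low hundreds of megabytes, which is consistent with the "few hundred MB or more" rule of thumb above; with a slower CPU or a persistent, pre-compiled pipeline the threshold moves accordingly.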

          Comment


          • #6
            Originally posted by sandy8925 View Post

            No, only in the specific case of large files. There's some minimum latency involved in splitting the data into different pieces, sending it over to the GPU, compiling the compute kernels etc. and then getting the result back.

            It's only worth doing this for large files (say a few hundred MB or more).
            Look at the GitHub page. BLAKE3 is also much faster than other algorithms in the single-threaded CPU case.

            https://github.com/BLAKE3-team/BLAKE3

            Comment


            • #7
              Off topic: I wonder when they'll come out with Blake's 7.

              Comment


              • #8
                https://catchchallenger.first-world....onception#Hash Blake3 is just a bit faster than sha256 here, and Vulkan on servers is not common...
                Developer of Ultracopier/Supercopier and of the game CatchChallenger

                Comment


                • #9
                  Originally posted by alpha_one_x86 View Post
                  https://catchchallenger.first-world....onception#Hash Blake3 is just a bit faster than sha256 here, and Vulkan on servers is not common...
                  That testing result is clearly I/O bound.
                  The input file should be copied to tmpfs first; otherwise the bottleneck is the hard drive, not the hash function.

                  My personal test result:

                  Code:
                  time b3sum CentOS-7-x86_64-Minimal-1511.iso
                  8850e389ad276d74215877304a5219567c4f2c25c9f518080aa1d0f183c5df10 CentOS-7-x86_64-Minimal-1511.iso
                  
                  real 0m0.091s
                  user 0m0.219s
                  sys 0m0.041s
                  
                  time sha256sum CentOS-7-x86_64-Minimal-1511.iso
                  f90e4d28fa377669b2db16cbcb451fcb9a89d2460e3645993e30e137ac37d284 CentOS-7-x86_64-Minimal-1511.iso
                  
                  real 0m1.580s
                  user 0m1.478s
                  sys 0m0.094s
                  Around a 7-fold speedup in CPU time (and closer to 17× in wall-clock time, since b3sum uses multiple threads).

                  Oh, forgot my system spec: i5-4570, [email protected]
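                  To take disk I/O out of the picture entirely, one can benchmark against a buffer that is already in RAM instead of a file. BLAKE3 isn't in Python's standard library, so the sketch below times hashlib's sha256 against blake2b as a stand-in comparison; the absolute and relative numbers will differ from b3sum, but the methodology — time pure compute on RAM-resident data — is the point:

                  ```python
                  import hashlib
                  import time

                  # 64 MiB buffer already in memory: no disk reads can pollute the timing.
                  data = b"\x00" * (64 * 1024 * 1024)

                  def bench(name: str) -> float:
                      """Return seconds spent hashing the in-memory buffer once."""
                      h = hashlib.new(name)
                      start = time.perf_counter()
                      h.update(data)
                      h.hexdigest()
                      return time.perf_counter() - start

                  for algo in ("sha256", "blake2b"):
                      print(f"{algo}: {bench(algo):.3f}s")
                  ```

                  For a fair single-run comparison one would also repeat each measurement and take the minimum, since the first run can pay one-time warm-up costs.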
                  Last edited by zxy_thf; 04-29-2020, 10:15 AM.

                  Comment


                  • #10
                    Originally posted by Steffo View Post
                    If this is so fast, this could be used for git. 🤔
                    Actually, I'm worried they'll have to pick yet another hash function before even finishing the migration to SHA-256.

                    Comment
