BLAKE3 Cryptographic Hashing Function Sees Experimental Vulkan Implementation


  • oconnor663
    replied
    Hello! BLAKE3 author here. A comment on the original article:

    to date it's been just implemented in Rust for the multi-threaded version and a reference C implementation
    The official repo only contains Rust and C implementations, but BLAKE3 has been implemented by other people in several other languages already. We link to 3rd party implementations in the @BLAKE3-team Twitter feed. There's a Go implementation that's almost as fast as C/Rust, and there are bindings for Python and WASM.

    Also note that the reference implementation is written in Rust, not C. The official repo contains the reference implementation in Rust, an optimized implementation in Rust (the `blake3` crate), and an optimized implementation in C.

    Regarding some benchmarks linked to earlier in this thread:

    BLAKE3 is only slightly faster than SHA-256 here
    Those figures report SHA-256 (presumably `sha256sum`, which usually links against OpenSSL) hashing 1 GB in 587ms, and `b3sum` doing the same in 461ms, on a Ryzen 5 3400G CPU. I want to give some context for those figures, based on my best guesses about how they were measured. For comparison, the i5-8250U CPU on my laptop has 4 physical / 8 logical cores, the same as the Ryzen 5 3400G. Also very importantly, both CPUs support the AVX2 instruction set, so they dispatch to the same BLAKE3 implementation.

    When I create a 1 GB file `f` on my laptop and run `time b3sum f`, I get 119ms (best of 10 runs in a loop). That's because `b3sum` memory maps the whole file and splits the work across all 4 cores. However, if I run `time b3sum < f` instead, I get 488ms, about 4x slower. That's because when reading from stdin, `b3sum` can't memory map the file, and in the current implementation only one thread gets used to hash it. I assume the reported 461ms figure comes from this method, which is to say, it's measuring single-threaded BLAKE3 on a CPU that's somewhat faster than mine. (Probably a bit faster than it appears here, if the original figure wasn't a best-of-10 measurement.)
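The two access patterns and the best-of-10 timing described above can be sketched in Python. This is only an illustration under stated assumptions: BLAKE3 is not in the Python standard library, so `hashlib.sha256` stands in for it, and the sketch is single-threaded, so it will not reproduce the 4x gap that `b3sum` gets from multithreading over a memory map. The helper names (`hash_mmap`, `hash_stream`, `best_of`, `demo`) are hypothetical.

```python
import hashlib
import mmap
import os
import tempfile
import time


def hash_mmap(path):
    """Hash a file through a read-only memory map (the `b3sum f` access pattern)."""
    with open(path, "rb") as fh:
        with mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ) as view:
            # SHA-256 is a stand-in here; BLAKE3 is not in the standard library.
            return hashlib.sha256(view).hexdigest()


def hash_stream(stream, chunk_size=1 << 16):
    """Hash a stream chunk by chunk (the `b3sum < f` access pattern)."""
    hasher = hashlib.sha256()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        hasher.update(chunk)
    return hasher.hexdigest()


def best_of(runs, func, *args):
    """Return (fastest wall-clock seconds over `runs` calls, last result)."""
    best = float("inf")
    result = None
    for _ in range(runs):
        start = time.perf_counter()
        result = func(*args)
        best = min(best, time.perf_counter() - start)
    return best, result


def demo(size=8 * 1024 * 1024, runs=10):
    """Hash a temporary file of random bytes both ways and time each approach."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(os.urandom(size))
        path = tmp.name
    try:
        def from_stream():
            with open(path, "rb") as fh:
                return hash_stream(fh)

        t_map, d_map = best_of(runs, hash_mmap, path)
        t_stream, d_stream = best_of(runs, from_stream)
        return d_map, d_stream, t_map, t_stream
    finally:
        os.remove(path)


if __name__ == "__main__":
    d_map, d_stream, t_map, t_stream = demo()
    print("digests match:", d_map == d_stream)
    print(f"mmap:   {t_map * 1000:.1f} ms (best of 10)")
    print(f"stream: {t_stream * 1000:.1f} ms (best of 10)")
```

Taking the minimum over several runs filters out scheduling noise and cold-cache effects, which is why a single measurement can look somewhat slower than a best-of-10 figure.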

    Now, when I run `time sha256sum f` on my laptop, it takes 2.502 seconds. This is much slower than the reported figure from the Ryzen 5 3400G. I think the reason for this difference is that the Ryzen 5 3400G supports SHA extensions, which provide hardware acceleration for SHA-256, and `sha256sum` is taking advantage of that. My CPU doesn't support SHA extensions, so I'm measuring performance in software.

    If all that's correct, I'd interpret these figures to mean that on the Ryzen 5 3400G, single-threaded BLAKE3 is slightly faster than hardware-accelerated SHA-256.
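To make the comparison easier to follow, the timings quoted in this post can be converted to throughput and ratios. The sketch below uses only the figures given above and assumes "1 GB" means the same input size in every measurement; the function names are made up for illustration.

```python
# Timings quoted in the post, in milliseconds, each for hashing a 1 GB input.
TIMINGS_MS = {
    "Ryzen 5 3400G, sha256sum": 587,
    "Ryzen 5 3400G, b3sum": 461,
    "i5-8250U, b3sum f (mmap, multi-threaded)": 119,
    "i5-8250U, b3sum < f (stdin, single-threaded)": 488,
    "i5-8250U, sha256sum f (no SHA extensions)": 2502,
}


def throughput_gb_per_s(ms, size_gb=1.0):
    """Convert a wall-clock time in milliseconds into GB/s for a given input size."""
    return size_gb / (ms / 1000.0)


def speedup(slow_ms, fast_ms):
    """How many times faster the second measurement is than the first."""
    return slow_ms / fast_ms


if __name__ == "__main__":
    for label, ms in TIMINGS_MS.items():
        print(f"{label}: {throughput_gb_per_s(ms):.2f} GB/s")
    # Multi-threaded mmap vs single-threaded stdin on the i5-8250U.
    print(f"mmap vs stdin: {speedup(488, 119):.1f}x")
    # Single-threaded BLAKE3 vs hardware-accelerated SHA-256 on the Ryzen 5 3400G.
    print(f"b3sum vs sha256sum (Ryzen): {speedup(587, 461):.2f}x")
    # Single-threaded BLAKE3 vs software SHA-256 on the i5-8250U.
    print(f"b3sum stdin vs sha256sum (i5): {speedup(2502, 488):.1f}x")
```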


  • Guest
    Guest replied
    Originally posted by alpha_one_x86
    https://catchchallenger.first-world....onception#Hash BLAKE3 is only slightly faster than SHA-256 here, and Vulkan on servers is not common...
    I don't think it was meant for servers. It could be very useful on some client devices (e.g. low-end computers, phones, etc.).


  • Guest
    Guest replied
    Originally posted by Steffo

    Look at the GitHub page. Even in the single-threaded CPU case, BLAKE3 is much faster than other algorithms.

    https://github.com/BLAKE3-team/BLAKE3
    Ah, you meant using BLAKE3 instead of SHA-1. I thought you meant using this GPU offload. What I said applied to the GPU offload.


  • HadrienG
    replied
    Originally posted by Qaridarium
    But there is a CPU implementation?

    https://www.phoronix.com/forums/foru...ed-on-llvmpipe
    As far as I know, neither Kazan nor Vallium is currently in a state where you'd actually want to use it in production.


  • qarium
    replied
    Originally posted by HadrienG
    Lack of a CPU implementation and poor support for CUDA-style single-source programming mean that if the bulk of your compute infrastructure is CPU-based and you have a few new GPU nodes, you need to write and maintain what is basically a copy of your codebase's core logic just to run on those occasional GPU nodes. To say that people are not thrilled about this would be an understatement.
    But there is a CPU implementation?

    Phoronix: Mesa "Vallium" - Software/CPU-Based Vulkan Based On LLVMpipe While there has been the CPU-based "Kazan" Vulkan driver (formerly Vulkan-CPU as a Google Summer of Code project) and Google's SwiftShader has been implementing CPU-based Vulkan support, it turns out Red Hat's David Airlie has been


  • zxy_thf
    replied
    Originally posted by Steffo
    If this is so fast, this could be used for git. 🤔
    Actually, I worry they'll have to pick another hash function before they finish the migration to SHA-2.


  • zxy_thf
    replied
    Originally posted by alpha_one_x86
    https://catchchallenger.first-world....onception#Hash BLAKE3 is only slightly faster than SHA-256 here, and Vulkan on servers is not common...
    This test result is clearly I/O bound.
    The input file should be copied to tmpfs first; otherwise the bottleneck will be the hard drive.

    My personal test result:

    Code:
    time b3sum CentOS-7-x86_64-Minimal-1511.iso
    8850e389ad276d74215877304a5219567c4f2c25c9f518080aa1d0f183c5df10 CentOS-7-x86_64-Minimal-1511.iso
    
    real 0m0.091s
    user 0m0.219s
    sys 0m0.041s
    
    time sha256sum CentOS-7-x86_64-Minimal-1511.iso
    f90e4d28fa377669b2db16cbcb451fcb9a89d2460e3645993e30e137ac37d284 CentOS-7-x86_64-Minimal-1511.iso
    
    real 0m1.580s
    user 0m1.478s
    sys 0m0.094s
    Around a 7-fold speedup.

    Oh, I forgot my system specs: i5-4570, DDR3 @ 1600 MHz
    Last edited by zxy_thf; 29 April 2020, 10:15 AM.
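The `time` output above can be read in more than one way, so here is a small sketch that works out the ratios from the posted numbers. It assumes the usual meaning of `real` (wall-clock time) and `user`/`sys` (CPU time summed over all threads); the `TimeOutput` class is a hypothetical helper. On these figures, the roughly 7-fold number matches the ratio of `user` times, while the wall-clock ratio is larger because `b3sum` spreads its work over several cores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimeOutput:
    """One `time` report, in seconds."""
    real: float  # wall-clock time
    user: float  # CPU time in user mode, summed over threads
    sys: float   # CPU time in kernel mode, summed over threads

    @property
    def cpu(self):
        """Total CPU time consumed."""
        return self.user + self.sys

    @property
    def parallelism(self):
        """Average number of cores kept busy (CPU time / wall-clock time)."""
        return self.cpu / self.real


# Figures copied from the post above.
B3SUM = TimeOutput(real=0.091, user=0.219, sys=0.041)
SHA256SUM = TimeOutput(real=1.580, user=1.478, sys=0.094)

if __name__ == "__main__":
    print(f"wall-clock speedup: {SHA256SUM.real / B3SUM.real:.1f}x")
    print(f"user-time ratio:    {SHA256SUM.user / B3SUM.user:.1f}x")
    print(f"b3sum cores busy:     {B3SUM.parallelism:.1f}")
    print(f"sha256sum cores busy: {SHA256SUM.parallelism:.2f}")
```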


  • alpha_one_x86
    replied
    https://catchchallenger.first-world....onception#Hash BLAKE3 is only slightly faster than SHA-256 here, and Vulkan on servers is not common...


  • Ardje
    replied
    Off topic: I wonder when they'll come out with Blake's 7.


  • Steffo
    replied
    Originally posted by sandy8925

    No, only in the specific case of large files. There's some minimum latency involved in splitting the data into different pieces, sending it over to the GPU, compiling the compute kernels, etc., and then getting the result back.

    It's only worth doing this for large files (say a few hundred MB or more).
    Look at the GitHub page. Even in the single-threaded CPU case, BLAKE3 is much faster than other algorithms.
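The quoted point about fixed offload latency can be illustrated with a simple cost model: CPU time grows linearly with input size, while GPU time is a fixed overhead plus a smaller per-byte cost. The numbers below (50 ms overhead, 2 GB/s on the CPU, 10 GB/s on the GPU) are made-up assumptions for illustration only, not measurements from this thread or from the Vulkan implementation.

```python
def cpu_seconds(n_bytes, cpu_bytes_per_s):
    """Time to hash n_bytes on the CPU, assuming constant throughput."""
    return n_bytes / cpu_bytes_per_s


def gpu_seconds(n_bytes, gpu_bytes_per_s, overhead_s):
    """Time to hash n_bytes on the GPU: fixed setup/transfer overhead plus hashing."""
    return overhead_s + n_bytes / gpu_bytes_per_s


def break_even_bytes(overhead_s, cpu_bytes_per_s, gpu_bytes_per_s):
    """Input size above which the GPU path wins under this linear model."""
    if gpu_bytes_per_s <= cpu_bytes_per_s:
        return float("inf")  # the GPU never catches up
    return overhead_s / (1.0 / cpu_bytes_per_s - 1.0 / gpu_bytes_per_s)


if __name__ == "__main__":
    # Hypothetical figures, chosen only to show the shape of the trade-off.
    overhead_s, cpu_bw, gpu_bw = 0.05, 2e9, 10e9
    size = break_even_bytes(overhead_s, cpu_bw, gpu_bw)
    print(f"break-even input size: {size / 1e6:.0f} MB")
```

Under these assumed figures the break-even point lands at about 125 MB; a larger fixed overhead or a faster CPU path pushes it higher.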
