Mesa Turns To BLAKE3 For Faster Vulkan Shader Hashing

  • ryao
    replied
    Originally posted by evil_core View Post

    But how does Blake3 compare to Fletcher4 in the context of ZFS?

    I'm creating a new pool and wonder whether I should choose Fletcher4 or Blake3.

    As I understand it, Blake3 should allow background dedup, like BTRFS does.
    All hardware supports SIMD2, which I understand is the requirement for hardware acceleration in ZFS.

    Unfortunately, I was unable to find any articles, benchmarks, or even comments about Blake3 vs Fletcher4 performance in the context of ZFS.
    Blake3 has about one tenth the throughput of Fletcher4. Use /proc/spl/kstat/zfs/fletcher_4_bench and /proc/spl/kstat/zfs/chksum_bench to compare. The former reports bytes per second. The latter reports binary megabytes (MiB) per second.
    Last edited by ryao; 22 December 2023, 03:24 PM.
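Because the two benchmark files use different units, their figures have to be put on the same scale before comparing them. The following is a minimal sketch of that unit arithmetic only; the function names are hypothetical, it does not parse either kstat file, and the numbers in the example are made up for illustration rather than measured.

```python
# Hypothetical helper names; only the unit arithmetic comes from the post above.
BYTES_PER_MIB = 1024 * 1024  # one binary megabyte (MiB)


def bytes_per_sec_to_mib_per_sec(bytes_per_sec: float) -> float:
    """Convert a bytes-per-second figure to MiB per second."""
    return bytes_per_sec / BYTES_PER_MIB


def throughput_ratio(fletcher4_bytes_per_sec: float, other_mib_per_sec: float) -> float:
    """How many times faster Fletcher4 is, given one figure taken from each file."""
    return bytes_per_sec_to_mib_per_sec(fletcher4_bytes_per_sec) / other_mib_per_sec


if __name__ == "__main__":
    # Made-up inputs purely to show the arithmetic, not benchmark results.
    print(throughput_ratio(10 * 1024 * BYTES_PER_MIB, 1024.0))
```

Read one figure for the same implementation and block size from each file by hand, then pass them in; which rows and columns to pick depends on the layout of the files on your system.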


  • evil_core
    replied
    Originally posted by ryao View Post

    Comparable to xxhash. Blake3 does not run at memory speed.
    But how does Blake3 compare to Fletcher4 in the context of ZFS?

    I'm creating a new pool and wonder whether I should choose Fletcher4 or Blake3.

    As I understand it, Blake3 should allow background dedup, like BTRFS does.
    All hardware supports SIMD2, which I understand is the requirement for hardware acceleration in ZFS.

    Unfortunately, I was unable to find any articles, benchmarks, or even comments about Blake3 vs Fletcher4 performance in the context of ZFS.


  • geearf
    replied
    Originally posted by ryao View Post

    Comparable to xxhash. Blake3 does not run at memory speed.
    Got it, thank you!


  • ryao
    replied
    Originally posted by geearf View Post

    Hello,

    comparable to which? Blake3?

    Thank you!
    Comparable to xxhash. Blake3 does not run at memory speed.


  • geearf
    replied
    Originally posted by ryao View Post

    ZFS added Blake3 as a checksum option. xxhash is not usable for on-disk checksums in ZFS since ZFS uses 256-bit checksums while xxhash is at most 128-bit. The Fletcher4 checksum algorithm in ZFS generates 256-bit hashes and has comparable (memory speed) performance.
    Hello,

    comparable to which? Blake3?

    Thank you!


  • oconnor663
    replied
    Since blake3 is 6 times faster than blake2b
    The performance comparison between BLAKE3 and BLAKE2b (really between any two hash functions) depends completely on which architecture you measure them on. There's no simple rule of thumb. In fact, if you're on a CPU with, say, 16 cores, there's going to be an order of magnitude difference just between single-threaded BLAKE3 and multithreaded BLAKE3!

    When we publish benchmarks, we try to be very careful to specify exactly what machine they ran on, and whether BLAKE3 was using one thread or multiple threads. But it's understandable that as these things get filtered through articles and forum posts, some of the finer details get lost.


  • AndyChow
    replied
    Originally posted by markg85 View Post

    Stop saying AES-NI. It has nothing to do with SHA.
    AES-NI is for optimized AES encryption.
    https://en.m.wikipedia.org/wiki/Intel_SHA_extensions is for optimized SHA hashing.

    Therefore what you claim is like comparing an apple (the fruit) with a pine tree. It just doesn't make sense; you're comparing two completely different things.

    As for benchmarks, I don't have fancy graphs or numbers, as this is from memory and from about a year ago. On the Raspberry Pi (I tried V4) blake3 was stupidly fast at calculating a checksum for a file, roughly 3x faster. You can try that yourself with the "b3sum" tool.

    On desktop hardware the difference wasn't that extreme, but still around 1.5x.

    Which, in my book, means it's just a vastly superior hashing algorithm compared to sha (I used sha256sum). If you want benchmarks, run them yourself. You likely already have sha256sum on your PC; just install b3sum and compare to your heart's content 😉
    I couldn't benchmark at the time for reasons beyond my control. Someone else provided a benchmark showing that on an X1 laptop sha256 was about twice as fast as blake2b. Since blake3 is 6 times faster than blake2b, I'd expect it to be 3 times faster than sha256. It's not a trivial question, really. I would use xxhash over crc for non-suspect stuff, since it's faster, but hardware acceleration for crc makes the reverse true in practice. That's why I'm often skeptical of "faster algo" claims.

    Newer ARM chips tend to have those crypto hashing accelerations. RK3588 for example supposedly has such accelerators.

    As for which branding Intel uses for its instructions, sorry if I don't remember which group provides which instructions. AES-NI made sense. That's the header under which I would group the hashing stuff too.
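Estimates like the one above (sha256 about 2x blake2b, blake3 about 6x blake2b, hence roughly 3x sha256) depend heavily on the machine, so it may be easier to measure than to extrapolate. Below is a minimal single-threaded sketch using only Python's hashlib; the function names, buffer size, and round count are arbitrary assumptions. BLAKE3 is not in the Python standard library, so it is left out here; the b3sum tool mentioned in the quoted post would cover that side.

```python
import hashlib
import time


def throughput_mib_s(name: str, data: bytes, rounds: int = 5) -> float:
    """Best-of-`rounds` single-threaded throughput of a hashlib algorithm, in MiB/s."""
    best = float("inf")
    for _ in range(rounds):
        start = time.perf_counter()
        hashlib.new(name, data).digest()
        best = min(best, time.perf_counter() - start)
    return len(data) / (1024 * 1024) / best


def compare(names=("sha1", "sha256", "blake2b"), size=16 * 1024 * 1024):
    """Hash one zero-filled buffer of `size` bytes with each algorithm."""
    data = bytes(size)
    return {name: throughput_mib_s(name, data) for name in names}


if __name__ == "__main__":
    for name, rate in compare().items():
        print(f"{name:8s} {rate:10.1f} MiB/s")
```

Whether the sha1 and sha256 figures reflect hardware SHA instructions depends on how the underlying crypto library was built, so results from this sketch say nothing definite about other tools on the same machine.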


  • markg85
    replied
    @pabs
    Thank you for that reply! That's much appreciated.

    I am running Arch Linux, so I'm going with your assumption that I'm on the fast SHA path in both sha256sum and openssl. That seems to match my expectations too.


  • ryao
    replied
    Originally posted by geearf View Post

    I was actually wondering why they didn't pick xx so thanks for that!
    In that case do you think it would be bad to use it for an FS checksum?
    ZFS added Blake3 as a checksum option. xxhash is not usable for on-disk checksums in ZFS since ZFS uses 256-bit checksums while xxhash is at most 128-bit. The Fletcher4 checksum algorithm in ZFS generates 256-bit hashes and has comparable (memory speed) performance.
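To illustrate why a Fletcher4-style checksum comes out at 256 bits, here is a minimal sketch of four chained 64-bit running sums over 32-bit words. It is an illustration under stated assumptions, not the OpenZFS code: the function name is hypothetical, little-endian input words are assumed, and the output byte layout is chosen only for the example.

```python
import struct

MASK64 = (1 << 64) - 1  # emulate 64-bit wraparound arithmetic


def fletcher4_sketch(data: bytes) -> bytes:
    """Fletcher-4 style checksum: four chained 64-bit sums over 32-bit words.

    Assumes little-endian input words; the packed output layout is illustrative.
    """
    if len(data) % 4:
        raise ValueError("length must be a multiple of 4 bytes")
    a = b = c = d = 0
    for (word,) in struct.iter_unpack("<I", data):
        a = (a + word) & MASK64
        b = (b + a) & MASK64
        c = (c + b) & MASK64
        d = (d + c) & MASK64
    # Four 64-bit accumulators give a 32-byte (256-bit) result.
    return struct.pack("<4Q", a, b, c, d)


if __name__ == "__main__":
    checksum = fletcher4_sketch(struct.pack("<2I", 1, 2))
    print(len(checksum) * 8, struct.unpack("<4Q", checksum))
```

The inner loop is just four additions per word, which is consistent with the memory-speed claim above, whereas a cryptographic hash does far more work per byte.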


  • smitty3268
    replied


    My understanding is that blake3 is faster than SHA1 even in cases where it's limited to single-threading and using the hardware acceleration. I believe the link above is testing those conditions.

    The top machine listed is Alder Lake. Older CPUs will presumably make the difference much larger, but Alder Lake still has blake3 almost twice as fast on inputs larger than 4K. It gets slower on small inputs, which means it may make less sense for something like a hash table. But shader text will probably tend to be larger. Zen 3 results there look similar.

    I've heard blake3 can take better advantage of AVX512 for the same reason it's more multi-threading friendly, which helps it on these newer machines, the same ones that tend to have SHA1 hardware acceleration. If you are on a CPU that limits speed when AVX instructions are used, that may take away some of blake3's advantages.

    That said, I know the devs were also interested in increasing the hash length. SHA1 is only 160 bits while blake3 is 256. If you plan on changing that hash size, there's no particular reason to stick with one of the SHA variants; you might as well go for whatever will work best.

    There was also a recent merge request to start using a SHA1 implementation on ARM that was assembly optimized rather than the C version Mesa currently uses, and it was rejected because nobody wanted to have to maintain that ARM assembly code in the Mesa project. They want to use hash code that is heavily used (and therefore supported) elsewhere so it's not a burden on their project. Lots of ways that could have been resolved, obviously, but this will be a major win for current ARM support at the very least.
    Last edited by smitty3268; 24 June 2023, 05:58 PM.
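On the hash-length point above, the digest widths of the standard-library algorithms can be checked directly. This is a minimal sketch with a hypothetical helper name; BLAKE3 is not in Python's hashlib, so blake2s is shown only as a stdlib example of a 256-bit default digest.

```python
import hashlib


def digest_bits(name: str) -> int:
    """Default digest width, in bits, of a hashlib algorithm."""
    return hashlib.new(name).digest_size * 8


if __name__ == "__main__":
    for name in ("sha1", "sha256", "blake2s"):
        print(name, digest_bits(name))
```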
