Vulkan 1.3.300 Delivers New Cooperative Matrix Extension From NVIDIA


  • Developer12
    replied
    Originally posted by coder View Post
    Thanks for the details. I don't have any recent experience with phone development, so I am indeed behind on that front.

    However, you're the one who raised the red herring of phone SoCs, in relation to a patch that clearly has nothing to do with them.

    Finally, phone SoCs spend valuable silicon real estate on NPUs for good reasons. They're much more power-efficient, if not also faster, than using the embedded GPU to do the same thing. If developers are forgoing the NPUs, as you suggest, I think it's a shame for users.
    Phones are the whole damn reason these matrix extensions are being proposed. Nobody does AI compute on Vulkan except in a mobile context.

    Phones _sometimes_ get NPUs, but they're never consistent across SoCs, and every single one requires a massive porting effort to implement code lowering. SoC vendors put them into chips in an effort to *appear* competitive, but no software developer likes them, wants them, or leverages them. It is just stupid.



  • coder
    replied
    Originally posted by Developer12 View Post
    Go do some actual research before you spout worthless what-ifs.
    Thanks for the details. I don't have any recent experience with phone development, so I am indeed behind on that front.

    However, you're the one who raised the red herring of phone SoCs, in relation to a patch that clearly has nothing to do with them.

    Finally, phone SoCs spend valuable silicon real estate on NPUs for good reasons. They're much more power-efficient, if not also faster, than using the embedded GPU to do the same thing. If developers are forgoing the NPUs, as you suggest, I think it's a shame for users.



  • Developer12
    replied
    Originally posted by coder View Post
    Yes, we're agreed; Vulkan on NPUs isn't a "thing".


    This bundles a lot of assumptions. You're assuming PyTorch's Vulkan backend is as functional as its other backends, that most phone/tablet GPUs even support the necessary Vulkan extensions to run it, and that there are even any phone SoCs with NPUs that don't support NNAPI.


    Vulkan support is not a binary thing. It's a complete mess of extensions and versions, so your device would need the right version of Vulkan plus all of the right extensions needed by the AI framework & model in question. It shouldn't surprise you to hear that support for newer Vulkan versions and many of the extensions is quite spotty among Android devices.
    The open source drivers used by Android support the necessary extensions, e.g. Turnip. No exotic or unusual extensions are required. This backend is intended to be widely supported. That's the whole point of choosing Vulkan in the first place, over something like OpenCL or SYCL or any of a rainbow of weakly-supported NPU APIs.

    As for the ops PyTorch supports when running on Vulkan: churn in the repo seems to have broken the hardcoded links to the op-partitioning code, and the list of operators at https://pytorch.org/tutorials/protot..._workflow.html seems to be stale, so here's a tutorial on getting the LLaMA 3.2-1B LLM running on the Vulkan backend: https://pytorch.org/executorch/stabl...un-vulkan.html There are a few ops not supported on Vulkan yet (and to be honest, even the CUDA implementation in llama.cpp still encodes prompts on the CPU), but it supports running all of the important stuff on the GPU, including the attention layers, convolution layers, and linear layers. In other words, even for the most cutting-edge models they've gone to the trouble of implementing important pieces like all the special kinds of attention layers an LLM needs.

    Go do some actual research before you spout worthless what-ifs.



  • coder
    replied
    Originally posted by Developer12 View Post
    Nobody is running Vulkan on NPUs. It's so you can run models on the GPU.
    Yes, we're agreed; Vulkan on NPUs isn't a "thing".

    Originally posted by Developer12 View Post
    This is why PyTorch has supported Vulkan for model inference for quite a while.

    In case you didn't notice, far more phones have GPUs that support Vulkan than properly support NNAPI.
    This bundles a lot of assumptions. You're assuming PyTorch's Vulkan backend is as functional as its other backends, that most phone/tablet GPUs even support the necessary Vulkan extensions to run it, and that there are even any phone SoCs with NPUs that don't support NNAPI.

    Originally posted by Developer12 View Post
    And now NNAPI is dead and Google is trying to push some other braindead thing. Vulkan is the universal constant. You don't need Google or Android support to leverage it. You can just submit work to it and go.
    Vulkan support is not a binary thing. It's a complete mess of extensions and versions, so your device would need the right version of Vulkan plus all of the right extensions needed by the AI framework & model in question. It shouldn't surprise you to hear that support for newer Vulkan versions and many of the extensions is quite spotty among Android devices.
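
    To make the "not binary" point concrete: before a framework can commit to a Vulkan backend, it has to probe the driver for both the core version and every extension the backend needs. Here's a minimal C sketch of that probe. It's illustrative only, not any framework's actual startup code, and VK_KHR_cooperative_matrix stands in for whatever extensions a given backend really requires:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <vulkan/vulkan.h>

    int main(void)
    {
        /* A bare instance is enough for probing; no layers or extensions. */
        VkApplicationInfo app = { .sType = VK_STRUCTURE_TYPE_APPLICATION_INFO,
                                  .apiVersion = VK_API_VERSION_1_1 };
        VkInstanceCreateInfo ici = { .sType = VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO,
                                     .pApplicationInfo = &app };
        VkInstance inst;
        if (vkCreateInstance(&ici, NULL, &inst) != VK_SUCCESS)
            return 1;

        uint32_t ndev = 0;
        vkEnumeratePhysicalDevices(inst, &ndev, NULL);
        VkPhysicalDevice *devs = malloc(ndev * sizeof *devs);
        vkEnumeratePhysicalDevices(inst, &ndev, devs);

        for (uint32_t d = 0; d < ndev; d++) {
            /* Which core version does this driver actually expose? */
            VkPhysicalDeviceProperties p;
            vkGetPhysicalDeviceProperties(devs[d], &p);
            printf("%s: Vulkan %u.%u\n", p.deviceName,
                   VK_API_VERSION_MAJOR(p.apiVersion),
                   VK_API_VERSION_MINOR(p.apiVersion));

            /* Which extensions? Every one the backend uses must be present. */
            uint32_t next = 0;
            vkEnumerateDeviceExtensionProperties(devs[d], NULL, &next, NULL);
            VkExtensionProperties *ext = malloc(next * sizeof *ext);
            vkEnumerateDeviceExtensionProperties(devs[d], NULL, &next, ext);
            for (uint32_t i = 0; i < next; i++)
                if (!strcmp(ext[i].extensionName, "VK_KHR_cooperative_matrix"))
                    printf("  VK_KHR_cooperative_matrix: supported\n");
            free(ext);
        }
        free(devs);
        vkDestroyInstance(inst, NULL);
        return 0;
    }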



  • Developer12
    replied
    Originally posted by coder View Post
    WTF?

    Android has provided NNAPI since way back in Android 8.1. Vulkan wouldn't even work on the NPUs in most of the phone SoCs out there.

    Some AI frameworks support it, too:

    That said, it's been deprecated since Android 15. Google now wants everybody to use LiteRT (formerly TensorFlow Lite):
    Nobody is running Vulkan on NPUs. It's so you can run models on the GPU. This is why PyTorch has supported Vulkan for model inference for quite a while.

    In case you didn't notice, far more phones have GPUs that support Vulkan than properly support NNAPI. And now NNAPI is dead and Google is trying to push some other braindead thing. Vulkan is the universal constant. You don't need Google or Android support to leverage it. You can just submit work to it and go.



  • coder
    replied
    Originally posted by Developer12 View Post
    AI adoption on mobile is being hindered by the lack of a viable API. Nvidia and their closed-driver BS have meant that they haven't been any more successful than Intel at selling SoCs into mobile phones, so no CUDA. Thus, any AI inference has had to go through Vulkan on mobile platforms, which remains a bit difficult.
    WTF?

    Android has provided NNAPI since way back in Android 8.1. Vulkan wouldn't even work on the NPUs in most of the phone SoCs out there.

    This is a description of NNAPI (Neural Networks API), a low-level API for using NPUs on Android to enable fast inference of AI models.


    Some AI frameworks support it, too:

    NNAPI allows Android apps to run computationally intensive neural networks on the parts of the chips that power mobile phones.


    That said, it's been deprecated since Android 15. Google now wants everybody to use LiteRT (formerly TensorFlow Lite):



  • Developer12
    replied
    Originally posted by hauberg View Post
    It really would be a big deal if Vulkan could become a viable machine learning backend. I'd be somewhat surprised if NVIDIA pushed for this, though. NVIDIA is king of the hill because of CUDA, so it seems surprising that they would push for an alternative.
    AI adoption on mobile is being hindered by the lack of a viable API. Nvidia and their closed-driver BS have meant that they haven't been any more successful than Intel at selling SoCs into mobile phones, so no CUDA. Thus, any AI inference has had to go through Vulkan on mobile platforms, which remains a bit difficult. Nvidia are probably thinking that any increase in AI inference on smartphones using Vulkan will result in an increase in GPU sales to train the models being run, so they're proposing extensions to try to make it easier and hoping they're adopted.

    Really though, if Nvidia wants these extensions to be adopted, the proposed standards need to be accompanied by an implementation in Mesa, not just Nvidia's desktop GPU drivers.



  • Ladis
    replied
    Originally posted by nlgranger View Post

    As far as I know, Vulkan buffers are still limited to 4 GB because an important field was mistakenly set to int32 instead of int64 (https://github.com/KhronosGroup/Vulkan-Docs/issues/1016), so ML applications will be limited.
    Funny how they couldn't fix it in 5 years, during which we lived through crypto and now AI.



  • nlgranger
    replied
    Originally posted by hauberg View Post
    It really would be a big deal if Vulkan could become a viable machine learning backend.
    As far as I know, Vulkan buffers are still limited to 4 GB because an important field was mistakenly set to int32 instead of int64 (https://github.com/KhronosGroup/Vulkan-Docs/issues/1016), so ML applications will be limited.
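
    If I'm reading that issue right, the cap surfaces on the API side as VkPhysicalDeviceLimits::maxStorageBufferRange, which is a uint32_t. A quick C sketch to see what a driver reports:

    #include <stdio.h>
    #include <vulkan/vulkan.h>

    /* Print the largest storage-buffer range a device allows. Because the
     * field is a uint32_t, it can never report more than 4294967295 bytes
     * (4 GiB - 1), regardless of how much memory the GPU actually has. */
    void print_storage_buffer_limit(VkPhysicalDevice dev)
    {
        VkPhysicalDeviceProperties props;
        vkGetPhysicalDeviceProperties(dev, &props);
        printf("maxStorageBufferRange = %u bytes\n",
               props.limits.maxStorageBufferRange);
    }

    As far as I understand, frameworks can sidestep this with VK_KHR_buffer_device_address, which gives shaders raw 64-bit pointers, but ordinary descriptor-bound storage buffers are still capped.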



  • coder
    replied
    Originally posted by hauberg View Post
    I'd be somewhat surprised if NVIDIA pushed for this, though. NVIDIA is king of the hill because of CUDA, so it seems surprising that they would push for an alternative.
    I think it's telling how this follows a more generic cooperative matrix extension. I'm guessing that version didn't highlight the performance advantages of Nvidia hardware, which prompted them to make sure that anyone who wants to do this stuff can achieve better performance on Nvidia GPUs than on rivals' hardware.
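
    For anyone curious what the generic extension looks like from the application side: VK_KHR_cooperative_matrix lets you enumerate exactly which MxNxK matrix shapes and component types the hardware accelerates, which is where the vendor differences show up. A rough C sketch of that query (the entry point and struct names are from the KHR spec; error handling omitted):

    #include <stdio.h>
    #include <stdlib.h>
    #include <vulkan/vulkan.h>

    /* List the cooperative-matrix shapes a device advertises. Each entry is
     * one supported MxNxK / component-type combination; a shader then picks
     * one of these shapes for its coopmat types. */
    void list_coopmat_shapes(VkInstance inst, VkPhysicalDevice dev)
    {
        /* Extension entry point, loaded dynamically. */
        PFN_vkGetPhysicalDeviceCooperativeMatrixPropertiesKHR get =
            (PFN_vkGetPhysicalDeviceCooperativeMatrixPropertiesKHR)
            vkGetInstanceProcAddr(inst,
                "vkGetPhysicalDeviceCooperativeMatrixPropertiesKHR");
        if (!get)
            return; /* driver doesn't expose VK_KHR_cooperative_matrix */

        uint32_t n = 0;
        get(dev, &n, NULL);
        VkCooperativeMatrixPropertiesKHR *props = calloc(n, sizeof *props);
        for (uint32_t i = 0; i < n; i++)
            props[i].sType = VK_STRUCTURE_TYPE_COOPERATIVE_MATRIX_PROPERTIES_KHR;
        get(dev, &n, props);

        for (uint32_t i = 0; i < n; i++)
            printf("M=%u N=%u K=%u\n",
                   props[i].MSize, props[i].NSize, props[i].KSize);
        free(props);
    }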

