
Vulkan 1.3.300 Delivers New Cooperative Matrix Extension From NVIDIA



    Phoronix: Vulkan 1.3.300 Delivers New Cooperative Matrix Extension From NVIDIA

    Vulkan 1.3.300 debuted on Friday with a handful of fixes and one new extension...
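
    For anyone wondering what a "cooperative matrix" extension buys you: the base VK_KHR_cooperative_matrix extension, which the new NVIDIA extension presumably builds on, lets compute shaders issue tile-wise matrix multiply-accumulate operations that drivers can map onto tensor-core-style hardware. As a rough, untested sketch, this is how an application can ask the driver which MxNxK tile shapes it accelerates (assumes a created VkInstance and a chosen VkPhysicalDevice):

    Code:
    /* Enumerate the matrix tile shapes accelerated via
     * VK_KHR_cooperative_matrix. Error handling omitted for brevity. */
    #include <vulkan/vulkan.h>
    #include <stdio.h>
    #include <stdlib.h>

    void list_coopmat_shapes(VkInstance instance, VkPhysicalDevice phys)
    {
        /* The entry point comes from an extension, so load it dynamically. */
        PFN_vkGetPhysicalDeviceCooperativeMatrixPropertiesKHR getProps =
            (PFN_vkGetPhysicalDeviceCooperativeMatrixPropertiesKHR)
            vkGetInstanceProcAddr(instance,
                "vkGetPhysicalDeviceCooperativeMatrixPropertiesKHR");
        if (!getProps)
            return; /* driver doesn't expose the extension */

        uint32_t count = 0;
        getProps(phys, &count, NULL);
        VkCooperativeMatrixPropertiesKHR *props = calloc(count, sizeof(*props));
        for (uint32_t i = 0; i < count; i++)
            props[i].sType = VK_STRUCTURE_TYPE_COOPERATIVE_MATRIX_PROPERTIES_KHR;
        getProps(phys, &count, props);

        for (uint32_t i = 0; i < count; i++)
            printf("M x N x K = %u x %u x %u\n",
                   props[i].MSize, props[i].NSize, props[i].KSize);
        free(props);
    }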


  • #2
    It really would be a big deal if Vulkan could become a viable machine learning backend. I'd be somewhat surprised if NVIDIA pushed for this, though: NVIDIA is king of the hill because of CUDA, so it would be odd for them to promote an alternative.



    • #3
      Originally posted by hauberg View Post
      I'd be somewhat surprised if NVIDIA pushed for this, though: NVIDIA is king of the hill because of CUDA, so it would be odd for them to promote an alternative.
      I think it's telling that this follows a more generic cooperative matrix extension. I'm guessing that version didn't showcase the performance advantages of Nvidia hardware, which prompted them to make sure anyone doing this kind of work can get better performance on Nvidia GPUs than on rival hardware.



      • #4
        Originally posted by hauberg View Post
        It really would be a big deal if Vulkan could become a viable machine learning backend.
        As far as I know, Vulkan buffers are still limited to 4 GiB because an important limit field was mistakenly specified as a 32-bit rather than a 64-bit integer (https://github.com/KhronosGroup/Vulkan-Docs/issues/1016), so ML applications will be constrained.
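
        For reference, the ceiling is visible straight from the device properties: VkPhysicalDeviceLimits::maxStorageBufferRange is declared as a uint32_t, so no driver can ever report more than ~4 GiB for a single storage buffer. A minimal, untested sketch (assumes a VkPhysicalDevice has already been selected):

        Code:
        /* Print the storage-buffer size ceiling discussed above. Because
         * maxStorageBufferRange is a uint32_t, the value tops out near 4 GiB. */
        #include <vulkan/vulkan.h>
        #include <stdio.h>

        void print_buffer_limit(VkPhysicalDevice phys)
        {
            VkPhysicalDeviceProperties props;
            vkGetPhysicalDeviceProperties(phys, &props);
            printf("maxStorageBufferRange: %u bytes (~%.2f GiB)\n",
                   props.limits.maxStorageBufferRange,
                   props.limits.maxStorageBufferRange / 1073741824.0);
        }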



        • #5
          Originally posted by nlgranger View Post

          As far as I know, Vulkan buffers are still limited to 4 GiB because an important limit field was mistakenly specified as a 32-bit rather than a 64-bit integer (https://github.com/KhronosGroup/Vulkan-Docs/issues/1016), so ML applications will be constrained.
          Funny how they couldn't fix it in five years, even as we lived through the crypto boom and now the AI boom.



          • #6
            Originally posted by hauberg View Post
            It really would be a big deal if Vulkan could become a viable machine learning backend. I'd be somewhat surprised if NVIDIA pushed for this, though: NVIDIA is king of the hill because of CUDA, so it would be odd for them to promote an alternative.
            AI adoption on mobile is being hindered by the lack of a viable API. Nvidia and their closed-driver BS meant they were no more successful than Intel at selling SoCs into mobile phones, so there's no CUDA there. Thus, any AI inference on mobile platforms has had to go through Vulkan, which remains a bit difficult. Nvidia are probably betting that any increase in AI inference on smartphones via Vulkan will translate into more GPU sales for training the models being run, so they're proposing extensions to make that easier and hoping they get adopted.

            Really though, if Nvidia wants these extensions to be adopted, the proposed standards need to be accompanied by an implementation in Mesa, not just in Nvidia's desktop GPU drivers.



            • #7
              Originally posted by Developer12 View Post
              AI adoption on mobile is being hindered by the lack of a viable API. Nvidia and their closed-driver BS meant they were no more successful than Intel at selling SoCs into mobile phones, so there's no CUDA there. Thus, any AI inference on mobile platforms has had to go through Vulkan, which remains a bit difficult.
              WTF?

              Android has provided NNAPI since way back in Android 8.1. Vulkan wouldn't even work on the NPUs in most of the phone SoCs out there.

              NNAPI (Neural Networks API) is a low-level API for using NPUs on Android to enable fast inference of AI models.

              Some AI frameworks support it, too: NNAPI allows Android apps to run computationally intensive neural networks on the parts of the chips that power mobile phones.


              That said, it's been deprecated since Android 15; Google now wants everybody to use LiteRT (formerly TensorFlow Lite).
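
              For a sense of what NNAPI looks like at the C level, here is a rough, untested sketch that adds two float tensors and lets the Android runtime dispatch to whatever accelerator driver is available (NPU, GPU, DSP, or CPU fallback). Error checks are omitted for brevity, and as noted above the API is deprecated on new Android releases:

              Code:
              /* Minimal NNAPI sketch: add two float32 tensors on whatever
               * accelerator the Android runtime picks. */
              #include <android/NeuralNetworks.h>
              #include <stdint.h>
              #include <stdio.h>

              void nnapi_add_demo(void)
              {
                  uint32_t dims[1] = {4};
                  ANeuralNetworksOperandType tensor = {
                      .type = ANEURALNETWORKS_TENSOR_FLOAT32,
                      .dimensionCount = 1, .dimensions = dims,
                      .scale = 0.0f, .zeroPoint = 0,
                  };
                  ANeuralNetworksOperandType scalar = {
                      .type = ANEURALNETWORKS_INT32,
                      .dimensionCount = 0, .dimensions = NULL,
                      .scale = 0.0f, .zeroPoint = 0,
                  };

                  /* Operands: 0 = a, 1 = b, 2 = fused activation, 3 = out. */
                  ANeuralNetworksModel *model = NULL;
                  ANeuralNetworksModel_create(&model);
                  ANeuralNetworksModel_addOperand(model, &tensor);
                  ANeuralNetworksModel_addOperand(model, &tensor);
                  ANeuralNetworksModel_addOperand(model, &scalar);
                  ANeuralNetworksModel_addOperand(model, &tensor);

                  int32_t act = ANEURALNETWORKS_FUSED_NONE;
                  ANeuralNetworksModel_setOperandValue(model, 2, &act, sizeof(act));
                  uint32_t in[3] = {0, 1, 2}, out[1] = {3};
                  ANeuralNetworksModel_addOperation(model, ANEURALNETWORKS_ADD,
                                                    3, in, 1, out);
                  uint32_t mIn[2] = {0, 1}, mOut[1] = {3};
                  ANeuralNetworksModel_identifyInputsAndOutputs(model, 2, mIn, 1, mOut);
                  ANeuralNetworksModel_finish(model);

                  /* The runtime picks the best available driver at compile time. */
                  ANeuralNetworksCompilation *comp = NULL;
                  ANeuralNetworksCompilation_create(model, &comp);
                  ANeuralNetworksCompilation_finish(comp);

                  float a[4] = {1, 2, 3, 4}, b[4] = {10, 20, 30, 40}, res[4];
                  ANeuralNetworksExecution *exec = NULL;
                  ANeuralNetworksExecution_create(comp, &exec);
                  ANeuralNetworksExecution_setInput(exec, 0, NULL, a, sizeof(a));
                  ANeuralNetworksExecution_setInput(exec, 1, NULL, b, sizeof(b));
                  ANeuralNetworksExecution_setOutput(exec, 0, NULL, res, sizeof(res));
                  ANeuralNetworksExecution_compute(exec); /* synchronous (API 29+) */

                  printf("%.0f %.0f %.0f %.0f\n", res[0], res[1], res[2], res[3]);
                  ANeuralNetworksExecution_free(exec);
                  ANeuralNetworksCompilation_free(comp);
                  ANeuralNetworksModel_free(model);
              }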



              • #8
                Originally posted by coder View Post
                WTF?

                Android has provided NNAPI since way back in Android 8.1. Vulkan wouldn't even work on the NPUs in most of the phone SoCs out there.

                Some AI frameworks support it, too:

                That said, it's been deprecated since Android 15; Google now wants everybody to use LiteRT (formerly TensorFlow Lite).
                Nobody is running Vulkan on NPUs. It's so you can run models on the GPU. This is why pytorch has supported vulkan for model inference for quite a while.

                In case you didn't notice, far more phones have GPUs that support vulkan than properly support nnapi. And now nnapi is dead and google is trying to push some other braindead thing. Vulkan is the universal constant. You don't need google or android support to leverage it. You can just submit work to it and go.



                • #9
                  Originally posted by Developer12 View Post
                  Nobody is running Vulkan on NPUs. It's so you can run models on the GPU.
                  Yes, we're agreed; Vulkan on NPUs isn't a "thing".

                  Originally posted by Developer12 View Post
                  This is why pytorch has supported vulkan for model inference for quite a while.

                  In case you didn't notice, far more phones have GPUs that support vulkan than properly support nnapi.
                  This bundles a lot of assumptions. You're assuming PyTorch's Vulkan backend is as functional as its other backends, that most phone/tablet GPUs even support the necessary Vulkan extensions to run it, and that there are even any phone SoCs with NPUs that don't support NNAPI.

                  Originally posted by Developer12 View Post
                  And now nnapi is dead and google is trying to push some other braindead thing. Vulkan is the universal constant. You don't need google or android support to leverage it. You can just submit work to it and go.
                  Vulkan support is not a binary thing. It's a complete mess of extensions and versions, so your device would need the right version of Vulkan plus all of the right extensions needed by the AI framework and model in question. It shouldn't surprise you to hear that support for newer Vulkan versions and many of the extensions is quite spotty among Android devices.
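
                  To make that concrete, a backend has to pass a gate roughly like the following before it can even start. This is an untested sketch, and the two extension names are just placeholders for whatever a given framework actually requires:

                  Code:
                  /* Illustrative capability gate: a Vulkan compute backend needs a
                   * minimum core version plus specific extensions; any missing piece
                   * usually means falling back to the CPU. */
                  #include <vulkan/vulkan.h>
                  #include <stdbool.h>
                  #include <stdlib.h>
                  #include <string.h>

                  static bool has_extension(VkPhysicalDevice phys, const char *name)
                  {
                      uint32_t n = 0;
                      vkEnumerateDeviceExtensionProperties(phys, NULL, &n, NULL);
                      VkExtensionProperties *exts = calloc(n, sizeof(*exts));
                      vkEnumerateDeviceExtensionProperties(phys, NULL, &n, exts);
                      bool found = false;
                      for (uint32_t i = 0; i < n && !found; i++)
                          found = strcmp(exts[i].extensionName, name) == 0;
                      free(exts);
                      return found;
                  }

                  bool backend_usable(VkPhysicalDevice phys)
                  {
                      VkPhysicalDeviceProperties props;
                      vkGetPhysicalDeviceProperties(phys, &props);
                      if (props.apiVersion < VK_MAKE_API_VERSION(0, 1, 3, 0))
                          return false; /* core version too old */

                      /* Placeholder requirements; a real framework's list differs. */
                      const char *required[] = {
                          "VK_KHR_cooperative_matrix",
                          "VK_KHR_16bit_storage",
                      };
                      for (size_t i = 0; i < sizeof(required) / sizeof(*required); i++)
                          if (!has_extension(phys, required[i]))
                              return false;
                      return true;
                  }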



                  • #10
                    Originally posted by coder View Post
                    Yes, we're agreed; Vulkan on NPUs isn't a "thing".


                    This bundles a lot of assumptions. You're assuming PyTorch's Vulkan backend is as functional as its other backends, that most phone/tablet GPUs even support the necessary Vulkan extensions to run it, and that there are even any phone SoCs with NPUs that don't support NNAPI.


                    Vulkan support is not a binary thing. It's a complete mess of extensions and versions, so your device would need the right version of Vulkan plus all of the right extensions needed by the AI framework and model in question. It shouldn't surprise you to hear that support for newer Vulkan versions and many of the extensions is quite spotty among Android devices.
                    The open source drivers used by Android support the necessary extensions, e.g. Turnip. No exotic or unusual extensions are required; this backend is intended to be widely supported. That's the whole point of choosing Vulkan in the first place, over something like OpenCL or SYCL or any of a rainbow of weakly-supported NPU APIs.

                    As for the ops supported by pytorch when running on vulkan: churn in the repo seems to have broken the hardcoded links to the op-partitioning code, and the list of operators at https://pytorch.org/tutorials/protot..._workflow.html looks stale, so here's a tutorial on getting the LLaMA 3.2-1B LLM running on the vulkan backend: https://pytorch.org/executorch/stabl...un-vulkan.html There are a few ops not supported in vulkan yet (and tbh, even the cuda implementation in llama.cpp still encodes prompts on the CPU), but it supports running all of the important stuff on the GPU, including the attention layers, convolution layers, and linear layers. In other words, even for the most cutting-edge models they've gone to the trouble of implementing things like the special kinds of attention layer an LLM needs.

                    Go do some actual research before you spout worthless what-ifs.

