
AMD's Ninth Iteration Of Their XDNA Linux Driver Posted For Ryzen AI


  • AMD's Ninth Iteration Of Their XDNA Linux Driver Posted For Ryzen AI

    Phoronix: AMD's Ninth Iteration Of Their XDNA Linux Driver Posted For Ryzen AI

    Yesterday brought the eighth and ninth iterations of the AMD XDNA Linux kernel driver, posted for review to enable the Ryzen AI branded NPUs found in AMD's recent SoCs...


  • #2
    Instead of the terrible XDNA, AMD should use CDNA without fp64 to fulfill Microsoft's NPU requirement.

    • #3
      Originally posted by zamroni111 View Post
      Instead of the terrible XDNA, AMD should use CDNA without fp64 to fulfill Microsoft's NPU requirement.
      Even if you cut back on the fp32 processing hardware, that would use more die area and power for a given performance level than XDNA.

      GPUs are overdesigned for AI. Graphics has lots of random access and global data movement, which leads to unpredictable latencies. GPUs lean heavily on SMT as a way to deal with this, which has substantial costs mainly in terms of giant register files and the datapaths needed to use them.

      However, the data movement involved in AI inferencing and training is much more regular. At a fine grain, it's much more predictable. That's why virtually everyone is using VLIW DSP cores working out of local SRAM. To sort out the coarse-grain synchronization and data movement, they rely on DMA engines.
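
      Roughly, the pattern looks something like this toy C sketch. Everything in it is made up for illustration (dma_copy(), npu_style_scale(), the TILE size, the scale-by-a-constant "kernel"), and memcpy is just a synchronous stand-in for a real DMA engine:

      #include <string.h>

      #define TILE 64   /* made-up tile size, purely for illustration */

      /* "Local SRAM": two input tiles so that, on real hardware, the DMA of
       * the next tile could overlap with compute on the current one.  memcpy
       * here is synchronous, so this only shows the structure. */
      static float sram_in[2][TILE];
      static float sram_out[TILE];

      /* Stand-in for a DMA engine transfer to/from local SRAM. */
      static void dma_copy(float *dst, const float *src, int n)
      {
          memcpy(dst, src, n * sizeof(float));
      }

      /* Walk a big buffer tile by tile, computing only out of "local SRAM".
       * n is assumed to be a multiple of TILE to keep the sketch short. */
      void npu_style_scale(float *out, const float *in, int n, float k)
      {
          int buf = 0;

          dma_copy(sram_in[buf], in, TILE);               /* prefetch tile 0 */

          for (int t = 0; t < n / TILE; t++) {
              if ((t + 1) * TILE < n)                     /* queue the next tile */
                  dma_copy(sram_in[buf ^ 1], in + (t + 1) * TILE, TILE);

              for (int i = 0; i < TILE; i++)              /* dense, predictable loop */
                  sram_out[i] = k * sram_in[buf][i];

              dma_copy(out + t * TILE, sram_out, TILE);   /* write the tile back */
              buf ^= 1;
          }
      }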

      TBH, I don't really care so much about the specific architecture of the NPUs. What bugs me is that they're not directly programmable via some standard API, like OpenCL. I wish something like Rusticl could be ported to them, but I think that would probably be a massive lift, since it wouldn't even have the benefit of Mesa.
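
      Just to illustrate what a standard API would buy us: if an OpenCL ICD ever existed for these NPUs (none does for XDNA today, as far as I know), the host side could be as boring as the standard enumeration below. The only assumption is that the NPU would surface as a CL_DEVICE_TYPE_ACCELERATOR device:

      /* Hypothetical: enumerate an NPU exposed through a standard OpenCL ICD.
       * No such ICD exists for XDNA today; this only shows what a standard
       * API would look like on the host side. */
      #include <stdio.h>
      #include <CL/cl.h>

      int main(void)
      {
          cl_platform_id plats[8];
          cl_uint nplat = 0;

          if (clGetPlatformIDs(8, plats, &nplat) != CL_SUCCESS)
              return 1;

          for (cl_uint p = 0; p < nplat; p++) {
              cl_device_id dev;
              cl_uint ndev = 0;

              /* An NPU would most naturally show up as an ACCELERATOR device. */
              if (clGetDeviceIDs(plats[p], CL_DEVICE_TYPE_ACCELERATOR,
                                 1, &dev, &ndev) != CL_SUCCESS || ndev == 0)
                  continue;

              char name[256] = "";
              clGetDeviceInfo(dev, CL_DEVICE_NAME, sizeof(name), name, NULL);
              printf("accelerator: %s\n", name);
          }
          return 0;
      }
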
      Last edited by coder; 12 November 2024, 12:45 PM.

      • #4
        Originally posted by coder View Post
        Even if you cut back on the fp32 processing hardware, that would use more die area and power for a given performance level than XDNA.

        GPUs are overdesigned for AI. Graphics has lots of random access and global data movement, which leads to unpredictable latencies. GPUs lean heavily on SMT as a way to deal with this, which has substantial costs mainly in terms of giant register files and the datapaths needed to use them.

        However, the data movement involved in AI inferencing and training is much more regular. At a fine grain, it's much more predictable. That's why virtually everyone is using VLIW DSP cores working out of local SRAM. To sort out the coarse-grain synchronization and data movement, they rely on DMA engines.

        TBH, I don't really care so much about the specific architecture of the NPUs. What bugs me is that they're not directly programmable via some standard API, like OpenCL. I wish something like Rusticl could be ported to them, but I think that would probably be a massive lift, since it wouldn't even have the benefit of Mesa.
        XDNA is in turn 100% useless without wide API support, HIP, CL and Vulkan at a minimum... it doesn't matter if it can't "render"; it should be able to run basic compute kernels. The same goes for their compute cards.
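
        To be concrete about what I mean by basic compute kernels: a SAXPY in OpenCL C is about as simple as it gets. This is purely illustrative, not something XDNA actually exposes today:

        /* Illustrative OpenCL C kernel: y = a*x + y.  Nothing here needs a
         * rasterizer or texture units; it's the sort of dense, regular work
         * a compute-capable NPU ought to be able to run. */
        __kernel void saxpy(const float a,
                            __global const float *x,
                            __global float *y)
        {
            size_t i = get_global_id(0);
            y[i] = a * x[i] + y[i];
        }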

        • #5
          Originally posted by cb88 View Post
          XDNA is in turn 100% useless without wide API support, HIP, CL and Vulkan at a minimum...
          That would seem to imply you consider AI useless. AMD provides support for it on Windows, where it's already accelerating AI inferencing workloads today.

          With support on Linux, we can begin to do the same. Imagine using it to accelerate AI-based in-painting (and other filters) in GIMP, for instance.

          Originally posted by cb88 View Post
          it doesn't matter if it can't "render"; it should be able to run basic compute kernels. The same goes for their compute cards.
          I agree that it should support some GPU compute frameworks, even if it's not quite as well-suited to that type of workflow. That's certainly what I'd like to see.

          If we could run general-purpose compute workloads on it, I'd sure get a kick out of running some compute-based renderer! Imagine hacking an NPU to use it for generating interactive graphics!
          🤣
