Arm China Looking At Upstreaming Their "Zhouyi" NPU Driver Into The Linux Kernel
Arm China is looking at upstreaming their "Zhouyi" NPU driver into the Linux kernel via the recently-created accelerator "accel" subsystem. The Arm China Neural Processing Unit (NPU) driver in its current form has both an open-source kernel driver and an open-source user-space stack.
Three years ago a Baidu engineer was looking at a Zhouyi AI accelerator driver for the Linux kernel, but it was just an open-source kernel driver and at the time lacked an open-source user-space software stack. Since then Arm China has been working on this fully open user/kernel driver stack. The Arm China NPU is found in some SoCs like the Allwinner R329.
An Arm China engineer announced hopes today for upstreaming their NPU driver into the accelerator subsystem. For now at least the open user-mode driver and kernel driver are developed via Arm-China's Compass NPU driver on GitHub.
This open-source AI accelerator driver stack relies on transforming TensorFlow Lite and ONNX models into an executable library via Arm China's NN graph compiler. The application then loads that executable into the user-mode driver, the binary is submitted to the kernel driver, and the kernel driver then dispatches it to the NPU hardware on supported SoCs.
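The flow described above can be sketched roughly as follows. This is only an illustrative model in plain Python, not the actual Compass NPU driver API: every name here (`compile_model`, `NpuExecutable`, `UserModeDriver`, `KernelDriver`, `run_pipeline`) is hypothetical, and the "executable" is just placeholder bytes.

```python
from dataclasses import dataclass, field


@dataclass
class NpuExecutable:
    """Hypothetical stand-in for the graph compiler's output."""
    source_model: str
    payload: bytes


def compile_model(model_path: str) -> NpuExecutable:
    """Hypothetical offline step: TensorFlow Lite / ONNX model -> executable."""
    if not model_path.endswith((".tflite", ".onnx")):
        raise ValueError(f"unsupported model format: {model_path}")
    # Placeholder bytes; a real compiler would emit an NPU binary here.
    return NpuExecutable(source_model=model_path, payload=b"\x00" * 16)


@dataclass
class KernelDriver:
    """Hypothetical kernel side: accepts binaries and dispatches them."""
    dispatched: list = field(default_factory=list)

    def submit(self, payload: bytes) -> int:
        # Record the job as "dispatched to hardware" and return a job id.
        self.dispatched.append(payload)
        return len(self.dispatched) - 1


@dataclass
class UserModeDriver:
    """Hypothetical user-space side: loads an executable, submits it."""
    kernel: KernelDriver

    def load_and_run(self, exe: NpuExecutable) -> int:
        return self.kernel.submit(exe.payload)


def run_pipeline(model_path: str):
    """Walk one model through compile -> user-mode driver -> kernel driver."""
    kernel = KernelDriver()
    umd = UserModeDriver(kernel)
    exe = compile_model(model_path)
    job_id = umd.load_and_run(exe)
    return job_id, kernel
```

The point of the sketch is only the layering: the application never talks to the hardware directly, it hands a precompiled binary to the user-mode driver, which in turn submits it to the kernel driver.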
There are some developer boards available via AliExpress and similar retailers that feature the Arm China NPU.
So far the mailing list thread is just laying out the interest and figuring out the requirements for getting this Arm China NPU driver potentially upstreamed into the mainline Linux kernel. It will still be a matter of upstream DRM/accel maintainers reviewing the code, ensuring it fits within the appropriate interfaces, etc. So there's still a long road ahead, but it's at least nice to see Arm China now having a fully open-source NPU driver stack for Linux.