Reverse-Engineered NPU Driver Tantalizingly Close To Proprietary Driver Performance
When it comes to neural processing unit (NPU) / AI accelerators for Linux, there are open-source options, most notably Intel-owned Habana Labs leading the way, Intel's iVPU driver for the NPU found within Meteor Lake SoCs, AMD recently posting a Ryzen AI Linux driver, and more. On the reverse-engineered side, the Etnaviv project has expanded its scope from just Vivante graphics IP to also embracing the Vivante NPU IP for running workloads like TensorFlow Lite. With the latest open-source achievements, Etnaviv NPU performance is coming incredibly close to that of the proprietary, official driver.
Tomeu Vizoso continues leading the charge on Etnaviv NPU open-source driver enablement. He got the new Teflon framework merged for Mesa 24.1, expanding Mesa's scope to NPUs. After covering all the initial bases, he has turned to further boosting performance to make it competitive with the proprietary driver.
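Teflon is exposed to applications as a TensorFlow Lite external delegate, so an existing TFLite model can be pointed at the NPU without changes to the model itself. Below is a minimal sketch of what that looks like from Python; the libteflon.so path and the model filename are assumptions that depend on your Mesa install and workload.

```python
import numpy as np
from tflite_runtime.interpreter import Interpreter, load_delegate

# Load Teflon as a TFLite external delegate. The path is an assumption;
# it depends on where your Mesa build installs libteflon.so.
teflon = load_delegate("/usr/local/lib/libteflon.so")

# Operations the delegate supports (e.g. convolutions) are offloaded to
# the Vivante NPU; anything unsupported falls back to the CPU.
interpreter = Interpreter(
    model_path="mobilenet_v1_1.0_224_quant.tflite",  # example model
    experimental_delegates=[teflon],
)
interpreter.allocate_tensors()

# Run one inference with dummy input data.
inp = interpreter.get_input_details()[0]
interpreter.set_tensor(inp["index"],
                       np.zeros(inp["shape"], dtype=inp["dtype"]))
interpreter.invoke()
out = interpreter.get_output_details()[0]
print(interpreter.get_tensor(out["index"]).shape)
```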
In a new blog post he outlines making the open-source driver's performance even faster thanks to supporting more convolutions and enabling image caching.
To compare the Etnaviv driver's progress against the proprietary driver's performance, Vizoso shared a benchmark graphic in his post.
He also noted in his latest blog post:
"At this point I am pretty confident that we can get quite close to the performance of the proprietary driver without much additional work, as a few major performance features remain to be implemented, and I know that I still need to give a pass at tuning some of the previous performance work.
But after getting the input tensor caching finished and before I move to any other improvements, I think I will invest some time in adding some profiling facilities so I can better direct the efforts and get the best returns."
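The "input tensor caching" he mentions boils down to not paying the upload and layout-conversion cost again for an input buffer the hardware has already seen. Purely as a conceptual illustration, and not the Etnaviv driver's actual code, such a cache can be sketched as follows; all names here are hypothetical:

```python
# Conceptual sketch of input tensor caching, not Etnaviv's actual code:
# skip the expensive upload/conversion step when the same input buffer
# is fed to the NPU again.
class TensorCache:
    def __init__(self):
        self._cache = {}  # content fingerprint -> device buffer

    def get_device_buffer(self, host_tensor, upload_fn):
        # Fingerprint the host tensor. A real driver would track buffer
        # identity and dirtiness instead of hashing the full contents.
        key = hash(host_tensor.tobytes())
        if key not in self._cache:
            # Cache miss: pay the upload/layout-conversion cost once.
            self._cache[key] = upload_fn(host_tensor)
        return self._cache[key]
```

With repeated inference over the same input, as is common while benchmarking, every run after the first skips the upload entirely, which is why this kind of caching can show up prominently in benchmark numbers.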
Another nice achievement for Mesa and the broader open-source world.