The same goes for filesystems: the kernel exposes high-level operations (create node, delete node, list nodes, ...). Or for networking, with things like the TCP stack.
But when it comes to GPUs, the common denominator is something like OpenGL, or VAAPI if you consider video. Those are very _high level_ APIs (take a look at the size of the Mesa source). There is absolutely no middle ground with GPUs: each GPU has a different way to execute commands, different commands, a different memory layout, different restrictions, different ways to tile things, a different way to manage memory ...
So the solutions are either a GPU-specific API, with no way to foresee how good each API will be before having a full userspace stack and starting to optimize performance; or finding some middle ground in the GL API (something like the GLES API, which represents the most used and most reasonable GL features) and moving that into the kernel. Again, there is no guarantee that this API would be any good until you build a full userspace stack.
But a middle ground in the GL API means moving a lot of code into the kernel, and there again it might not be trivial. For instance, state bookkeeping needs a lot of structures and memory. At the same time you want the memory you allocate on behalf of a process to be accounted against that process, and you also don't want the process to be able to mess with it while still allowing the kernel to work with it (I don't think this is trivial in the Linux kernel, but I have gaps in my knowledge of the Linux kernel memory code).
And when it comes to memory management, on the GPU side things are way more dynamic than for audio or network devices, and also come with way more subtleties (memory tiling, unmappable memory, GART, ...).
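
To give a rough idea of what memory tiling means for addressing, here is a minimal sketch in C. It compares the byte offset of a pixel in a plain linear surface with the offset in a simple tiled one. The 4x4 tile size and the function names are made up for illustration and do not match any particular GPU; real hardware uses its own tile shapes and often swizzles addresses further, which is exactly why a common kernel-level API is hard to define.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tile dimensions, in pixels. Every GPU picks its own. */
#define TILE_W 4u
#define TILE_H 4u

/* Byte offset of pixel (x, y) in a row-major (linear) surface. */
static size_t linear_offset(uint32_t x, uint32_t y,
                            uint32_t width, uint32_t bpp)
{
    return ((size_t)y * width + x) * bpp;
}

/*
 * Byte offset of pixel (x, y) in a tiled surface where each
 * TILE_W x TILE_H tile is stored contiguously, tiles laid out
 * row by row. Assumes width is a multiple of TILE_W.
 */
static size_t tiled_offset(uint32_t x, uint32_t y,
                           uint32_t width, uint32_t bpp)
{
    uint32_t tiles_per_row = width / TILE_W;
    size_t tile_index = (size_t)(y / TILE_H) * tiles_per_row + (x / TILE_W);
    size_t in_tile = (size_t)(y % TILE_H) * TILE_W + (x % TILE_W);

    return (tile_index * TILE_W * TILE_H + in_tile) * bpp;
}
```

The same pixel lands at a different address depending on the layout, so anything that touches the buffer (CPU mapping, copy, scanout) has to know which layout the GPU expects.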