Samuel Pitoiset has been involved with the Nouveau driver project for a while, and this summer, via Google Summer of Code, he has been making excellent progress in exposing NVIDIA performance counters to user-space with the Nouveau driver stack. In his latest post, Samuel shares that he has implemented compute-only MP counters for NV50-class hardware in Nouveau.
As Samuel explains, "MP" counters are local and per-context, which makes them preferable to global counters. Although the counters implemented are "compute" MP counters, the compute-only restriction is an arbitrary limitation imposed by NVIDIA, and these counters can be used for OpenGL games/applications too. For the prototype implementation of the NV50 MP counters, Samuel implemented an interface between the Linux kernel and Mesa to expose the counters to users through Gallium3D's HUD.
The prototype implementation exposes 13 performance counters for NV50/Tesla hardware. The code is currently out-of-tree for both Mesa and the Nouveau DRM, but hopefully the work will be mainlined once the summer GSoC project has been completed.
More details via Samuel's blog.