Performance Counters Stuck For Linux GPU Drivers
While it mostly concerns developers, another current shortcoming of the open-source Linux graphics drivers is the lack of suitable performance counters support...
We've actually been meaning to support GL_AMD_performance_monitor for some time now. I started on it a while back, but got busy with other things and never finished it.
We do have a few things: intel-gpu-top shows how busy the GPU is, a breakdown by unit, render vs. blitter usage, and basic counters like VS/PS invocations. However, the 'top' style interface is not always the most useful, since it doesn't allow you to capture data over time. We also have a small proof of concept program called 'chaps' in intel-gpu-tools which exposes Ironlake's MI_REPORT_PERF counters. Ultimately, the idea is to expose those via GL_AMD_performance_monitor. We currently don't have that for Sandybridge/Ivybridge though, sadly. As Michael mentioned, there are some hoops to jump through, but I think once someone writes the code that should be doable.
Another new tool is Eric's INTEL_DEBUG=shader_time, which shows a breakdown of how many clock cycles were used by each vertex shader, 8-wide fragment shader, and 16-wide fragment shader. It's extremely useful for determining which shaders are the most expensive (sometimes large shaders are seldom used, while smaller shaders are used extremely frequently, so guessing doesn't always work). That allows us to focus our optimization efforts. (Sadly this is Ivybridge only, since the timestamp register didn't exist prior to that.)
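For anyone wanting to try it, the invocation is just an environment variable on the Mesa i965 driver; `glxgears` below is only a stand-in workload (any GL application works), and the feature needs an Ivybridge GPU as noted above:

```shell
# INTEL_DEBUG takes a comma-separated list of i965 debug flags;
# shader_time enables the per-shader cycle accounting described above.
# 'glxgears' is just an example workload; substitute your own application.
INTEL_DEBUG=shader_time glxgears
# When the application exits, the driver prints a breakdown of cycles
# consumed per VS / FS8 / FS16 program.
```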
But overall I agree, we need more performance counters, and need to expose them to application developers.
I'd like some Radeon performance counters. Pretty please?
Yeah, I was pleased to see intel-gpu-top, and I've been looking forward to doing the same on nouveau/nvidia. Anyway, it is going to be hard to find hw-independent measures to expose to game developers, but things like shader execution time, memory bandwidth usage (in percent), PCIe bandwidth usage, and shader engine usage should work. Then there are the usual cache misses, which are going to be hard to make hw-independent and could live in the hw-dependent part.

Originally Posted by Kayden
Does nv expose stall reasons? Those would be nice too, if they exist. E.g., fragment blocks not being able to run because the vertex shader has not completed for that whole block, and so on. (Define "block" as a thread group, or whatever it's called on nv.)
That's one of the nice things about the GL_AMD_performance_monitor extension, though...it just exposes a generic counter mechanism. Applications can query GL to get a list of available counters (organized in groups), and then get data from them. It doesn't actually specify what counters are available, so you can expose whatever your hardware offers.
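To illustrate that query mechanism, here is a sketch of enumerating counter groups with the extension's entry points. It assumes a current GL context with GL_AMD_performance_monitor advertised; in real code the `gl*AMD` function pointers would be resolved via `glXGetProcAddress()`/`eglGetProcAddress()`, and error checking is omitted for brevity:

```c
#include <stdio.h>
#include <stdlib.h>
#include <GL/gl.h>
#include <GL/glext.h>

/* Sketch: list every counter group the driver exposes, plus how many
 * counters each group holds. Assumes a current GL context and that
 * "GL_AMD_performance_monitor" appears in the extension string. */
static void list_perf_counter_groups(void)
{
    GLint num_groups = 0;
    /* First call with groupsSize == 0 just asks how many groups exist. */
    glGetPerfMonitorGroupsAMD(&num_groups, 0, NULL);

    GLuint *groups = malloc(num_groups * sizeof *groups);
    glGetPerfMonitorGroupsAMD(&num_groups, num_groups, groups);

    for (GLint g = 0; g < num_groups; g++) {
        char name[256];
        glGetPerfMonitorGroupStringAMD(groups[g], sizeof name, NULL, name);

        GLint num_counters = 0, max_active = 0;
        /* countersSize == 0: query counts only, don't fetch counter ids. */
        glGetPerfMonitorCountersAMD(groups[g], &num_counters, &max_active,
                                    0, NULL);
        printf("group '%s': %d counters (%d usable at once)\n",
               name, num_counters, max_active);
    }
    free(groups);
}
```

Selecting counters and reading results then goes through `glSelectPerfMonitorCountersAMD` and `glBeginPerfMonitorAMD`/`glEndPerfMonitorAMD`, all without the application ever hard-coding what the hardware provides.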
Originally Posted by MùPùF