
Performance Counters Stuck For Linux GPU Drivers


  • Performance Counters Stuck For Linux GPU Drivers

    Phoronix: Performance Counters Stuck For Linux GPU Drivers

    While it mostly concerns developers, another current shortcoming of the open-source Linux graphics drivers is the lack of suitable performance counters support...


  • #2
    We've actually been meaning to support GL_AMD_performance_monitor for some time now. I started on it a while back, but got busy with other things and never finished it.

    We do have a few things: intel-gpu-top shows how busy the GPU is, a breakdown by unit, render vs. blitter usage, and basic counters like VS/PS invocations. However, the 'top' style interface is not always the most useful, since it doesn't allow you to capture data over time. We also have a small proof of concept program called 'chaps' in intel-gpu-tools which exposes Ironlake's MI_REPORT_PERF counters. Ultimately, the idea is to expose those via GL_AMD_performance_monitor. We currently don't have that for Sandybridge/Ivybridge though, sadly. As Michael mentioned, there are some hoops to jump through, but I think once someone writes the code that should be doable.

    Another new tool is Eric's INTEL_DEBUG=shader_time, which shows a breakdown of how many clock cycles were used by each vertex shader, 8-wide fragment shader, and 16-wide fragment shader. It's extremely useful for determining which shaders are the most expensive (sometimes large shaders are seldom used, while smaller shaders are used extremely frequently, so guessing doesn't always work). That allows us to focus our optimization efforts. (Sadly this is Ivybridge only, since the timestamp register didn't exist prior to that.)

    But overall I agree, we need more performance counters, and need to expose them to application developers.
    Free Software Developer .:. Mesa and Xorg
    Opinions expressed in these forum posts are my own.



    • #3
      I'd like some Radeon performance counters. Pretty please?



      • #4
        Originally posted by Kayden View Post
        We've actually been meaning to support GL_AMD_performance_monitor for some time now. […]
        Yeah, I was pleased to see intel-gpu-top, and I've been looking forward to doing the same on nouveau/nvidia. Anyway, it is going to be hard to find hardware-independent measures to expose to game developers. Candidates include shader execution time, memory bandwidth usage (in percent), PCIe bandwidth usage, and shader engine usage, plus the usual cache misses (those will be hard to make hardware-independent and could live in the hardware-dependent part).



        • #5
          Does nv expose stall reasons? Those would be nice too, if they exist: e.g., fragment blocks not being able to run because the vertex shader has not completed for that whole block, and so on. (Define "block" as thread group, or whatever it's called on nv.)



          • #6
            Originally posted by M?P?F View Post
            Yeah, I was pleased to see intel-gpu-top, and I've been looking forward to doing the same on nouveau/nvidia. […]
            That's one of the nice things about the GL_AMD_performance_monitor extension, though...it just exposes a generic counter mechanism. Applications can query GL to get a list of available counters (organized in groups), and then get data from them. It doesn't actually specify what counters are available, so you can expose whatever your hardware offers.
            Free Software Developer .:. Mesa and Xorg
            Opinions expressed in these forum posts are my own.



            • #7
              Originally posted by Kayden View Post
              That's one of the nice things about the GL_AMD_performance_monitor extension, though... it just exposes a generic counter mechanism. […]
              I'm not a big fan of exposing everything in a hardware-dependent way, but this is better than nothing for sure! Exposing a more restricted subset of the performance counters, with clearly-defined semantics, would be of real interest to game developers.

