According to Wikipedia, the HD 5770's pixel fill rate is 13.6 billion pixels/second, not 12.
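The 13.6 figure follows from the usual back-of-the-envelope formula: peak fill rate = ROPs × core clock. Below is a minimal sketch. It assumes the public spec-sheet values of 16 ROPs at 850 MHz for the HD 5770 and 16 ROPs at 700 MHz for the HD 5750; those numbers are not from this thread.

```python
# Theoretical peak pixel fill rate = ROP count x core clock.
# Assumption: ROP counts and reference clocks come from public spec sheets,
# not from this thread.

def peak_fill_rate_gpix(rops: int, core_clock_mhz: float) -> float:
    """Return peak fill rate in billions of pixels per second (Gpixel/s)."""
    return rops * core_clock_mhz / 1000.0

CARDS = {
    "HD 5770": (16, 850.0),  # assumed: 16 ROPs @ 850 MHz
    "HD 5750": (16, 700.0),  # assumed: 16 ROPs @ 700 MHz
}

for name, (rops, clock) in CARDS.items():
    print(f"{name}: {peak_fill_rate_gpix(rops, clock):.1f} Gpixel/s")
```

Running it prints 13.6 for the HD 5770 and 11.2 for the HD 5750. These match the figures quoted in this thread.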
2d tiling + sb -> no improvement in fill rate, curious
Last edited by droste; 28 May 2013, 06:38 PM.
Originally posted by curaga: Shader2 consists of shader1 plus many no-ops that should be optimized out.
Originally posted by curaga: By printing the results with R600_DEBUG=sb,sbstat,ps, I could see that both shaders were optimized to exactly the same instructions.
As for the other results, did you turn vsync off (vblank_mode=0)? By default I get 7.6 with simple fill on my HD5750. Without vsync I get 10.9, which is pretty close to the 11.2 in the card specs on amd.com.
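The vsync numbers are easier to judge as a share of the quoted spec. Here is a quick calculation of how much of the 11.2 figure each HD5750 measurement reaches. It uses only the numbers from the post above (all in billions of pixels/second).

```python
# Figures from the post: HD5750 spec (amd.com) and measured simple fill,
# in billions of pixels/second.
SPEC_GPIX = 11.2
MEASURED = {
    "vsync on (default)": 7.6,
    "vsync off (vblank_mode=0)": 10.9,
}

def percent_of_spec(measured: float, spec: float) -> float:
    """Share of the theoretical fill rate actually reached, in percent."""
    return 100.0 * measured / spec

for label, rate in MEASURED.items():
    print(f"{label}: {rate} Gpixel/s = {percent_of_spec(rate, SPEC_GPIX):.1f}% of spec")
```

With vsync on, the card reaches about 68% of the spec. With vsync off it reaches about 97%.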
Originally posted by droste: No, the shader is compiled before the measurement starts.
glUseProgram() is called before PerfMeasureRate() is called.
Edit:
So is PerfShaderProgram(), which does the compile and link.
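One-time setup such as compiling and linking the shader happens before the timed region, so it cannot lower the reported rate. The following is a minimal Python sketch of that pattern. `compile_and_link`, `measure_rate` and `draw` are hypothetical stand-ins; this is not the actual mesa-demos PerfMeasureRate()/PerfShaderProgram() code.

```python
import time

def compile_and_link():
    """Hypothetical one-time setup, standing in for shader compile + link."""
    time.sleep(0.05)  # simulate an expensive step
    return "program"

def measure_rate(draw, min_seconds=0.1):
    """Call draw(count) with a doubling count until one batch takes at least
    min_seconds, then return calls per second for that batch.
    Only the draw(count) call is inside the timed region."""
    count = 1
    while True:
        start = time.perf_counter()
        draw(count)
        elapsed = time.perf_counter() - start
        if elapsed >= min_seconds:
            return count / elapsed
        count *= 2

def draw(count):
    """Hypothetical draw loop; a real benchmark would issue GL draw calls here."""
    for _ in range(count):
        pass

program = compile_and_link()  # setup runs first, outside the measurement
rate = measure_rate(draw)
print(f"{rate:.0f} draw calls/second")
```

Doubling the batch size until it runs long enough keeps timer resolution from dominating short measurements.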
Ah yes, the vblanking... now it looks way better:
Code:
Simple fill: 13.4 billion pixels/second
Blended fill: 13.4 billion pixels/second
Textured fill: 13.4 billion pixels/second
Shader1 fill: 13.4 billion pixels/second
Shader2 fill: 6.0 billion pixels/second
But now the Shader2 test is way slower in comparison to the Shader1 test ;-)
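These results can be compared mechanically. The sketch below parses output in exactly the format pasted above; the regex assumes that format. It then compares each test with the 13.6 HD 5770 figure from Wikipedia, and Shader2 with Shader1.

```python
import re

# Output as pasted in the post above (vblank_mode=0).
FILL_OUTPUT = """\
Simple fill: 13.4 billion pixels/second
Blended fill: 13.4 billion pixels/second
Textured fill: 13.4 billion pixels/second
Shader1 fill: 13.4 billion pixels/second
Shader2 fill: 6.0 billion pixels/second
"""

SPEC_GPIX = 13.6  # HD 5770 figure quoted from Wikipedia at the top of the thread

# Assumes lines of the form "<Test> fill: <rate> billion pixels/second".
LINE_RE = re.compile(r"^(?P<test>.+?) fill: (?P<rate>[\d.]+) billion pixels/second$")

def parse_fill_output(text):
    """Map test name (e.g. 'Shader1') to its rate in billions of pixels/second."""
    rates = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            rates[match.group("test")] = float(match.group("rate"))
    return rates

rates = parse_fill_output(FILL_OUTPUT)
for test, rate in rates.items():
    print(f"{test:>8}: {rate:4.1f} Gpixel/s ({rate / SPEC_GPIX:.0%} of spec)")
print(f"Shader2 / Shader1 = {rates['Shader2'] / rates['Shader1']:.0%}")
```

With these numbers, every test except Shader2 is within about 1.5% of the spec. Shader2 runs at roughly 45% of Shader1.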
Originally posted by vadimg: The dump also contains some additional shaders (for blits etc.) besides the ones the app explicitly requests. So it's not always easy to tell which bytecode in the dump belongs to which of the app's shaders, and you may have looked at the wrong ones. As far as I can see, the bytecode for shader2 is in fact still longer even after optimization.
Originally posted by droste: But now the Shader2 test is way slower in comparison to the Shader1 test ;-)
Originally posted by curaga: The dump only included two shaders that did texture fetches; these must therefore be the app's two shaders.
Code:
R600_DEBUG=sb,nollvm,ps ./fill 2>&1 | grep SAMPLE
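The grep above counts dump lines that contain SAMPLE. The sketch below does the same count in Python. It also computes the rough per-shader estimate used later in the thread: with sb each shader is dumped twice, so the line count is halved. The estimate assumes one SAMPLE line per shader, which is not guaranteed in general. The dump text is synthetic, not real r600 output.

```python
def count_lines_containing(dump_text, needle="SAMPLE"):
    """Number of lines containing needle, like `grep -c SAMPLE`."""
    return sum(1 for line in dump_text.splitlines() if needle in line)

def estimate_fetching_shaders(dump_text, dumps_per_shader=1):
    """Rough count of shaders that fetch textures.
    Assumes exactly one SAMPLE line per shader and that every shader is
    printed dumps_per_shader times (2 with sb enabled, per the thread)."""
    return count_lines_containing(dump_text) // dumps_per_shader

# Synthetic example: two shaders with one fetch each, each dumped twice,
# plus one shader without a texture fetch.
SYNTHETIC_DUMP = "\n".join([
    "shader A, first dump:  ... SAMPLE ...",
    "shader A, second dump: ... SAMPLE ...",
    "shader B, first dump:  ... SAMPLE ...",
    "shader B, second dump: ... SAMPLE ...",
    "other shader: no texture fetch",
])

print(count_lines_containing(SYNTHETIC_DUMP))
print(estimate_fetching_shaders(SYNTHETIC_DUMP, dumps_per_shader=2))
```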
Originally posted by curaga: All my measurements were done with vblank_mode=0.
If you still get a low fill rate with all these options, you might want to check whether DUAL_EXPORT mode is actually enabled in the driver for your GPU during the tests (see these commits). IIRC, fill rate has been close to the specs for me since then.
I don't use a compositor, and both 1D and 2D tiling are on by default (and enabled according to Xorg.0.log). See the first post for the GPU power profile.
I don't use debug builds; the asserts and debug paths usually aren't worth it (if I need to debug, I add -g to my CFLAGS).
I'll check dual export and the CPU governor. I can't test swapbufferswait right now (long downloads running), but that one isn't really relevant, since tearing is unacceptable to me. The wait may turn out to be what keeps the fill rate below the hardware specs. But since it has to stay on, the question would then become "why didn't 2D tiling improve the fill rate?"
Hmm, for me 'grep SAMPLE' gives 4 occurrences in the full dump for fill (or 8 with sb, because each shader is dumped twice):