Announcement

**curaga** · 29 May 2013, 08:28 AM

CPU governor had no change, dual export is enabled.

**vadimg** · 29 May 2013, 08:59 AM

Originally posted by curaga

Can't test swapbufferswait now (long downloads going), but that one is not really relevant, as tearing is unacceptable to me. It may turn out to be the wait causing the fillrate not to be up to hw specs, but since it has to be on, the question would then become "why didn't 2d tiling improve the fill rate".

Looks like your results could be explained by SwapbuffersWait then. I enabled it to see how it affects the fill results and got the following:

SwapbuffersWait off, vblank_mode=0: Simple fill: 10.9 billion pixels/second
SwapbuffersWait on, vblank_mode=0: Simple fill: 7.7 billion pixels/second

As for the effect of 2d tiling, disabling it also results in a huge slowdown for me:
SwapbuffersWait off, vblank_mode=0, ColorTiling2d off: Simple fill: 6.8 billion pixels/second

So if 2d tiling doesn't make any difference for you, maybe it's not actually enabled with your gpu for some reason, or its effect is hidden by the SwapbuffersWait.
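To put the three measurements above on a common scale, here is a minimal sketch that expresses each one as a slowdown relative to the fastest configuration. The numbers are copied from the post; the function name `slowdown_percent` is just an illustrative helper, not part of any benchmark tool.

```python
# Fill rates in billions of pixels per second, copied from the post above.
results = {
    "SwapbuffersWait off": 10.9,
    "SwapbuffersWait on": 7.7,
    "SwapbuffersWait off, ColorTiling2d off": 6.8,
}


def slowdown_percent(baseline, measured):
    """Return how much slower `measured` is than `baseline`, in percent."""
    return (1.0 - measured / baseline) * 100.0


baseline = results["SwapbuffersWait off"]
for name, rate in results.items():
    drop = slowdown_percent(baseline, rate)
    print(f"{name}: {rate} Gpix/s, {drop:.1f}% below baseline")
```

On these figures, turning SwapbuffersWait on costs roughly 29% and turning 2D tiling off costs roughly 38%, so either setting alone could plausibly mask the effect of the other.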

**agd5f** · 29 May 2013, 09:44 AM

Originally posted by curaga

Will check dual export and cpu governor. Can't test swapbufferswait now (long downloads going), but that one is not really relevant, as tearing is unacceptable to me. It may turn out to be the wait causing the fillrate not to be up to hw specs, but since it has to be on, the question would then become "why didn't 2d tiling improve the fill rate".

SwapBuffersWait stalls the 3D engine to avoid tearing, so you are basically leaving the GPU idle for long periods.

**curaga** · 29 May 2013, 11:56 AM

Tested with swapbufferswait off - no change (!).

Simple fill: 1.3 billion pixels/second
Blended fill: 1.1 billion pixels/second
Textured fill: 1.2 billion pixels/second
Shader1 fill: 1.3 billion pixels/second
Shader2 fill: 516.6 million pixels/second

$ grep -i swapb /var/log/Xorg.0.log
[ 26040.404] (**) RADEON(0): Option "SwapbuffersWait" "off"
[ 26040.407] (II) RADEON(0): SwapBuffers wait for vsync: disabled
$ grep -i tilin /var/log/Xorg.0.log
[ 26040.406] (II) RADEON(0): KMS Color Tiling: enabled
[ 26040.406] (II) RADEON(0): KMS Color Tiling 2D: enabled
$ echo $vblank_mode
0
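For anyone comparing setups in this thread, the same check can be scripted. The sketch below is only an illustration: it parses the log lines quoted above (embedded as a string rather than read from `/var/log/Xorg.0.log`), and the regular expression is an assumption based on those four lines, not on any documented log format.

```python
import re

# Sample lines copied from the grep output above.
LOG = """\
[ 26040.404] (**) RADEON(0): Option "SwapbuffersWait" "off"
[ 26040.407] (II) RADEON(0): SwapBuffers wait for vsync: disabled
[ 26040.406] (II) RADEON(0): KMS Color Tiling: enabled
[ 26040.406] (II) RADEON(0): KMS Color Tiling 2D: enabled
"""

# ASSUMPTION: informational lines look like "(II) RADEON(n): <name>: enabled|disabled".
STATE_RE = re.compile(
    r"\(II\) RADEON\(\d+\): (?P<name>.+): (?P<state>enabled|disabled)\s*$"
)


def parse_states(text):
    """Map each reported feature name to True (enabled) or False (disabled)."""
    states = {}
    for line in text.splitlines():
        match = STATE_RE.search(line)
        if match:
            states[match.group("name")] = match.group("state") == "enabled"
    return states


for feature, enabled in parse_states(LOG).items():
    print(f"{feature}: {'enabled' if enabled else 'disabled'}")
```

The `(**)` option line is skipped on purpose: it records what the config asked for, while the `(II)` lines record what the driver reports it actually did.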

**droste** · 29 May 2013, 01:12 PM

Maybe something in the kernel changed since 3.7? I'm on airlied's drm-fixes branch (3.10.rcSomething).

**Lemonzest** · 29 May 2013, 02:31 PM

Code:

Simple fill: 6.1 billion pixels/second
Blended fill: 6.1 billion pixels/second
Textured fill: 6.1 billion pixels/second
Shader1 fill: 6.1 billion pixels/second
Shader2 fill: 3.7 billion pixels/second

My card (HD6670 1GB) is spec'd at 6.4 Gpix/s, so it seems right for me.

**curaga** · 31 May 2013, 06:40 AM

Dear Watson, we have a conclusion.

On the bad side, it seems there is a constant overhead of 0.2-0.3 Gpix/s regardless of card position in the lineup and generation. This could hopefully be eliminated with driver advancements.
The no-op detection could also use some love.

On the good side, it turns out 1.3 is 81% not 55%. AMD you lying bitches, sure the units can push 2.3, but the VRAM can only push 1.6. Guess which number is mentioned in all marketing materials.

**vadimg** · 31 May 2013, 08:40 AM

Originally posted by curaga

Dear Watson, we have a conclusion.

On the bad side, it seems there is a constant overhead of 0.2-0.3 Gpix/s regardless of card position in the lineup and generation. This could hopefully be eliminated with driver advancements.

Actually I believe most of this overhead comes from the fact that SwapBuffers and Clear are called every 128 draw calls. With these calls I have 10.9 GP/s, without them 11.1 GP/s, which is even closer to 11.2 in the spec.

Originally posted by curaga

The no-op detection could also use some love.

I sent a patch for sb to mesa-dev today that lets it get rid of all no-ops in shader2.

Originally posted by curaga

On the good side, it turns out 1.3 is 81% not 55%. AMD you lying bitches, sure the units can push 2.3, but the VRAM can only push 1.6. Guess which number is mentioned in all marketing materials.

Low VRAM bandwidth probably does limit the fill rate in your case, but the parameter in question is the peak pixel fill rate, not the minimum. It is probably possible to achieve 2.3 with your GPU in some circumstances, depending on the buffer format and other factors, just not in this case.

**curaga** · 31 May 2013, 10:37 AM

No, the peak fillrate cannot be faster than what the memory can transfer.

Unless there is some live lossless compression, which I doubt there is.

**vadimg** · 31 May 2013, 11:46 AM

Originally posted by curaga

No, the peak fillrate cannot be faster than what the memory can transfer.
Unless there is some live lossless compression, which I doubt there is.

Fill rate is measured in pixels per second and memory bandwidth in bytes per second. I think the number of bytes that must be transferred for each pixel depends on the buffer format and hardware configuration (which depends on GL state etc.), and it is probably also affected by optimizations like the DUAL_EXPORT mode mentioned earlier in this thread. Basically, in some modes the hardware may have to transfer less data per pixel, allowing it to fill more pixels with the same bandwidth.
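The argument can be sketched as a simple bound: the achievable fill rate is the lower of the pixel units' peak and the memory bandwidth divided by the bytes written per pixel. The 2.3 Gpix/s peak and the 1.6 Gpix/s VRAM limit come from curaga's post; the 6.4 GB/s bandwidth is an assumption, back-derived by taking that 1.6 figure at 4 bytes per pixel. The model also ignores blending reads, depth traffic, caches and any compression, so treat it as an illustration rather than a description of the hardware.

```python
def fill_rate_limit(peak_gpix, bandwidth_gb_s, bytes_per_pixel):
    """Upper bound on fill rate in Gpix/s: the lower of the pixel-unit peak
    and what memory bandwidth allows at the given write size per pixel."""
    bandwidth_bound = bandwidth_gb_s / bytes_per_pixel
    return min(peak_gpix, bandwidth_bound)


PEAK_GPIX = 2.3        # pixel-unit peak quoted in the thread
BANDWIDTH_GB_S = 6.4   # ASSUMPTION: 1.6 Gpix/s * 4 bytes per pixel

for name, bytes_per_pixel in [("32-bit color", 4), ("16-bit color", 2)]:
    limit = fill_rate_limit(PEAK_GPIX, BANDWIDTH_GB_S, bytes_per_pixel)
    print(f"{name}: at most {limit:.1f} Gpix/s")
```

Under these assumptions a 4-byte format is bandwidth-bound at 1.6 Gpix/s, while a 2-byte format would be capped by the 2.3 Gpix/s unit peak instead, which is how both posters' numbers can hold at once.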

2d tiling + sb -> no improvement in fill rate, curious
