2d tiling + sb -> no improvement in fill rate, curious


  • #21
    Changing the CPU governor made no difference, and dual export is enabled.

    • #22
      Originally posted by curaga
      Can't test swapbufferswait now (long downloads going), but that one is not really relevant, as tearing is unacceptable to me. It may turn out to be the wait causing the fillrate not to be up to hw specs, but since it has to be on, the question would then become "why didn't 2d tiling improve the fill rate".
      Looks like your results could be explained by SwapbuffersWait then. I enabled it to see how it affects the fill results and got the following:

      SwapbuffersWait off, vblank_mode=0: Simple fill: 10.9 billion pixels/second
      SwapbuffersWait on, vblank_mode=0: Simple fill: 7.7 billion pixels/second

      As for the effect of 2d tiling, disabling it also results in a huge slowdown for me:
      SwapbuffersWait off, vblank_mode=0, ColorTiling2d off: Simple fill: 6.8 billion pixels/second

      So if 2d tiling doesn't make any difference for you, maybe it's not actually enabled with your gpu for some reason, or its effect is hidden by the SwapbuffersWait.
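To put the three Simple fill results above side by side, here is a minimal Python sketch that expresses each one as a slowdown relative to the fastest configuration. The figures are the ones quoted in the post; the dictionary labels and the `slowdown_percent` helper are illustrative names, not part of any benchmark tool.

```python
# Simple-fill results quoted in the post above, in billions of pixels/second.
results = {
    "SwapbuffersWait off, 2D tiling on": 10.9,
    "SwapbuffersWait on, 2D tiling on": 7.7,
    "SwapbuffersWait off, 2D tiling off": 6.8,
}


def slowdown_percent(baseline, measured):
    """Return how much slower `measured` is than `baseline`, in percent."""
    return (baseline - measured) / baseline * 100.0


# Use the fastest configuration from the post as the reference point.
baseline = results["SwapbuffersWait off, 2D tiling on"]
for label, gpix in results.items():
    drop = slowdown_percent(baseline, gpix)
    print(f"{label}: {gpix} Gpix/s, {drop:.1f}% below baseline")
```

On these numbers, enabling SwapbuffersWait costs roughly 29% and disabling 2D tiling roughly 38%, so either setting on its own is large enough to mask the other.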

      • #23
        Originally posted by curaga
        Will check dual export and cpu governor. Can't test swapbufferswait now (long downloads going), but that one is not really relevant, as tearing is unacceptable to me. It may turn out to be the wait causing the fillrate not to be up to hw specs, but since it has to be on, the question would then become "why didn't 2d tiling improve the fill rate".
        SwapBuffersWait stalls the 3D engine to avoid tearing, so you are basically leaving the GPU idle for long periods.

        • #24
          Tested with swapbufferswait off - no change (!).

          Simple fill: 1.3 billion pixels/second
          Blended fill: 1.1 billion pixels/second
          Textured fill: 1.2 billion pixels/second
          Shader1 fill: 1.3 billion pixels/second
          Shader2 fill: 516.6 million pixels/second

          $ grep -i swapb /var/log/Xorg.0.log
          [ 26040.404] (**) RADEON(0): Option "SwapbuffersWait" "off"
          [ 26040.407] (II) RADEON(0): SwapBuffers wait for vsync: disabled
          $ grep -i tilin /var/log/Xorg.0.log
          [ 26040.406] (II) RADEON(0): KMS Color Tiling: enabled
          [ 26040.406] (II) RADEON(0): KMS Color Tiling 2D: enabled
          $ echo $vblank_mode
          0
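The manual grep checks above can also be scripted. This is a minimal sketch that pulls the enabled/disabled status lines out of log text shaped like the excerpt in this post. The regular expression is an assumption based only on the four lines shown here, so other driver versions may word these messages differently; `parse_radeon_status` is an illustrative helper name.

```python
import re

# Sample lines copied from the grep output above.
SAMPLE_LOG = """\
[ 26040.404] (**) RADEON(0): Option "SwapbuffersWait" "off"
[ 26040.407] (II) RADEON(0): SwapBuffers wait for vsync: disabled
[ 26040.406] (II) RADEON(0): KMS Color Tiling: enabled
[ 26040.406] (II) RADEON(0): KMS Color Tiling 2D: enabled
"""

# Matches informational lines of the form
# "(II) RADEON(0): <feature>: enabled" or "... disabled".
# Assumption: the format is inferred from the excerpt above only.
STATUS_RE = re.compile(
    r"\(II\) RADEON\(\d+\): (?P<feature>.+): (?P<state>enabled|disabled)\s*$"
)


def parse_radeon_status(log_text):
    """Map each reported feature name to True (enabled) or False (disabled)."""
    status = {}
    for line in log_text.splitlines():
        match = STATUS_RE.search(line)
        if match:
            status[match.group("feature")] = match.group("state") == "enabled"
    return status


print(parse_radeon_status(SAMPLE_LOG))
```

To run it against a real log, read the contents of /var/log/Xorg.0.log and pass them in place of `SAMPLE_LOG`.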

          • #25
            Maybe something in the kernel changed since 3.7? I'm on airlied's drm-fixes branch (3.10.rcSomething).

            • #26
              Code:
              Simple fill: 6.1 billion pixels/second
              Blended fill: 6.1 billion pixels/second
              Textured fill: 6.1 billion pixels/second
              Shader1 fill: 6.1 billion pixels/second
              Shader2 fill: 3.7 billion pixels/second
              My card (HD6670 1GB) is spec'd at 6.4 Gpix/s, so that seems right for me.
              Last edited by Lemonzest; 29 May 2013, 02:36 PM.

              • #27
                Dear Watson, we have a conclusion.

                On the bad side, it seems there is a constant overhead of 0.2-0.3 Gpix regardless of card position in the lineup and generation. This could be eliminated with driver advancements hopefully.
                The no-op detection could also use some love.

                On the good side, it turns out 1.3 is 81% not 55%. AMD you lying bitches, sure the units can push 2.3, but the VRAM can only push 1.6. Guess which number is mentioned in all marketing materials.
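The arithmetic behind the "constant overhead" and the 81% figure can be checked with a short Python sketch. All measured and limit values are taken from posts in this thread (the 11.2 figure is the spec value cited in a later reply). The labels and helper names are illustrative only.

```python
# (measured, limit) Simple-fill pairs quoted in this thread, in Gpix/s.
cards = {
    "curaga, VRAM-limited figure": (1.3, 1.6),
    "curaga, marketing figure": (1.3, 2.3),
    "HD6670 1GB (post #26)": (6.1, 6.4),
    "post #22 card, spec cited in a later reply": (10.9, 11.2),
}


def percent_of_limit(measured, limit):
    """Return the measured fill rate as a percentage of the stated limit."""
    return measured / limit * 100.0


def gap(measured, limit):
    """Return the absolute shortfall against the stated limit, in Gpix/s."""
    return limit - measured


for label, (measured, limit) in cards.items():
    print(
        f"{label}: {percent_of_limit(measured, limit):.1f}% of {limit} Gpix/s, "
        f"gap {gap(measured, limit):.1f} Gpix/s"
    )
```

Against the VRAM-limited 1.6 Gpix/s the measured 1.3 comes out at about 81%, and against the advertised 2.3 Gpix/s at roughly 57%, close to the 55% mentioned above. The gap to the realistic limit is about 0.3 Gpix/s for all three cards.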

                • #28
                  Originally posted by curaga
                  Dear Watson, we have a conclusion.

                  On the bad side, it seems there is a constant overhead of 0.2-0.3 Gpix regardless of card position in the lineup and generation. This could be eliminated with driver advancements hopefully.
                  Actually, I believe most of this overhead comes from the fact that SwapBuffers and Clear are called every 128 draw calls. With these calls I get 10.9 GP/s; without them, 11.1 GP/s, which is even closer to the 11.2 GP/s in the spec.

                  Originally posted by curaga
                  The no-op detection could also use some love.
                  I sent a patch for sb to mesa-dev today that lets sb get rid of all no-ops in shader2.

                  Originally posted by curaga
                  On the good side, it turns out 1.3 is 81% not 55%. AMD you lying bitches, sure the units can push 2.3, but the VRAM can only push 1.6. Guess which number is mentioned in all marketing materials.
                  Low VRAM bandwidth can probably limit the fill rate in your case, but the parameter in question is the peak pixel fill rate, not the minimum. It's probably possible to reach 2.3 with your GPU in some circumstances, depending on the buffer format and other factors, just not in this case.

                  • #29
                    No, the peak fillrate cannot be faster than what the memory can transfer.

                    Unless there is some live lossless compression, which I doubt there is.

                    • #30
                      Originally posted by curaga
                      No, the peak fillrate cannot be faster than what the memory can transfer.
                      Unless there is some live lossless compression, which I doubt there is.
                      Fill rate is measured in pixels per second and memory bandwidth in bytes per second. I think the number of bytes that must be transferred for each pixel depends on the buffer format and the hardware configuration (which in turn depends on GL state etc.), and it's probably also affected by possible optimizations like the DUAL_EXPORT mode mentioned earlier in this thread. Basically, in some modes the hardware may have to transfer less data per pixel, which lets it fill more pixels with the same bandwidth.
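The relationship described above can be sketched as a simple min() of two limits. This is only an illustration under stated assumptions: the 4 bytes per pixel, and the 6.4 GB/s effective bandwidth derived from it, are hypothetical values chosen so that the bandwidth limit matches the 1.6 Gpix/s discussed in this thread. They are not measured or documented figures for any card.

```python
def bandwidth_limited_fill(bandwidth_gb_s, bytes_per_pixel):
    """Gpix/s that the memory can sustain if each pixel costs `bytes_per_pixel`."""
    return bandwidth_gb_s / bytes_per_pixel


def achievable_fill(peak_gpix_s, bandwidth_gb_s, bytes_per_pixel):
    """Fill rate is capped by whichever limit is lower: render units or memory."""
    return min(peak_gpix_s, bandwidth_limited_fill(bandwidth_gb_s, bytes_per_pixel))


PEAK_GPIX_S = 2.3      # advertised peak from the thread
BANDWIDTH_GB_S = 6.4   # HYPOTHETICAL: 1.6 Gpix/s * 4 bytes/pixel (assumed format)

# Assumed 32-bit colour buffer: 4 bytes written per pixel.
print(achievable_fill(PEAK_GPIX_S, BANDWIDTH_GB_S, 4))  # memory-bound
# Assumed 16-bit colour buffer: 2 bytes written per pixel.
print(achievable_fill(PEAK_GPIX_S, BANDWIDTH_GB_S, 2))  # render-unit-bound
```

Under these assumptions, halving the bytes per pixel moves the bottleneck from memory to the render units, which is how both the 1.6 and the 2.3 figures could hold for the same card.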
