E-450 graphics performance issues


  • #61
    @bridgeman the push for increased GL version support probably should have been expected, considering that many open source developers have an "implement first, make it fast later" mentality, and that implementing is easier than optimizing. I think it might also be important for getting the driver up to feature parity so that driver development can occur in sync with hardware design.



    • #62
      Yep, it's a sad fact that some 2d operations on the 3d engine suck. Add to that the likely better algorithms in the blobs.

      When R600 and GF8 were released, this was publicly known and anyone who needed good 2D just bought the previous gen (and this was on Windows too!). It's my understanding that even the latest gen loses in 2D to the cards with dedicated 2D, such as R500, GF7, or the old cards such as Matrox ones. (Yep, tab switching on a Matrox is still faster than on a recent AMD, as is software rendering with vesa.)


      Are you using a compositor anyhow? I tried to replicate your enter-pressing test, but all it did was bring X from 1% of one core to 3% of one core. But I'm not using a compositor, nor a bloated terminal such as Gnome terminal (mrxvt if you're curious, with antialiased fonts etc).



      • #63
        Originally posted by curaga View Post
        Yep, it's a sad fact that some 2d operations on the 3d engine suck. Add to that the likely better algorithms in the blobs.

        When R600 and GF8 were released, this was publicly known and anyone who needed good 2D just bought the previous gen (and this was on Windows too!). It's my understanding that even the latest gen loses in 2D to the cards with dedicated 2D, such as R500, GF7, or the old cards such as Matrox ones. (Yep, tab switching on a Matrox is still faster than on a recent AMD, as is software rendering with vesa.)


        Are you using a compositor anyhow? I tried to replicate your enter-pressing test, but all it did was bring X from 1% of one core to 3% of one core. But I'm not using a compositor, nor a bloated terminal such as Gnome terminal (mrxvt if you're curious, with antialiased fonts etc).
        It's not really an issue with 2D vs. 3D engines. 2D engines suck for RENDER too. The reason vesa or old drivers seem faster for certain things is that they use shadowfb or XAA (which ends up being shadowfb, because offscreen acceleration has been disabled for years due to bit rot in XAA). Shadowfb is pure software rendering. Pure CPU rendering is almost always faster than mixed CPU/GPU rendering since there is no ping-ponging between GPU and CPU rendering. If you want to compare, you can enable shadowfb in the radeon driver by setting Option "NoAccel" "True" in the device section of your xorg config.
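
        For anyone trying this, the device section would look roughly like the fragment below. This is only a sketch: the Identifier value is a placeholder, and an existing Device section may carry other lines that should stay as they are. Only the NoAccel option itself comes from the text above.

```
Section "Device"
    # "Radeon" is a placeholder name; keep whatever identifier you already use
    Identifier "Radeon"
    Driver     "radeon"
    # Turns off acceleration so rendering goes through shadowfb (pure CPU)
    Option     "NoAccel" "True"
EndSection
```

        Removing the Option line (or commenting it out) and restarting X should bring back the accelerated path for a side-by-side comparison.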



        • #64
          In other words, too many fallbacks, and something like SNA for radeon should be done?



          • #65
            Originally posted by curaga View Post
            In other words, too many fallbacks, and something like SNA for radeon should be done?
            or glamor.



            • #66
              I've now done a fair share of profiling and tracing. Fallbacks aren't the main issue that's holding back 2D performance. Everything important is accelerated, generally there's no migration ping-pong, etc. It's just that acceleration is very slow, and defunct power management that is forcing the GPU clock to the lowest and slowest power state doesn't help.

              I'm not quite sure why rendering is so slow, as I'm not very familiar with the R600+ architecture. Part of it seems to be synchronous CS flushing, at least.



              • #67
                What kind of functions are you benchmarking? Some things like overlapping blits require frequent flushes to keep the texture and CB caches consistent in the overlap areas.



                • #68
                  Yes, overlapped Copy operations are slow, and sometimes unnecessarily so*. However, it's also slow without many flushes, for instance when doing a lot of small Composite operations. A good example of this case is text rendering in gnome-terminal. I've experimented with increasing the size of the VBO and that seems to help a little, but not much. Generally, I use cairo-perf-trace to benchmark.

                  * When doing a copy inside a single pixmap, the DDX does a two-stage copy with two flushes even if the areas don't overlap. In this case no copy to the temporary is needed and one flush is enough. I've fixed that in my tree and it's noticeably faster in some cases, e.g. scrolling in gedit.
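
                  To make the idea concrete, here is a minimal Python sketch of the decision described above. It is not the actual DDX code: Rect, rects_overlap, plan_same_pixmap_copy and the step strings are all hypothetical names made up for illustration, and it assumes rectangles with positive width and height.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle in pixmap coordinates (hypothetical helper type)."""
    x: int
    y: int
    w: int
    h: int


def rects_overlap(a: Rect, b: Rect) -> bool:
    """Return True if the two rectangles share at least one pixel.

    Assumes both rectangles have positive width and height.
    """
    return (a.x < b.x + b.w and b.x < a.x + a.w and
            a.y < b.y + b.h and b.y < a.y + a.h)


def plan_same_pixmap_copy(src: Rect, dst: Rect) -> List[str]:
    """List the steps for a copy whose source and destination are in one pixmap.

    The returned strings are purely illustrative, not driver calls.
    """
    if rects_overlap(src, dst):
        # The areas alias each other: stage through a temporary so the copy
        # never reads pixels it has already overwritten. Two copies, two flushes.
        return ["copy src -> tmp", "flush", "copy tmp -> dst", "flush"]
    # Disjoint areas: no temporary is needed and a single flush is enough.
    return ["copy src -> dst", "flush"]


if __name__ == "__main__":
    # Side-by-side areas that only touch at an edge do not overlap.
    print(plan_same_pixmap_copy(Rect(0, 0, 10, 10), Rect(10, 0, 10, 10)))
    # Areas shifted by a few pixels do overlap and need the temporary.
    print(plan_same_pixmap_copy(Rect(0, 0, 10, 10), Rect(0, 3, 10, 10)))
```

                  The point of the sketch is only that the overlap test is cheap compared to an extra copy and flush, so checking it before falling back to the temporary costs next to nothing.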



                  • #69
                    Originally posted by brent View Post
                    * The DDX does a two-stage copy blit with two flushes in a single pixmap even if the areas don't overlap. In this case no copy to the temporary is needed and one flush is enough. I've fixed that in my tree and it's noticeably faster in some cases, e.g. scrolling in gedit.
                    Yeah, that's right... IIRC you said that previously.



                    • #70
                      Not quite. At the time, I hadn't noticed that it uses a temporary even when one isn't needed!

