Anyway, the TL;DR version: I'm not particularly familiar with Radeon GPU programming, but I'd really like to improve the power management situation, and I'd appreciate some pointers on where to start looking. Or better yet, some bits and pieces of documentation.
E-450 graphics performance issues
-
Originally posted by brent: About clock gating: Well, I don't really see what's so very hard about that. There must be some internal documentation, or at least code in fglrx. Power management is a pretty isolated feature set, and probably doesn't amount to a lot of code, either.
Originally posted by brent: Just have someone sit down and at least give some basic pointers about how it is supposed to work, please? I browsed through the RV630 register documentation and family 14h BKDG, but there's no real pointer on where to start with GPU clock gating. I am groping in the dark.
Originally posted by brent: IMHO it's strange how everything related to CPU and NB is documented quite thoroughly, while the GPU is hardly documented at all.
It's not quite that black-and-white these days, since (a) HW vendors are helping more with the OS code and (b) community-developed open source graphics drivers are aiming a lot higher in functionality and feature set, but there's still a huge difference in "where the API is defined".
-
JB,
When you say you're trying to figure out how the hardware works, what exactly do you mean?
A: Joe Engineer designed that part of the hardware, but Engineer was reallocated to another team and is not available for consultation?
B: The design departments are not forthcoming with their documentation?
C: The documentation never existed, or does not exist in a form that can be distributed (Documentation exists as comments in a SPICE model for example)?
D: All of the above, and some others, and the full story would make me cry?
F
-
More like:
A: Joe Engineer designed that part of the hardware but moved to another department 18 months ago and is on a month-long sabbatical... when he gets back he is available for consultation but only remembers the stuff you already know and not much else...
B: The design departments are forthcoming with the documentation, but the documentation is oriented towards designing hardware, not designing software... remember the software was designed three years ago in conjunction with the hardware team and both teams are now working on hardware 3-4 generations past Evergreen...
C: The internal documentation is not written to be distributed; it's intended to make sure the hardware design is complete, robust, testable and implementable, but it's nothing at all like what driver developers would want. The hardware documentation is all about how the hardware internals are going to work, and the software documentation is all about how the Catalyst internals are going to work.
D: Between the documentation, picking Joe Engineer's brains, picking through diagnostics / BIOS / driver code it still doesn't work the way you expect... again, this is why getting open source driver development aligned with hardware and proprietary driver development is such a priority.
That said, I think the "figuring out" part is mostly in the past and now we're at the "figuring out if we can release it" stage.
Last edited by bridgman; 15 July 2012, 10:50 PM.
-
And now for something completely different: I'm still trying to figure out why 2D performance is so bad. Even with the power state switching fixed, 2D performance is noticeably worse than on an Atom netbook with GMA 950, a GPU that is much slower than the Radeon 6320 and also much less capable. Unfortunately, I don't have an Atom netbook available at the moment for a direct comparison. However, x11perf -scroll500 achieves only ~500 iterations per second, and -comppixwin500 does ~1300 per second. A PC with an older NVidia GPU is over an order of magnitude faster for both of these. This doesn't seem right, and it's not as if this is only visible in benchmarks: smooth scrolling in Firefox is sluggish, and scrolling in gnome-terminal is very slow and laggy. Even worse, with acceleration disabled (Option "NoAccel" "true") everything is much faster, both in benchmarks and in practice.
Is this the kind of performance to expect from the drivers or is there something funny going on with my system? What's the best way to debug this?
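One way to make the accelerated vs. NoAccel comparison repeatable is to script the two x11perf tests mentioned above and collect the reported rates. The following is only a minimal sketch: the helper names (parse_x11perf, run_x11perf) are made up for illustration, and the regular expression assumes x11perf's usual result lines of the form "N reps @ T msec (R/sec): description", which may differ between versions.

```python
import re
import subprocess

# Assumed result-line shape: "... (   500.0/sec): Scroll 500x500 pixels"
RATE_LINE = re.compile(r"\(\s*([\d.]+)/sec\):\s*(.+)$")


def parse_x11perf(output):
    """Return {test description: [rates per second]} from x11perf text output.

    Every line that carries a "/sec" rate is collected, so per-repeat lines
    and any summary line for the same test end up in the same list.
    """
    rates = {}
    for line in output.splitlines():
        match = RATE_LINE.search(line)
        if match:
            description = match.group(2).strip()
            rates.setdefault(description, []).append(float(match.group(1)))
    return rates


def run_x11perf(tests=("-scroll500", "-comppixwin500"), repeat=3):
    """Run x11perf for the given tests and return the parsed rates.

    Requires a running X session and the x11perf binary on PATH.
    """
    cmd = ["x11perf", "-repeat", str(repeat), *tests]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_x11perf(result.stdout)
```

Calling run_x11perf() once with acceleration enabled and once with Option "NoAccel" "true" set gives two dictionaries that can be compared test by test.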
-
Quick answer without investigation is that older GPUs had dedicated 2D hardware optimized for lines, circles, text drawing and scrolling. Modern GPUs don't have that (r5xx / rs6xx were the last ATI/AMD parts with 2D hardware, don't know for NVidia but probably the same timeframe, not sure about Intel). 3D engines could do most of the operations as fast as or faster than 2D hardware, and spending die area on the 3D engine aligned much better with what customers were looking for.
Unfortunately scrolling (where source and destination rectangles overlap) turns out to be one of the harder things to do with the 3D engine. There was a lot of discussion during the initial driver support for r6xx and higher, finding the right balance between "go fast" and "no corruption". The technical issue is that blits are done with a 3D engine by using a texture for the source rectangle and a render target for the destination rectangle. Each has its own cache, so writes to the destination rectangle don't update the cached copy of the source rectangle which you want when doing a blit. Even worse, the render operation is scattered across multiple SIMDs and there is no guarantee that the copy operations for one scanline will be completed before the operations for the next one begin. The hardware guarantees correctly ordered writes between triangles/quads (IIRC) but not within a single primitive.
IIRC the algorithm of choice ended up breaking the rectangles into narrow horizontal slices the size of the vertical scroll distance, blitting rectangles one at a time, and flushing caches between rectangles. That was reliable and reasonably fast; not sure if any improvement since then has been figured out.
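To illustrate the slicing idea, here is a toy simulation in Python. It is not the driver's code: the framebuffer is just a list of row labels, the function names are made up, the "reverse" flag stands in for scanline copies completing in an unfavourable order, and the flush callback is a placeholder for the cache flush between rectangles.

```python
def blit_rows(fb, src_y, dst_y, height, reverse=False):
    """Copy `height` rows from src_y to dst_y, one row at a time.

    `reverse=True` copies bottom-to-top, modelling a primitive whose
    per-scanline copies complete in an order the caller cannot control.
    """
    offsets = range(height - 1, -1, -1) if reverse else range(height)
    for i in offsets:
        fb[dst_y + i] = fb[src_y + i]


def scroll_up_single_blit(fb, distance, reverse=False):
    """Scroll up with one big blit; source and destination overlap."""
    blit_rows(fb, distance, 0, len(fb) - distance, reverse)


def scroll_up_sliced(fb, distance, reverse=False, flush=lambda: None):
    """Scroll up using slices no taller than the scroll distance.

    Within a slice, source and destination never overlap, so the copy
    order inside the slice does not matter. `flush` is a hypothetical
    hook standing in for the cache flush between rectangles.
    """
    height = len(fb)
    y = 0
    while y + distance < height:
        rows = min(distance, height - distance - y)
        blit_rows(fb, y + distance, y, rows, reverse)
        flush()
        y += rows


# Small demonstration: 8 rows, scrolled up by 2, with unfavourable ordering.
single = list(range(8))
scroll_up_single_blit(single, 2, reverse=True)
sliced = list(range(8))
scroll_up_sliced(sliced, 2, reverse=True)
print("single blit:", single)
print("sliced blit:", sliced)
```

In this model the single overlapping blit only gives the right result when rows happen to be copied in a safe order, while the sliced version is correct for any order within a slice, at the cost of one flush per slice.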
Note that the above information could be 3 years out of date.
-
Originally posted by bridgman: Yeah, but if that's the case I wish people would actually say that. I might even agree
-
Originally posted by bridgman: Quick answer without investigation is that older GPUs had dedicated 2D hardware optimized for lines, circles, text drawing and scrolling. Modern GPUs don't have that (r5xx / rs6xx were the last ATI/AMD parts with 2D hardware, don't know for NVidia but probably the same timeframe, not sure about Intel). 3D engines could do most of the operations as fast as or faster than 2D hardware, and spending die area on the 3D engine aligned much better with what customers were looking for.
Unfortunately scrolling (where source and destination rectangles overlap) turns out to be one of the harder things to do with the 3D engine. There was a lot of discussion during the initial driver support for r6xx and higher, finding the right balance between "go fast" and "no corruption". The technical issue is that blits are done with a 3D engine by using a texture for the source rectangle and a render target for the destination rectangle. Each has its own cache, so writes to the destination rectangle don't update the cached copy of the source rectangle which you want when doing a blit. Even worse, the render operation is scattered across multiple SIMDs and there is no guarantee that the copy operations for one scanline will be completed before the operations for the next one begin. The hardware guarantees correctly ordered writes between triangles/quads (IIRC) but not within a single primitive.
Anyway, I guess what baffles me the most is that most people have such low expectations that they are fine with drivers that perform so badly... and even go so far as to say that 2D performs great! This is my first AMD GPU in years, and I only got it because people assured me that the open source drivers perform well now.