
**brent** · 15 July 2012, 06:55 PM

Anyway, the TL;DR version: I'm not particularly familiar with Radeon GPU programming, but I'd really like to improve the power management situation, and I'd appreciate some pointers on where to start looking. Or better yet, some bits and pieces of documentation.

**bridgman** · 15 July 2012, 08:15 PM

Originally posted by brent

About clock gating: Well, I don't really see what's so very hard about that. There must be some internal documentation, or at least code in fglrx. Power management is a pretty isolated feature set, and probably doesn't amount to a lot of code, either.

Power management runs into every corner of the chip and touches every driver component if you want to get all possible power savings. The power management code alone in the proprietary driver is bigger than the entire open source graphics driver stack, although it has been getting smaller over the last couple of GPU generations.

Originally posted by brent

Just have someone sit down and at least give some basic pointers about how it is supposed to work, please? I browsed through the RV630 register documentation and the family 14h BKDG, but there's no real pointer on where to start with GPU clock gating. I am groping in the dark.

Clock gating used to be fairly simple and independent of the other power management activities, and so it was supported in the open source drivers fairly early in the project. That is no longer the case -- as far as we can see clock gating pretty much requires everything else to be in place as well.

Originally posted by brent

IMHO it's strange how everything related to CPU and NB is documented quite thoroughly, while the GPU is hardly documented at all.

Not really... CPUs don't generally have "drivers" in the traditional sense -- most of the standard API is the instruction set and everything else is hard-coded in OS and BIOS by third parties, so excruciatingly detailed documentation is required. For GPUs, the standard APIs are much higher level at the top of a large driver stack -- OpenGL, DirectX, OpenCL etc... -- so the focus is on writing and supplying drivers rather than writing documentation for third party developers.

It's not quite that black-and-white these days, since (a) HW vendors are helping more with the OS code and (b) community-developed open source graphics drivers are aiming a lot higher in functionality and feature set, but there's still a huge difference in "where the API is defined".

**russofris** · 15 July 2012, 09:32 PM

JB,

When you say you're trying to figure out how the hardware works, what exactly do you mean?

A: Joe Engineer designed that part of the hardware, but Engineer was reallocated to another team and is not available for consultation?
B: The design departments are not forthcoming with their documentation?
C: The documentation never existed, or does not exist in a form that can be distributed (Documentation exists as comments in a SPICE model for example)?
D: All of the above, and some others, and the full story would make me cry?

F

**bridgman** · 15 July 2012, 10:42 PM

More like:

A: Joe Engineer designed that part of the hardware but moved to another department 18 months ago and is on a month-long sabbatical... when he gets back he is available for consultation but only remembers the stuff you already know and not much else...

B: The design departments are forthcoming with the documentation, but the documentation is oriented towards designing hardware, not designing software... remember the software was designed three years ago in conjunction with the hardware team and both teams are now working on hardware 3-4 generations past Evergreen...

C: The internal documentation is not written to be distributed; it's intended to make sure the hardware design is complete, robust, testable and implementable, but it's nothing at all like what driver developers would want. The hardware documentation is all about how the hardware internals are going to work, and the software documentation is all about how the Catalyst internals are going to work.

D: Between the documentation, picking Joe Engineer's brains, picking through diagnostics / BIOS / driver code it still doesn't work the way you expect... again, this is why getting open source driver development aligned with hardware and proprietary driver development is such a priority.

That said, I think the "figuring out" part is mostly in the past and now we're at the "figuring out if we can release it" stage.

**brent** · 15 July 2012, 11:04 PM

And now for something completely different: I'm still trying to figure out why 2D performance is so bad. Even with the power state switching fixed, 2D performance is noticeably worse than on an Atom netbook with GMA 950 - a GPU that is much slower than the Radeon 6320 and also much less capable. I don't have an Atom netbook available at the moment, unfortunately. However, x11perf -scroll500 just achieves ~500 iterations per second, -comppixwin500 does ~1300 per second. A PC with an older NVidia GPU is over an order of magnitude faster for both of these. This doesn't seem right, and it's not like this is only visible in benchmarks. Smooth scrolling in Firefox is sluggish/not smooth, and scrolling in gnome-terminal is very slow and laggy. Even worse, with acceleration disabled (Option "NoAccel" "true") everything is much faster, both in benchmarks and in practice.

Is this the kind of performance to expect from the drivers or is there something funny going on with my system? What's the best way to debug this?

**bridgman** · 15 July 2012, 11:33 PM

Quick answer without investigation is that older GPUs had dedicated 2D hardware optimized for lines, circles, text drawing and scrolling. Modern GPUs don't have that (r5xx / rs6xx were the last ATI/AMD parts with 2D hardware, don't know for NVidia but probably the same timeframe, not sure about Intel). 3D engines could do most of the operations as fast as or faster than 2D hardware, and spending die area on the 3D engine aligned much better with what customers were looking for.

Unfortunately scrolling (where source and destination rectangles overlap) turns out to be one of the harder things to do with the 3D engine. There was a lot of discussion during the initial driver support for r6xx and higher, finding the right balance between "go fast" and "no corruption". The technical issue is that blits are done with a 3D engine by using a texture for the source rectangle and a render target for the destination rectangle. Each has its own cache, so writes to the destination rectangle don't update the cached copy of the source rectangle which you want when doing a blit. Even worse, the render operation is scattered across multiple SIMDs and there is no guarantee that the copy operations for one scanline will be completed before the operations for the next one begin. The hardware guarantees correctly ordered writes between triangles/quads (IIRC) but not within a single primitive.
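The ordering part of this problem can be shown on the CPU with no GPU involved. What follows is only a minimal Python sketch, assuming a framebuffer modelled as a list of rows and a scroll up by `dy` rows; the function names are hypothetical, and it does not model the separate texture and render-target caches. It shows that when source and destination overlap, the result depends on the order in which the per-scanline copies complete, which is exactly what the hardware does not guarantee within a single primitive.

```python
def scroll_reference(rows, dy):
    """Scroll up by dy rows, reading from a snapshot (always correct)."""
    snapshot = list(rows)
    out = list(rows)
    for y in range(len(rows) - dy):
        out[y] = snapshot[y + dy]
    return out


def scroll_in_place(rows, dy, order):
    """Copy row y+dy onto row y in the given order, reading the live buffer."""
    buf = list(rows)
    for y in order:
        buf[y] = buf[y + dy]
    return buf


rows = list(range(8))   # row contents stand in for scanlines
dy = 2
top_down = list(range(len(rows) - dy))
bottom_up = list(reversed(top_down))

print(scroll_reference(rows, dy))            # [2, 3, 4, 5, 6, 7, 6, 7]
print(scroll_in_place(rows, dy, top_down))   # [2, 3, 4, 5, 6, 7, 6, 7]
print(scroll_in_place(rows, dy, bottom_up))  # [6, 7, 6, 7, 6, 7, 6, 7]
```

The bottom-up order reads rows that have already been overwritten, so the scrolled image is corrupted. A single large quad gives no control over which of these orders the hardware effectively uses.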

IIRC the algorithm of choice ended up breaking the rectangles into narrow horizontal slices the size of the vertical scroll distance, blitting rectangles one at a time, and flushing caches between rectangles. That was reliable and reasonably fast; not sure if any improvement since then has been figured out.

Note that the above information could be 3 years out of date.
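The slicing approach described above can be sketched as a toy CPU model. This is an illustration under stated assumptions, not the actual EXA or driver code: `Surface`, `blit` and `scroll_up` are hypothetical names, the "texture cache" is a plain dictionary that writes never update, and rows inside one blit are shuffled to stand in for the lack of ordering within a primitive. It handles only scrolling up.

```python
import random


class Surface:
    """Toy framebuffer: reads go through a cache that writes do not update."""

    def __init__(self, rows):
        self.mem = list(rows)   # backing memory, written via the "render target"
        self.src_cache = {}     # stand-in for the texture (source) cache

    def read(self, y):
        if y not in self.src_cache:
            self.src_cache[y] = self.mem[y]
        return self.src_cache[y]

    def write(self, y, value):
        self.mem[y] = value     # deliberately leaves src_cache stale

    def flush(self):
        self.src_cache.clear()


def blit(surface, src_y, dst_y, height, rng):
    """Copy `height` rows; rows complete in an arbitrary order."""
    order = list(range(height))
    rng.shuffle(order)
    for i in order:
        surface.write(dst_y + i, surface.read(src_y + i))


def scroll_up(surface, dy, rng, flush=True):
    """Scroll up by dy rows as a series of slices at most dy rows tall."""
    total = len(surface.mem)
    y = 0
    while y + dy < total:
        h = min(dy, total - dy - y)       # the last slice may be shorter
        blit(surface, y + dy, y, h, rng)  # source and destination never overlap
        if flush:
            surface.flush()               # drop stale source data before the next slice
        y += h


rng = random.Random(0)
s = Surface(range(8))
scroll_up(s, 2, rng)
print(s.mem)  # [2, 3, 4, 5, 6, 7, 6, 7]
```

Because each slice is no taller than the scroll distance, its source and destination rows are disjoint, so the order inside a slice no longer matters. In this model the flush is what keeps a second scroll from reading rows cached before they were overwritten; calling `scroll_up(..., flush=False)` twice in a row produces a wrong image. Scrolling down would need the slices processed from bottom to top instead.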

**TobiSGD** · 15 July 2012, 11:46 PM

How is the CPU involved in that? I can easily push the processor usage on both cores of my laptop up to 60% just by opening a terminal and holding down the Enter key. According to your description this shouldn't happen, or am I wrong?

**bridgman** · 16 July 2012, 01:15 AM

I don't think my description talked about CPU load, did it?

IIRC all that rectangle-blitting and cache-flushing (and waiting for cache flushing) did eat up some CPU time, but not sure if 60% is reasonable.

**kobblestown** · 16 July 2012, 03:26 AM

Originally posted by bridgman

Yeah, but if that's the case I wish people would actually say that. I might even agree

It is to be commended that AMD has an OSS policy and is releasing documentation that helps in this regard. Yes, it's a pity that the OSS Radeon driver is so far behind in terms of performance and features. And those are both mostly due to the incompleteness of the documentation. And it is amazing what the Nouveau people can achieve without any documentation at all! But I'm sticking with AMD, precisely because of their Linux policy. It might not bear fruit quickly but it's something. As a mostly Linux user I try to buy hardware that has good Linux support. And when I buy graphics solutions I buy AMD even if it is less powerful - the AMD attitude towards OSS makes up for it. The reason I'm writing this is that said OSS policy doesn't seem to be enforced too vigorously. That is of course just an outside perspective, but it really doesn't look like it's important for AMD. Again, that's just how it looks. But it should be, because there are people out there who make purchasing decisions based on that. I know there aren't too many of them, but I'd like to believe they are getting more numerous. So bridgman, please tell this to your superiors. Your efforts are not in vain. If only things could happen a bit quicker...

**brent** · 16 July 2012, 05:04 AM

Originally posted by bridgman

Quick answer without investigation is that older GPUs had dedicated 2D hardware optimized for lines, circles, text drawing and scrolling. Modern GPUs don't have that (r5xx / rs6xx were the last ATI/AMD parts with 2D hardware, don't know for NVidia but probably the same timeframe, not sure about Intel). 3D engines could do most of the operations as fast as or faster than 2D hardware, and spending die area on the 3D engine aligned much better with what customers were looking for.

Yes, I am aware of that, but on Intel and NVidia 2D performance through 3D hardware does not have performance issues.

Unfortunately scrolling (where source and destination rectangles overlap) turns out to be one of the harder things to do with the 3D engine. There was a lot of discussion during the initial driver support for r6xx and higher, finding the right balance between "go fast" and "no corruption". The technical issue is that blits are done with a 3D engine by using a texture for the source rectangle and a render target for the destination rectangle. Each has its own cache, so writes to the destination rectangle don't update the cached copy of the source rectangle which you want when doing a blit. Even worse, the render operation is scattered across multiple SIMDs and there is no guarantee that the copy operations for one scanline will be completed before the operations for the next one begin. The hardware guarantees correctly ordered writes between triangles/quads (IIRC) but not within a single primitive.

Interesting, I'll take a look at the EXA code.

Anyway, I guess what baffles me the most is that most people have such low expectations that they are fine with drivers that perform so badly... and even go so far as to say that 2D performs great! This is my first AMD GPU in years, and I only got it because people assured me that the open source drivers perform well now.


E-450 graphics performance issues
