
Intel Just Released A Crazy Fast Acceleration Architecture


  • #21
    Is there any hope?

    I'm waiting for AMD to release two or three or four performance patches like this.

    Edit: I've switched back to AMD and r600g-git. It works very well for me now, though 3D is still not very fast.
    Last edited by ahlaht; 04 June 2011, 05:24 PM.



    • #22
      Originally posted by ahlaht View Post
      I'm waiting for AMD to release two or three or four performance patches like this.
      Why? This is 2d acceleration, and AMD's EXA acceleration is superb.



      • #23
        This is an example where the (boring party pooper) harmonic mean should be used.

        The real improvement probably lies closer to 1.5x, which in itself is quite fantastic, as it is a 50% improvement.
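
        For illustration, a quick sketch in Python with made-up numbers (an assumption for the example, not the actual benchmark results): one outlier speedup inflates the arithmetic mean into a headline figure, while the harmonic mean stays close to the typical case.

        ```python
        # Hypothetical per-benchmark speedups, for illustration only --
        # not the real numbers behind the 14x claim.
        speedups = [40.0, 2.0, 1.2, 1.1]

        arithmetic = sum(speedups) / len(speedups)
        harmonic = len(speedups) / sum(1.0 / s for s in speedups)

        print(f"arithmetic mean: {arithmetic:.1f}x")  # ~11.1x, dominated by the outlier
        print(f"harmonic mean:   {harmonic:.1f}x")    # ~1.8x, closer to what you'd feel
        ```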

        I guess Intel would rather say 14x than 1400%, since the latter would have made them sound like sensationalists and lowered their credibility.

        Just an idea



        • #24
          Originally posted by Kivada View Post
          Huh? We're talking about Intel, a massive company with near-unlimited resources and comparatively only a small handful of GPU designs to maintain, yet it can't handle the move to KMS, while AMD, which was in a bad way financially till only just recently, is handling the switch very well in its OSS drivers even though it has a much larger backlog of GPU designs to maintain.

          Maybe it's just me, but something just doesn't add up.
          Sure, they could move faster, but that was not my point. Intel hardware is different from AMD's, so TTM and EXA might not be a good solution for both of them.

          Whenever I code something I try to find at least two ways of doing it and pick the best. If there is no "B" how do I know that "A" is the way to go?



          • #25
            Originally posted by pingufunkybeat View Post
            Why? This is 2d acceleration, and AMD's EXA acceleration is superb.
            From what I understand (without actually reading the patches) this makes much of the 2D acceleration go through the 3D pipeline. This improves both 2D and 3D since the hardware doesn't have to switch between them. AMD might benefit from a similar approach.
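
            As a rough sketch of the idea (hypothetical code, not the actual SNA or r600g implementation): a 2D solid fill can be expressed as two triangles with a constant-color shader, so 2D and 3D work flow through one pipeline and the hardware never switches engines.

            ```python
            # Hypothetical illustration: routing a 2D solid fill through the
            # 3D engine as two triangles, avoiding any 2D<->3D engine switch.
            def solid_fill_as_quad(x, y, w, h, color):
                vertices = [
                    (x, y), (x + w, y), (x + w, y + h),  # upper triangle
                    (x, y), (x + w, y + h), (x, y + h),  # lower triangle
                ]
                return {
                    "primitive": "TRIANGLE_LIST",
                    "vertices": vertices,
                    "constant_color": color,  # a trivial shader outputs this color
                }
            ```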



            • #26
              Originally posted by patrik View Post
              From what I understand (without actually reading the patches) this makes much of the 2D acceleration go through the 3D pipeline. This improves both 2D and 3D since the hardware doesn't have to switch between them. AMD might benefit from a similar approach.
              We are already doing that on 6xx and higher. I would love to say that we are doing it because of genius and foresight, but it was mostly because there wasn't any 2D hardware we could use.

              If I understand the rest of the description correctly, they are also making more use of shadowfb (CPU rendering into system memory) for 2D operations. Our conclusion while implementing 6xx/7xx 2D (the first hardware without 2D acceleration) was that the combination of shadowfb for 2D plus hardware acceleration for 3D would probably work very well, at least for the X drawing API, but I don't think we had time to implement it and see if it really worked.

              It's not trivial to implement properly, however, since you need the buffer in system memory for fast CPU rendering but in video RAM for fast GPU rendering, and there are always a few things that really benefit from GPU acceleration. It's definitely easier if you are only dealing with shared-memory devices and not GPUs with local fast VRAM, but still not easy (easier != easy).
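
              To make the placement problem concrete, a toy sketch (hypothetical names, nothing from the real driver): the buffer wants to be in system memory while the CPU draws and in VRAM while the GPU draws, and every transition implies a copy over the bus.

              ```python
              # Hypothetical sketch: tracking where a pixmap lives. A real driver
              # would also copy the contents on each transition -- the expensive part.
              class Pixmap:
                  def __init__(self):
                      self.location = "system"  # shadowfb: fast for CPU rendering

                  def prepare_cpu_access(self):
                      if self.location == "vram":
                          self.location = "system"  # real code: slow readback over the bus

                  def prepare_gpu_access(self):
                      if self.location == "system":
                          self.location = "vram"    # real code: upload before GPU rendering
              ```

              Ping-ponging a buffer between the two is what makes the hybrid approach hard to get right.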

              For what it's worth, I think everyone is reading too much into the "... Architecture" name. This does not appear to be a new interface between common code and driver code or any kind of "going their own way", just driver-internal changes in the implementation of existing APIs.

              Someone is probably saying "see, I told you we shouldn't have given it a name..." right now
              Last edited by bridgman; 05 June 2011, 07:32 AM.



              • #27
                Originally posted by bridgman View Post
                We are already doing that on 6xx and higher. I would love to say that we are doing it because of genius and foresight, but it was mostly because there wasn't any 2D hardware we could use.
                The foresight came from your hardware engineers

                For what it's worth, I think everyone is reading too much into the "... Architecture" name. This does not appear to be a new interface between common code and driver code or any kind of "going their own way", just driver-internal changes in the implementation of existing APIs.
                Agreed. And picking your next graphics card based on which acceleration architecture (EXA vs. UXA, etc.) its driver uses is just plain silly.

                Thanks for the info



                • #28
                  Originally posted by patrik View Post
                  Agreed. And picking your next graphics card based on which acceleration architecture (EXA vs. UXA, etc.) its driver uses is just plain silly.
                  This is true, but picking your next graphics card based on what percentage of its total usable performance is being used by the open source graphics stack is certainly not silly, especially if you're a free software / open source fanatic like me.

                  I've watched my Radeon HD5970 go from unusable, to slow but usable, back to unusable (regressions in Mesa 7.11), and eventually it might go back to usable again -- but the only way to use more than, say, 10% of the total capability of the GPUs on it is to run Catalyst. This Intel patch was targeted specifically at using more of the hardware to get more performance. I haven't seen anything of the sort for AMD in a long, long time (6 months or so). Most of the work coming out of AMD is still just the initial hardware bring-up for a new ASIC. But I don't need to tell you that this initial bring-up, while significant, is only step 1 on a list of 1000 steps to get the driver to professional quality.

                  My observation from following the commit logs of mesa is that Alex Deucher performs this step 1 quite reliably on new AMD ASICs, but he doesn't do very much in the way of enhancing performance or hardware utilization on existing ASICs. He gets 'em to the point where they can run compiz without constantly crashing, gets basic EXA working, maybe basic shader support, then leaves the rest to the community a la Dave Airlie and Marek Olsak. Where is the performance work coming out of AMD? Will the new open source driver developers -- once hired -- work on that? Or are we going to chase video acceleration up one side of the mountain and down the other before we even begin to look at bashing 3D apps through the 12 fps ceiling?

                  If anything, we should see twice as much performance work coming out of AMD as out of Intel, because AMD has a much steeper hill to climb: high-end AMD discrete GPUs have much more hardware capability available, so I'd imagine it would take a lot more effort to fight pipeline stalls and keep the GPU fed with command streams in order to achieve more than slideshow FPS in any remotely complex scene. The 2D performance is fine, but there's no reason an HD5970 should perform the same (or worse!) in such basic programs as OpenArena, let alone in more complex apps with GLSL, FBOs, and floating-point textures in a deferred rendering pipeline.

                  Hopefully the efforts to remove either Mesa IR or TGSI will reduce some of the CPU overhead and result in a broad-stroke optimization that helps get frames rendered faster. But what else can be done for AMD cards in particular, to use more of the hardware? I haven't even begun to think about multi-GPU rendering (which would be needed to drive both GPUs of an HD5970 or HD6990), because that would give me only about a 25-35% performance increase once a single GPU is already being utilized more or less to its potential. So while I'd like to see both GPUs in my dual-GPU card doing something, I don't think that will happen soon, nor will it really be that important for performance. Let's first get my card to the point where a single GPU inside it can perform as fast as an HD5850 does using Catalyst.



                  • #29
                    I'm guessing this is a rhetorical question and you know that (a) our open source graphics effort is aimed at supporting the driver development community, not doing all the driver development ourselves, and (b) a lot of the work Alex did in the last 6 months *was* performance-related, but just in case...

                    This Intel patch was targeted specifically at using more of the hardware to get more performance.
                    Strictly speaking it was targeted at using *less* of the hardware, but I understand what you are saying

                    I haven't seen anything of the sort for AMD in a long, long time (6 months or so).
                    6 months is a long long time ? You're kidding, right ?

                    My observation from following the commit logs of mesa is that Alex Deucher performs this step 1 quite reliably on new AMD ASICs, but he doesn't do very much in the way of enhancing performance or hardware utilization on existing ASICs. <snip> Where is the performance work coming out of AMD?
                    Mesa is the wrong place to be looking -- look in the -ati X driver, that's where the work you want is happening. Wasn't the Intel update in the X driver as well ?

                    Not sure what point you are trying to make here. Are you saying that Alex should have been doing performance work *instead* of what he was doing, or do you think that in six months he should have had enough time to do all the work he did accomplish...

                    - implement and debug support for Ontario (first Fusion part)
                    - implement and debug support for Barts/Turks/Caicos
                    - implement and debug support for Cayman (significantly different 3D engine)
                    - implement and debug support for Llano (different display pipe)
                    - bug fixing and stability improvements on 9 generations of hardware
                    - performance work you might not have noticed (enabling tiling etc..)

                    ... *and* rewrite the driver stack to deal with some of the major performance bottlenecks ?

                    Will the new open source driver developers -- once hired -- work on that?
                    They've been hired for a while, they just haven't started yet.

                    Once Richard's replacement starts I expect we will at least get back to the level of performance work that was being done before he changed teams. Will there be more ? Hard to say.

                    Note that improving performance doesn't seem to have much to do with "using more of the hardware" but rather with optimizing the internal architecture of the driver stack to eliminate hardware operations that are not needed (eg updating less state information), and that is something we would do in conjunction with the community rather than in isolation anyways. A lot of the performance work that Alex did on 3xx-5xx last year was repeated this year for 6xx and higher, but that may not be obvious in the commits (particularly if you are looking in mesa rather than in the DDX).
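
                    As a hedged illustration of "updating less state information" (hypothetical, not the actual -ati code): shadow the last value written to each state register and skip the command-stream write when nothing changed.

                    ```python
                    # Hypothetical sketch of redundant-state elimination in a
                    # command-stream emitter: only emit writes that change a value.
                    class StateEmitter:
                        def __init__(self):
                            self.shadow = {}  # last value emitted per register

                        def set_state(self, cs, reg, value):
                            if self.shadow.get(reg) == value:
                                return  # hardware already has this value; skip
                            self.shadow[reg] = value
                            cs.append((reg, value))
                    ```

                    Fewer register writes mean smaller command buffers and less work for the hardware, which is exactly the kind of win that never shows up as a flashy feature commit.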

                    Or are we going to chase video acceleration up one side of the mountain and down the other before we even begin to look at bashing 3D apps through the 12 fps ceiling?
                    I have no idea what this means. Alex isn't working on video acceleration, is he ?

                    If you are asking "are we suddenly going to become stupid ?" I'm pretty sure the answer is "no".

                    high-end AMD discrete GPUs have much more hardware capability available, so I'd imagine it would take a lot more effort to fight pipeline stalls and keep the GPU fed with command streams in order to achieve more than slideshow FPS in any remotely complex scene. The 2D performance is fine, but there's no reason an HD5970 should perform the same (or worse!) in such basic programs as OpenArena, let alone in more complex apps with GLSL, FBOs, and floating-point textures in a deferred rendering pipeline.
                    If the generic driver stack is close to being CPU-limited even on less powerful hardware, that seems like a very good reason why faster hardware doesn't show the additional performance you expect. Not sure I understand what you are saying here.

                    But what else can be done for AMD cards in particular, to use more of the hardware?
                    I'm not sure why you keep referring to "using more of the hardware" - do you mean "using the hardware more efficiently" ? If so, then the recent work on things like tiling is the most important step in that direction.
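
                    For anyone wondering why tiling matters, a rough sketch (hypothetical 8x8 tiles; real GPU tile formats differ): tiled addressing keeps 2D-neighboring pixels close together in memory, so blits and texture fetches touch far fewer DRAM pages and cache lines than a linear layout would.

                    ```python
                    # Hypothetical 8x8 tiling, for illustration only (assumes the
                    # surface width is a multiple of the tile width).
                    TILE_W = TILE_H = 8

                    def linear_offset(x, y, pitch):
                        # Linear: vertically adjacent pixels are a full pitch apart.
                        return y * pitch + x

                    def tiled_offset(x, y, width):
                        # Tiled: each 8x8 block of pixels is contiguous in memory.
                        tiles_per_row = width // TILE_W
                        tile = (y // TILE_H) * tiles_per_row + (x // TILE_W)
                        inside = (y % TILE_H) * TILE_W + (x % TILE_W)
                        return tile * TILE_W * TILE_H + inside
                    ```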

                    Let's first get my card up to the point where a single GPU inside it can perform as fast as a HD5850 can perform using Catalyst.
                    No problemo, once someone comes up with a good business case for throwing hundreds (or at least dozens) of developers at the code. In the meantime, I think performance work is happening in the right order now :

                    - implement support for all the performance-related features - tiling, HyperZ, page flipping, etc. - and bug-fix to the point where they can be enabled by default, so the hardware will be running at full speed at least at a micro-level

                    - once the above work has been done, *then* start looking at driver architecture options to improve macro-level performance

                    Do you think something different should be done ? If so, I'm listening.
                    Last edited by bridgman; 05 June 2011, 03:53 PM.



                    • #30
                      Originally posted by allquixotic View Post
                      This is true, but picking your next graphics card based on what percentage of its total usable performance is being used by the open source graphics stack is certainly not silly, especially if you're a free software / open source fanatic like me.
                      That metric will almost always lead you towards the slowest graphics hardware. Are you sure that is what you want ?

                      Stupid 30 minute edit limit

