R600 Gallium3D Shader Compiler Milestone Hit

  • #21
    Our architects figured that the open source 3D stack would settle down at around 60-70% of fglrx performance on average, based on rough estimates of developer community size and priorities. The main assumptions were:

    - relatively simple shader compiler (compared to the one in fglrx)

    - primary focus on "making more apps run" (adding functionality) rather than "making them run faster"

    - little or no optimization work for specific apps or workloads, essentially "one code path but a good one"

    All indications are that the driver work is still on track for that kind of performance. It's probably running closer to 30% of fglrx performance right now, but the developer focus is still almost entirely on functionality and stability, not optimization.

    There was some discussion about performance bottlenecks on #dri-devel over the weekend. It's probably fair to say that everyone agrees on the list of potential bottlenecks, but it's not clear which of those actually are the problem, and it's not obvious how to determine the real bottlenecks without actually coding alternative implementations for specific portions and seeing what the results are (i.e. a big heap of work).

    The immediate focus has been on understanding why the 3xx-5xx Gallium3D paths are slower than the corresponding "classic" HW driver paths. Airlied has done some work there, and that brought 300g performance closer to the classic 300 driver, but there are still some performance gaps which I believe are not fully understood yet.

    Anyways, the bottom line is that we are still expecting performance to end up around 2/3 of fglrx on average (i.e. maybe 2x what it is today), but the development focus right now is still on functionality, and (IMO) rightly so.


    • #22
      Originally posted by curaga
      Question. Why does glxgears need a shader compiler? It doesn't use any shaders.
      It's true that glxgears doesn't use any app-level shaders, but it does rely on fixed-function TCL (Transform, Clipping, Lighting). Starting with r300 there *is* no fixed function TCL hardware (it's all done with programmable shader hardware), and AFAIK the same goes for most competing GPUs.

      When there is no fixed-function TCL, mesa provides a vertex shader program which implements the ff operations, and that shader program needs to be compiled down to HW-specific instructions.
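      As a rough illustration (this isn't Mesa's actual code, just the math the emulated fixed-function path boils down to), the generated vertex program mostly has to apply the modelview-projection transform to every vertex, conceptually something like:

      Code:
      /* Sketch of the per-vertex work a fixed-function TCL emulation shader
       * performs; on r300+ Mesa generates a vertex program that runs this on
       * the GPU's shader units. This CPU version is only for illustration. */
      #include <stdio.h>

      /* Multiply a 4x4 column-major matrix by a 4-component vector. */
      static void mat4_mul_vec4(const float m[16], const float v[4], float out[4])
      {
          for (int row = 0; row < 4; row++) {
              out[row] = m[0 + row]  * v[0] +
                         m[4 + row]  * v[1] +
                         m[8 + row]  * v[2] +
                         m[12 + row] * v[3];
          }
      }

      int main(void)
      {
          /* Identity modelview-projection matrix, just for the demo. */
          const float mvp[16] = { 1, 0, 0, 0,
                                  0, 1, 0, 0,
                                  0, 0, 1, 0,
                                  0, 0, 0, 1 };
          const float vertex[4] = { 1.0f, 2.0f, 3.0f, 1.0f };
          float clip[4];

          mat4_mul_vec4(mvp, vertex, clip);   /* gl_Position = MVP * gl_Vertex */
          printf("clip position: %.1f %.1f %.1f %.1f\n",
                 clip[0], clip[1], clip[2], clip[3]);
          return 0;
      }

      The real generated program also has to cover lighting, fog, texgen and so on depending on GL state, which is why a proper compiler is still needed behind it.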


      • #23
        BTW this is why airlied was able to get glxgears running on 5xx quickly back in early 2007, but other programs took a lot longer. The vertex shader hw (which was all you needed for glxgears) didn't change much between 4xx and 5xx, while the fragment shader hw changed a lot.


        • #24
          Originally posted by bridgman
          There was some discussion about performance bottlenecks on #dri-devel over the weekend. It's probably fair to say that everyone agrees on the list of potential bottlenecks, but it's not clear which of those actually are the problem and not obvious how to determine the bottlenecks without actually coding alternative implementations for specific portions and seeing what the results are (ie big heap of work).
          Hey, I have no knowledge in regard to this issue, so excuse me while I make your toes curl inside your shoes...

          If the modern GPU is so programmable these days, then isn't the bottleneck not the HW itself (ignoring missing features and the speed of the GPU) but the number of instructions needed to execute a task?

          If so (which might totally not be the case), then can't you 'simply' look at what takes the longest to execute per rendered image and 'just' shrink the total number of instructions? And then repeat until there is not really much room for optimization left?

          [...], but the development focus right now is still on functionality and (IMO) rightly so.
          _O_



          • #25
            The SoC page mentions that using Mesa as a state tracker in Gallium is a source of performance problems. Is this also one of the problems for r300g, or more of a theoretical problem?



            • #26
              Originally posted by V!NCENT
              If the modern GPU is so programmable these days, then isn't the bottleneck not the HW itself (ignoring lack of features and speed of the GPU) but the amount of code instructions needed to execute a task?
              Not exactly. In an idealized computer you could simply look at the number of instructions executed, but in a real computer (and especially any PC made in the last decade), other effects can have a huge impact on performance. It's quite easy to end up with a program that executes more instructions but runs faster, because it has a smaller cache footprint and/or more cache-friendly access patterns (for example). These things can be analyzed and measured with some degree of usefulness for a single uninterrupted thread, but in the context of a full-blown OS running real apps it's almost impossible to predict exactly what the performance will be; you just have to run the code and see what happens.
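              A tiny, contrived example (nothing to do with the driver itself, just to make the point): both loops below execute essentially the same number of instructions, but the first one walks memory sequentially and will usually be several times faster on real hardware purely because of cache behaviour.

              Code:
              #include <stdio.h>
              #include <time.h>

              #define N 4096

              static float a[N][N];   /* 64 MB, much larger than any CPU cache */

              int main(void)
              {
                  volatile float sum = 0.0f;
                  clock_t t0, t1;

                  t0 = clock();
                  for (int i = 0; i < N; i++)          /* cache-friendly: row by row */
                      for (int j = 0; j < N; j++)
                          sum += a[i][j];
                  t1 = clock();
                  printf("row order:    %.3f s\n", (double)(t1 - t0) / CLOCKS_PER_SEC);

                  t0 = clock();
                  for (int j = 0; j < N; j++)          /* cache-hostile: column by column */
                      for (int i = 0; i < N; i++)
                          sum += a[i][j];
                  t1 = clock();
                  printf("column order: %.3f s\n", (double)(t1 - t0) / CLOCKS_PER_SEC);

                  return 0;
              }

              Counting instructions tells you almost nothing about which loop wins; you have to actually run it and measure.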



              • #27
                Yeah... shrinking the time taken to execute the code is normally only going to give small improvements (for a lot of work). The real wins come from finding ways to not execute the code or perform the function at all. Also, a good chunk of the performance hit probably comes from not using the GPU in the most efficient way.

                Tiling is a good example. Memory is normally organized in "linear" mode, so that as you move across each row you access successively higher addresses. The memory chips are organized into pages, so that accesses within the same page are faster than accesses which jump across pages.

                GPUs tend not to work in nice horizontal scans though, other than when scanning the frame buffer out to the screen. Triangles tend to access a few pixels on each row, and access multiple rows in close succession. If the driver configures the GPU to use "linear" addressing (which is by far the easiest to program) then the GPU won't run as fast as it could, because it will be waiting for memory relatively more of the time.

                The GPU can be programmed to use "tiled" addressing, where each memory page corresponds to one or more square/rectangular blocks on the screen / texture / whatever. This increases performance but causes problems for the parts of the graphics stack which require the CPU to access these areas. There are all kinds of solutions for this, but they all add big chunks of complexity and they all have different performance tradeoffs on different applications. It's hard to tell which approach is best without implementing them all and doing a lot of testing, and just doing that work can chew up half a developer-year.
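                To make the addressing difference concrete, here's a simplified sketch (real GPUs use fancier tile layouts than this, so treat the numbers as illustrative only): with linear addressing two vertically adjacent pixels are a whole row pitch apart, while with tiled addressing they usually land in the same small block, and therefore the same memory page.

                Code:
                #include <stdio.h>
                #include <stddef.h>

                #define TILE_W 8
                #define TILE_H 8

                /* Byte offset of pixel (x, y) in a linear surface. */
                static size_t linear_offset(unsigned x, unsigned y,
                                            unsigned pitch_pixels, unsigned bpp)
                {
                    return ((size_t)y * pitch_pixels + x) * bpp;
                }

                /* Byte offset of pixel (x, y) in a surface stored as a grid of
                 * TILE_W x TILE_H tiles laid out one after another in memory. */
                static size_t tiled_offset(unsigned x, unsigned y,
                                           unsigned width_pixels, unsigned bpp)
                {
                    unsigned tiles_per_row = width_pixels / TILE_W;
                    size_t   tile_index    = (size_t)(y / TILE_H) * tiles_per_row + x / TILE_W;
                    unsigned in_x = x % TILE_W, in_y = y % TILE_H;

                    return (tile_index * TILE_W * TILE_H + in_y * TILE_W + in_x) * bpp;
                }

                int main(void)
                {
                    /* Two vertically adjacent pixels in a 1024-wide, 32bpp surface:
                     * 4096 bytes apart linearly, only 32 bytes apart when tiled. */
                    printf("linear: %zu -> %zu\n",
                           linear_offset(5, 10, 1024, 4), linear_offset(5, 11, 1024, 4));
                    printf("tiled:  %zu -> %zu\n",
                           tiled_offset(5, 10, 1024, 4), tiled_offset(5, 11, 1024, 4));
                    return 0;
                }

                The catch is exactly what's described above: anything on the CPU side that wants to read or write the surface now has to understand (or be shielded from) that layout.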

                This is one example from maybe 20 or so similar issues, unfortunately.


                • #28
                  For me the performance isn't a big problem. It's obvious that high-performance drivers take an extreme amount of work. But what good is all that if the driver is not stable (looking at fglrx)...

                  Even the 'slow' graphics cards are fast enough today for simple 3D games, Compiz, etc. That it works (stably!) and that the performance is partly okay is the most important thing. And as far as I can see, that is coming along very, very well.



                  • #29
                    The only thing that saves Linux in this regard is that people don't actually use it to play games. And they don't expect it to play games either. That's why poor graphics performance is OK with most users.

                    (Note: with "games" I mean real games, like Assassin's Creed, Mass Effect and stuff that runs best on Windows and Consoles, not some amateur or old games with last-decade graphics.)



                    • #30
                      Originally posted by RealNC
                      The only thing that saves Linux in this regard is that people don't actually use it to play games. And they don't expect it to play games either. That's why poor graphics performance is OK with most users.

                      (Note: with "games" I mean real games, like Assassin's Creed, Mass Effect and stuff that runs best on Windows and Consoles, not some amateur or old games with last-decade graphics.)
                      I play Mass Effect under Wine. Other than some issues with the mouse (which can be worked around with hacks to Wine), the game works fine and I get 40+ fps.
                      I also play Oblivion, Dragon Age: Origins, The Last Remnant, Velvet Assassin, and Fallout 3, none of which use "last-decade graphics". I'm probably somewhat in the minority, but Linux users don't just play old games.
