A Big Comparison Of The AMD Catalyst, Mesa & Gallium3D Drivers


  • Another question that leaps to my mind: if the compiler benchmarks yield such different performance, how can you tell there's actually a fault in the driver and not in the compiler that built the driver?
    Would it make a difference to build one driver with GCC 4.6, another with GCC 4.3, and a third with LLVM, for instance (if that's possible)?
    I just wonder how much influence the compiler has, and whether the compilers can really be relied on (and analogously for compiler optimisation levels).
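
    Something along these lines is what I have in mind - only a rough, untested sketch; the Mesa configure flags, compiler names, and Phoronix Test Suite call below are placeholders, not a recipe:

    import os
    import subprocess

    MESA_SRC = "mesa"  # path to a Mesa git checkout (placeholder)

    def build_with(cc, prefix):
        # Same source tree each time, different compiler; only CC/CFLAGS change.
        env = dict(os.environ, CC=cc, CFLAGS="-O2")
        subprocess.run(["make", "clean"], cwd=MESA_SRC, env=env, check=False)
        subprocess.run(["./configure", "--prefix=" + prefix,
                        "--with-gallium-drivers=r600"],
                       cwd=MESA_SRC, env=env, check=True)
        subprocess.run(["make", "-j4"], cwd=MESA_SRC, env=env, check=True)
        subprocess.run(["make", "install"], cwd=MESA_SRC, env=env, check=True)

    def benchmark_with(prefix):
        # Point the GL loader at the freshly built driver and run one test.
        env = dict(os.environ, LIBGL_DRIVERS_PATH=prefix + "/lib/dri")
        subprocess.run(["phoronix-test-suite", "benchmark", "pts/nexuiz"],
                       env=env, check=True)

    for cc in ("gcc-4.3", "gcc-4.6", "clang"):
        prefix = "/tmp/mesa-" + cc
        build_with(cc, prefix)
        benchmark_with(prefix)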

    Comment


    • Originally posted by bridgman
      As time permits we are trying to dig into how the Catalyst driver state change logic is coded and see if there are ideas which can be applied to the open drivers. The open source driver code seems pretty efficient though, which is why there is some head-scratching going on.
      Originally posted by bridgman
      The unhappy thing about performance improvement is that you don't get big performance gains from one place -- you get gains in the 1-5% range from each of a number of areas, and each of those gains requires a lot of work and makes the code more complex to maintain and troubleshoot in the future.
      bridgman, aren't those two statements somewhat contradictory when applied to the current state of the OSS drivers? What I mean is, if the 80/20 rule applies here (and I don't really know whether it does), the poor performance of the drivers compared to fglrx would indicate that you guys are still far from the "micro-optimisation" stage (for lack of a better term). If there is some head-scratching going on and your understanding and gut feeling are right, one of these days somebody may discover a relatively simple, single performance killer in the stack, right? Or are you more inclined to think that it's a combination of a hundred tiny issues adding up? Based mostly on your comments, I always had the impression that the OSS drivers should reach well over 50% of the performance of fglrx, with the rest coming down to quite time-consuming optimisations that the community understandably wouldn't have the resources to commit to. Would you say this is an accurate view?

      Given that the current driver architecture is quite new, one would expect it to be modern, solid and fit for purpose (not perfect, sure). Challenging this idea, I remember somebody (Glisse, perhaps) mentioning that Gallium would/should have been written differently had it been designed today...?

      Comment


      • Good question. Regarding getting from 25% to 50%, I expect that will come from a combination of (a) performance-related features and options that haven't been enabled yet or aren't yet enabled by default, and (b) a few things along the lines of what marek described above.

        There is still a decent chance that some of those changes will give more than 5% improvement so maybe the range should go up to 15% for the next little while, but I don't *think* there's going to be a single performance killer whose removal doubles the driver speed across the board. I could be wrong though...
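
        Just to put rough numbers on how those small wins stack up (back-of-the-envelope arithmetic, nothing measured):

        import math

        # Ten independent optimisations of 1-5% each compound to roughly
        # 1.10x - 1.63x overall: useful, but nowhere near a single 2x jump.
        print(1.01 ** 10)                     # ~1.10
        print(1.05 ** 10)                     # ~1.63

        # Doubling throughput purely from 5% steps takes about fifteen of them.
        print(math.log(2) / math.log(1.05))   # ~14.2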

        Comment


        • I am surprised that nobody, neither in the original article nor here in this forum thread, has mentioned the significant improvements since the previous R600g benchmarks published by Phoronix on November 22, 2010.

          * VDrift is now working with Gallium3D on the HD5770.
          * World of Padman is now working with Gallium3D on the HD5770.
          * approximately 14% speedup in World of Padman on HD4870 running Gallium3D at 1920x1080.
          * approximately 22% speedup in Nexuiz on HD4870 running Gallium3D at 1024x768.
          * approximately 85% speedup in Nexuiz on HD4870 running Gallium3D at 1920x1080.

          The first lot of benchmarks were made with the Linux 2.6.37-rc2 kernel, xf86-video-ati 6.13.99 Git DDX, and Mesa 7.10-devel / Gallium3D code from Git on 2010-11-18.

          The second lot of benchmarks were made with the latest Linux 2.6.37 kernel development code, libdrm, xf86-video-ati DDX (version 6.13.99), and Mesa 7.10-devel / Gallium3D code from Git on 2010-12-25.

          Comment


          • The following hw features should add fairly major performance improvements depending on the app:
            - enable 2D tiling for textures, DB, and CBs on 6xx/7xx (ddx, r600g)
            - add tiling support for evergreen and NI (drm, ddx, r600g)
            - enable hyperZ features on 6xx-NI (r600g)
            - fast DB/CB clears (r600g)
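
            As a very rough picture of why that last item helps: a fast clear only updates per-tile metadata instead of writing every pixel in the buffer. A toy sketch of the idea (nothing to do with the actual hardware programming or the r600g code):

            # Toy model of the fast-clear idea only.
            TILE = 8  # pixels per tile side (arbitrary for this sketch)

            class ColorBuffer:
                def __init__(self, width, height):
                    self.pixels = [[0] * width for _ in range(height)]
                    tiles_x = (width + TILE - 1) // TILE
                    tiles_y = (height + TILE - 1) // TILE
                    # One "cleared" bit per tile stands in for the clear-mask metadata.
                    self.tile_cleared = [[False] * tiles_x for _ in range(tiles_y)]
                    self.clear_value = 0

                def slow_clear(self, value):
                    # Touches every pixel: O(width * height) memory writes.
                    for row in self.pixels:
                        for x in range(len(row)):
                            row[x] = value

                def fast_clear(self, value):
                    # Touches only the per-tile metadata: O(number of tiles).
                    self.clear_value = value
                    for row in self.tile_cleared:
                        for i in range(len(row)):
                            row[i] = True

                def read_pixel(self, x, y):
                    # Reads consult the metadata, so the deferred clear is invisible.
                    # (A write into a cleared tile would first have to resolve it;
                    # that part is omitted here.)
                    if self.tile_cleared[y // TILE][x // TILE]:
                        return self.clear_value
                    return self.pixels[y][x]

            cb = ColorBuffer(1920, 1080)
            cb.fast_clear(0xFF0000)         # cheap: ~32,400 flag writes instead of ~2 million pixel writes
            print(cb.read_pixel(100, 100))  # 16711680, as if every pixel had been written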

            Comment


            • Originally posted by agd5f View Post
              The following hw features should add fairly major performance improvements depending on the app:
              - enable 2D tiling for textures, DB, and CBs on 6xx/7xx (ddx, r600g)
              - add tiling support for evergreen and NI (drm, ddx, r600g)
              - enable hyperZ features on 6xx-NI (r600g)
              - fast DB/CB clears (r600g)
              What percentages are we talking about?
              5%, 10%, or more like 50%?
              Just a rough guess.

              Comment


              • Originally posted by bridgman View Post
                The Intel GLSL compiler goes from GLSL source to an IR (currently in a two-step process: first to a compiler-specific IR, aka "GLSL IR", which is then converted to Mesa IR, and then to TGSI, I guess).

                Jerome's shader compiler goes from TGSI to hardware instructions, i.e. it does the rest of the work. Similarly, the shader compiler in the r600 driver goes from Mesa IR to hardware instructions.

                FYI the fglrx driver also works in two stages - the GL driver compiles GLSL / ARB_*P down to a proprietary representation (what we call "IL") and then the shader compiler goes from IL to hardware instructions in a second step.

                The Intel devs are thinking about generating hardware instructions directly from GLSL IR, rather than going through Mesa IR or TGSI.
                It strikes me as a bit odd that there are so many IR translations to go through in the current stack.
                You say GLSL Source -> Compiler IR -> Mesa IR -> TGSI -> Hardware, right?

                I gather that a lot of the performance issues are (probably?) currently on the CPU side, so I would think that removing some stages would be a good idea (as the Intel devs are thinking). What sort of factors (technical/political/security) have prevented unifying this into one or two steps instead of five?

                There's an optimisation step at some point, so there's an IR translation (GL shader compiler-specific), hence the two-stage process even in fglrx, but the rest of them seem to just be passing things around for the sake of it, incurring translation overhead each time for a functionally equivalent unit. Perhaps the ongoing existence of UMS/Classic drivers is at fault? (Yes, I realise I'm probably missing something really important here.)

                Do the docs from ATI have any information on the IL used in fglrx? You seem to get pretty good performance, so in my limited experience with compiler development it would seem like a good idea to learn from the specialists in the domain. (Thought: a generic IR with driver-specific extensions where necessary?)
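
                Just to check that I'm picturing the chain correctly, this is the mental model I have of it - purely a toy sketch, nothing like the real IRs, only meant to show that every hop is another full walk over the program:

                # Toy sketch of the translation chain; stage names mirror the
                # discussion above, nothing here resembles the real data structures.
                def glsl_to_glsl_ir(source):   # GLSL source -> compiler-specific "GLSL IR"
                    return [("glsl_ir", line.strip())
                            for line in source.splitlines() if line.strip()]

                def glsl_ir_to_mesa_ir(ir):    # GLSL IR -> Mesa IR
                    return [("mesa_ir", payload) for _tag, payload in ir]

                def mesa_ir_to_tgsi(ir):       # Mesa IR -> TGSI (Gallium's common IR)
                    return [("tgsi", payload) for _tag, payload in ir]

                def tgsi_to_hw(ir):            # TGSI -> hardware instructions (the r600g step)
                    return [("r600_isa", payload) for _tag, payload in ir]

                def compile_shader(source):
                    # Four full passes over the program for a single compile; a driver
                    # that consumed GLSL IR (or TGSI) directly would drop one or two.
                    ir = glsl_to_glsl_ir(source)
                    ir = glsl_ir_to_mesa_ir(ir)
                    ir = mesa_ir_to_tgsi(ir)
                    return tgsi_to_hw(ir)

                print(compile_shader("gl_FragColor = vec4(1.0);"))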

                Comment


                • Aren't shaders compiled only once, at load time? If so, wouldn't the performance improvement be rather irrelevant?

                  Comment


                  • Originally posted by Wyatt View Post
                    It strikes me as a bit odd that there are so many IR translations to go through in the current stack.
                    You say GLSL Source -> Compiler IR -> Mesa IR -> TGSI -> Hardware, right?

                    I gather that a lot of the performance issues are (probably?) currently on the CPU side, so I would think that removing some stages would be a good idea (as the Intel devs are thinking). What sort of factors (technical/political/security) have prevented unifying this into one or two steps instead of five?

                    There's an optimisation step at some point, so there's an IR translation (GL shader compiler-specific), hence the two-stage process even in fglrx, but the rest of them seem to just be passing things around for the sake of it, incurring translation overhead each time for a functionally equivalent unit. Perhaps the ongoing existence of UMS/Classic drivers is at fault? (Yes, I realise I'm probably missing something really important here.)

                    Do the docs from ATI have any information on the IL used in fglrx? You seem to get pretty good performance, so in my limited experience with compiler development it would seem like a good idea to learn from the specialists in the domain. (Thought: a generic IR with driver-specific extensions where necessary?)
                    The Mesa IR is just what the old compiler generated, and it was used by all the classic drivers. When Gallium came along, they apparently thought it had some limitations and decided to base everything around TGSI instead; to get everything working they just converted Mesa IR => TGSI rather than rewriting the compiler to generate it directly. (The Mesa IR code would have to remain anyway for all the classic drivers to work.)

                    When Intel created their new GLSL compiler, they did it with the intention of having a good IR come out of it that they could use directly - essentially their own version of Mesa IR and TGSI, only I guess it keeps more information from the original program that they wanted to retain for optimisation purposes. So their new drivers are trying to use that directly instead of converting to Mesa IR and using that. It could also replace TGSI for Gallium, and the drivers could all use it directly, but no one really seems to be working towards that goal right now. There was someone who wanted to experiment with using LLVM in the middle instead, and everyone else seems too busy working on the individual drivers to mess with the basic Gallium APIs.

                    Comment


                    • The rate of speed improvement certainly seems promising. Thank you all!

                      My question is: what are the plans? How far are you planning to take the optimisations, and when do you plan to start working on other features?

                      I mean video decoding, killer power management, OpenCL, etc.

                      Also, from bridgman's latest posts it seems to me that Mesa might cause significant performance penalties. Are there any plans to write an OpenGL state tracker for Gallium? I understand that it's not going to be the Radeon devs who do that, but still.

                      Thanks for your time and efforts, guys; I really appreciate it!

                      Comment
