A Big Comparison Of The AMD Catalyst, Mesa & Gallium3D Drivers


  • The Intel GLSL compiler goes from GLSL source to an IR (currently in a two-step process: first to a compiler-specific IR (aka "GLSL IR"), then converted to Mesa IR, and then to TGSI, I guess).

    Jerome's shader compiler goes from TGSI to hardware instructions, i.e. it does the rest of the work. Similarly, the shader compiler in the r600 driver goes from Mesa IR to hardware instructions.

    FYI the fglrx driver also works in two stages - the GL driver compiles GLSL / ARB_*P down to a proprietary representation (what we call "IL") and then the shader compiler goes from IL to hardware instructions in a second step.

    The Intel devs are thinking about generating hardware instructions directly from GLSL IR, rather than going through Mesa IR or TGSI.
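
    To make the staging concrete, here is a toy model of that lowering chain (a minimal sketch; the function names are made up for illustration and are not the real Mesa/Gallium API):

    /* Toy model of the staged lowering described above.  Each pass
     * consumes one representation and produces the next; real passes
     * transform IR data structures, not strings. */
    #include <stdio.h>

    static const char *compile_glsl(const char *src) { (void)src; return "GLSL IR"; }
    static const char *to_mesa_ir(const char *ir)    { (void)ir;  return "Mesa IR"; }
    static const char *to_tgsi(const char *ir)       { (void)ir;  return "TGSI"; }
    static const char *codegen(const char *ir)       { (void)ir;  return "hw instructions"; }

    int main(void)
    {
        /* Gallium path: GLSL source -> GLSL IR -> Mesa IR -> TGSI -> hw.
         * The Intel idea above would be codegen(compile_glsl(src)),
         * skipping the middle representations. */
        const char *src = "void main() { gl_FragColor = vec4(1.0); }";
        printf("%s\n", codegen(to_tgsi(to_mesa_ir(compile_glsl(src)))));
        return 0;
    }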



    • Originally posted by bridgman View Post
      The problem is that so far the test results aren't supporting our initial suspicions. Going in I think most of us suspected that the bottlenecks were likely to be in the kernel driver (synchronization, memory mapping etc..) but test results seem to suggest that common mesa code in the usermode 3D driver is a bigger factor. There's a lot more testing required though, and there are conflicting views re: how to interpret the test results so far.

      Performance optimization is basically:

      - run some benchmarks & save the results
      repeat forever {
      - do some profiling
      - form a theory re: where the bottleneck is
      - change some code to test the theory
      - re-run the benchmarks to see if things go faster
      - (4 times out of 5) curse and discard the theory (or save as the basis for a more complex theory)
      - (1 time out of 5) make happy noises and get some sleep
      }
      WHAT, are all AMD Linux workflows this slow and wasting lots of cycles? You're obviously doing it wrong; even a non-head-of-AMD-Linux manager can see that.

      You're saying you don't even have, or have yet to write, a simple C app doing x264-style checkasm tests, down to the decicycle range, on all the C and assembly routines in the code?

      To see exactly what I'm referring to here: if you compile x264 from git, a simple

      make checkasm; ./checkasm
      ./checkasm --bench

      gives you results that help you find and easily spot the real bottlenecks without effort, and so decide which routines to optimise first. Perhaps you could even extend this type of performance tool to all the existing Gfx kernel-space C/assembly in time. Or perhaps you could ask/commission pengvado, Dark_Shikari or holger on #x264dev to write you a basic x264-checkasm-type app if you can't be bothered or don't have the time; then everyone wins.
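
      For anyone who has not seen it, the core of such a tool is small; below is a minimal checkasm-style sketch (NOT x264's actual code; sad_16x16_c is a made-up stand-in for whatever routine you want to measure):

      /* Minimal checkasm-style microbenchmark: time a routine over many
       * runs and keep the best cycle count, the way checkasm --bench
       * ranks C vs. assembly implementations of the same routine. */
      #include <stdint.h>
      #include <stdio.h>
      #include <x86intrin.h>                  /* __rdtsc() on x86 */

      static volatile int sink;               /* keeps calls from being optimized away */

      /* toy 16x16 sum-of-absolute-differences, the kind of routine x264 benches */
      static int sad_16x16_c(const uint8_t *a, const uint8_t *b)
      {
          int sum = 0;
          for (int i = 0; i < 256; i++)
              sum += a[i] > b[i] ? a[i] - b[i] : b[i] - a[i];
          return sum;
      }

      static uint64_t bench(int (*fn)(const uint8_t *, const uint8_t *),
                            const uint8_t *a, const uint8_t *b)
      {
          uint64_t best = UINT64_MAX;
          for (int run = 0; run < 1000; run++) {
              uint64_t t0 = __rdtsc();
              for (int i = 0; i < 100; i++)   /* amortize timer overhead */
                  sink = fn(a, b);
              uint64_t t = (__rdtsc() - t0) / 100;
              if (t < best)                   /* keep the least-disturbed run */
                  best = t;
          }
          return best;
      }

      int main(void)
      {
          static uint8_t a[256], b[256];      /* dummy input blocks */
          printf("sad_16x16_c: %llu cycles\n",
                 (unsigned long long)bench(sad_16x16_c, a, b));
          return 0;
      }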



      • Originally posted by spirit View Post
        I confirm, on my RV620 chipset:

        cat /var/log/Xorg.0.log | grep Pageflipping
        [     7.614] (II) RADEON(0): KMS Pageflipping: enabled
        What kernel version are you using? 2.6.38, I assume? Does Pageflipping make much of a difference for you?



        • Originally posted by popper View Post
          WHAT, are all AMD Linux workflows this slow and wasting lots of cycles? You're obviously doing it wrong; even a non-head-of-AMD-Linux manager can see that.
          Popper, with respect, you are completely missing the point. The developers have good tools for figuring out where CPU cycles are going, but performance tuning on a graphics driver is a lot more complicated: you're dealing with a half dozen independent hardware blocks in the GPU with invisible queues between them. Performance tuning on a software implementation is much simpler... unfortunately.



          • Originally posted by popper View Post
            WHAT, are all AMD Linux workflows this slow and wasting lots of cycles? You're obviously doing it wrong; even a non-head-of-AMD-Linux manager can see that.

            You're saying you don't even have, or have yet to write, a simple C app doing x264-style checkasm tests, down to the decicycle range, on all the C and assembly routines in the code?

            To see exactly what I'm referring to here: if you compile x264 from git, a simple

            make checkasm; ./checkasm
            ./checkasm --bench

            gives you results that help you find and easily spot the real bottlenecks without effort, and so decide which routines to optimise first. Perhaps you could even extend this type of performance tool to all the existing Gfx kernel-space C/assembly in time. Or perhaps you could ask/commission pengvado, Dark_Shikari or holger on #x264dev to write you a basic x264-checkasm-type app if you can't be bothered or don't have the time; then everyone wins.
            It's quite easy to optimise a single codec; only an idiot would think this scales to anything like a generic GL stack.

            Having experience in one small area of computing doesn't mean you are actually an expert.

            Dave.



            • Originally posted by bridgman View Post
              Um... yes, there's some head scratching going on (just like in the picture), but the associated question is more along the lines of "the code in the open source driver looks pretty good, but it seems to run a lot slower than Catalyst and we don't know why".
              If there's one thing I've learned after 15 years of coding, it's that code that looks good usually performs worse than complicated-looking code.



              • Originally posted by bridgman View Post
                Popper, with respect, you are completely missing the point. The developers have good tools for figuring out where CPU cycles are going, but performance tuning on a graphics driver is a lot more complicated: you're dealing with a half dozen independent hardware blocks in the GPU with invisible queues between them. Performance tuning on a software implementation is much simpler... unfortunately.
                Popper, FYI the checkasm stuff you are talking about corresponds to a single line item in the workflow, "- do some profiling".



                • Originally posted by airlied View Post
                  It's quite easy to optimise a single codec; only an idiot would think this scales to anything like a generic GL stack.
                  Now now, Dave (Airlie), at no point did I say or imply that it was or did.
                  There's no need to take offence or call names because you mistakenly read things into posts that aren't there; this is a Linux user support message board, after all.

                  Originally posted by airlied View Post
                  Having experience in one small area of computing doesn't mean you are actually an expert.

                  Dave.
                  Again with reading things into posts that aren't there... what does making a simple performance tool, to help all developers (and users alike) run and submit tests to find bottlenecks, have to do with being an expert in any particular area?

                  bridgman: "with respect, you are completely missing the point"
                  No, I considered that.

                  "performance tuning on a graphics driver is a lot more complicated"

                  Sure, and the more complicated the code, the more reason to consider writing that performance tool: cover the basic parts you can easily cover to start with, and extend it over time as more and more devs begin to use it. After all, Peter Clifton is even trying to implement and improve such performance tools: http://lists.freedesktop.org/archive...er/008557.html



                  • popper> You apparently live under the illusion that profilers, or any kind of measuring tool, can show you what to optimize. The graphics driver stack is huge; there are hundreds of functions that call each other, each spending very little time in itself. The right question usually isn't "how do I speed up this function?", it's rather something like "can I somehow change the upper layers so that this function is called less often?" Now if you start with that kind of question, you realize that if you don't know what's REALLY going on in the code at various levels, profiling is mostly USELESS and will only make you spend time on parts of the code that will give you very little speedup, if any (e.g. you may end up wondering why atomic increment is so high in the profile).
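
                    A trivial illustration of that kind of upper-layer fix (hypothetical names, not actual Mesa code): rather than shaving cycles off a function that shows up hot in the profile, stop calling it when nothing has changed:

                    /* "Call it less often" beating "make it faster".
                     * emit_blend_state() stands in for any cheap-looking
                     * driver function that profiles hot only because the
                     * layers above call it on every draw. */
                    #include <string.h>

                    struct blend_state { int enable; int src_factor; int dst_factor; };

                    static void emit_blend_state(const struct blend_state *bs)
                    {
                        (void)bs;  /* imagine commands being queued for the hw */
                    }

                    static struct blend_state cached;
                    static int cached_valid;

                    void set_blend_state(const struct blend_state *bs)
                    {
                        /* Skip redundant state emission entirely; no amount of
                         * micro-optimizing emit_blend_state() can beat not
                         * calling it at all. */
                        if (cached_valid && memcmp(&cached, bs, sizeof(*bs)) == 0)
                            return;
                        cached = *bs;
                        cached_valid = 1;
                        emit_blend_state(bs);
                    }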



                    • Originally posted by popper View Post
                      "performance tuning on a graphics driver is a lot more complicated"

                      Sure, and the more complicated the code, the more reason to consider writing that performance tool: cover the basic parts you can easily cover to start with, and extend it over time as more and more devs begin to use it. After all, Peter Clifton is even trying to implement and improve such performance tools: http://lists.freedesktop.org/archive...er/008557.html
                      So you can get some understanding: a GL stack is several layers of software communicating through various means. A quick sketch:
                      1 [mesa GL->gallium->pipe driver]
                      2 [xorg<->ddx]
                      3 [kernel]
                      4 [GPU hw]

                      1 communicates with 2 through dri1/dri2 (can impact performance)
                      2 communicates with 3 through the kernel drm api (can impact performance)
                      1 communicates with 3 through the kernel drm api (can impact performance)
                      3 communicates with 4

                      So obviously, with so many players, it's hard to point a finger, and no tool will help you there unless you build one capable of spying on everyone (such a tool would likely be insanely complex). Bottom line: given the resources we have, time spent on such a tool is time we could spend today improving performance. If you think the tool would be useful in the future, carefully consider that any of the links between the four components will likely change in incompatible ways, and you'll see that unless you have enough manpower, writing such a tool is doomed to be outdated by the time it can give any insight.

                      Oh, and spying on things such as dri1/dri2, to evaluate their cost and what's going wrong, is, if I had to guess, very complex to achieve without interfering with dri2 in non-trivial ways. The same goes for all the other links; it's not easy to instrument them.

                      As I said elsewhere, I think we still have too many CPU bottlenecks to consider GPU optimization worthwhile (at least if you care about slow CPUs and don't have one of those crazy 30" screens starving your bandwidth). So things such as a GPU top would be of little use.
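
                      For what it's worth, the crude one-off instrumentation you can do on a single link today looks something like this (a sketch only; wrapping an ioctl in a timer is illustrative, not a real spying tool):

                      /* Time one userspace -> kernel crossing by wrapping an
                       * ioctl with clock_gettime.  Which request you pass is up
                       * to you; the point is that every link would need its own
                       * ad hoc wrapper like this. */
                      #include <stdio.h>
                      #include <sys/ioctl.h>
                      #include <time.h>

                      static long long ns_now(void)
                      {
                          struct timespec ts;
                          clock_gettime(CLOCK_MONOTONIC, &ts);
                          return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
                      }

                      int timed_ioctl(int fd, unsigned long request, void *arg, const char *name)
                      {
                          long long t0 = ns_now();
                          int ret = ioctl(fd, request, arg);
                          fprintf(stderr, "%s took %lld ns\n", name, ns_now() - t0);
                          return ret;
                      }

                      Doing the same across dri2 requests or inside the ddx means patching each component separately, which is exactly the maintenance burden described above.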

