Radeon Gallium3D R600g Color Tiling Performance
-
Originally posted by pingufunkybeat: In your case, it is about 4ms faster at rendering a single texture.
Optimising 4ms away is really hard work, especially if it consists of 100 different milliseconds collected across different parts of the driver. That's what my armchair response was about.
The driver's bottleneck is the CPU, since that is where it runs as a program, no? But low CPU usage pretty much eliminates that possibility in this case, so the GPU is the bottleneck, and something must be wrong there. Such a simple case indicates that something is being done wrong: a speed-up feature not being used, or extra/different use of the GPU. And it doesn't seem like many small issues, more like a couple of bigger ones, as mentioned. I doubt it took AMD 15 years to optimize rendering a single texture. One possibility (may be invalid): could it have to do with texture compression? Test it with a simple gradient instead and see. xD
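The gradient test suggested above is easy to sketch. Here is a minimal Python helper (the function name and sizes are my own invention, not from the thread) that builds an uncompressed RGBA gradient you could upload as raw texture data, to rule out texture-compression effects when comparing drivers:

```python
def gradient_rgba(width, height):
    """Build a horizontal grayscale gradient as raw RGBA bytes.

    Uploading this uncompressed buffer directly (e.g. via glTexImage2D)
    keeps any texture-compression path out of the comparison.
    """
    pixels = bytearray()
    for y in range(height):
        for x in range(width):
            v = x * 255 // (width - 1)  # 0 at the left edge, 255 at the right
            pixels.extend((v, v, v, 255))
    return bytes(pixels)
```

This is just a sketch of the test data, not of the rendering loop itself; the interesting part is timing the draw, which the thread's benchmark already does.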
-
Originally posted by Rigaldo: Please elaborate on what optimizations should be done when rendering a single image? In such a simple process, pretty much the same "calls" to the GPU should be made, no?
The driver's bottleneck is the CPU, since that is where it runs as a program, no? But low CPU usage pretty much eliminates that possibility in this case.
Even if your processor is mostly idle, a simple cache miss might cause a considerable delay while your GPU is waiting for the next instruction.
But again, I'm not a GPU developer. I just don't believe that using less than 100% of CPU all the time means that there are no bottlenecks in the driver. A 1ms delay is a 1ms delay, even if it only happens occasionally.
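The point about occasional stalls mattering despite an idle-looking CPU can be made concrete with a back-of-the-envelope model (invented numbers, not measurements from this thread):

```python
def effective_fps(render_ms, stall_ms=0.0):
    """Frames per second when every frame costs `render_ms` of GPU work
    plus `stall_ms` spent waiting on the driver.

    This is a toy model: real stalls are intermittent, which makes the
    average cost smaller but no less real.
    """
    return 1000.0 / (render_ms + stall_ms)

# A 4 ms per-frame stall on a 10 ms workload costs roughly 29% of the
# frame rate, even though the CPU doing the stalling may look almost idle.
print(effective_fps(10.0))        # 100.0
print(effective_fps(10.0, 4.0))   # ~71.43
```

Note how this also matches the "about 4ms faster at rendering a single texture" figure from earlier in the thread: a few milliseconds per frame is a large fraction of a 60 fps frame budget.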
-
Originally posted by pingufunkybeat: Like I said, this is for driver developers to answer; I lack the knowledge. Marek and Alex have already written that all hardware functionality is used (only HiZ is not on by default). If I remember correctly, Jerome Glisse did profile the driver and couldn't find one single bottleneck, but many small ones. I can't find a link at the moment; perhaps somebody is better at googling.
Only if they operate completely asynchronously. If the GPU ever has to wait for the driver before continuing, then no.
Even if your processor is mostly idle, a simple cache miss might cause a considerable delay while your GPU is waiting for the next instruction.
But again, I'm not a GPU developer. I just don't believe that using less than 100% of CPU all the time means that there are no bottlenecks in the driver. A 1ms delay is a 1ms delay, even if it only happens occasionally.
-
Originally posted by tmikov: Fair enough. But what would be a good synthetic stress test?
Also, do you have an idea why the blob is faster? Could it be memory clocks, power management, etc?
There is not a single thing that explains the gap between the open source driver and the closed source driver. There is no secret way of doing things; we have tools to capture the fglrx command stream, and there is nothing fundamentally different. Proper power management support, with use of the on-chip governor to manage clocks, will probably improve performance a bit; better buffer placement heuristics, a better shader compiler, less CPU overhead, less CP stalling... many little things like those add up.
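The "many little things add up" point is easy to illustrate with a toy calculation. The categories below echo the list above, but every number is purely invented for illustration:

```python
# Hypothetical per-frame overheads in milliseconds (numbers invented for
# illustration only; no single one looks like "the" bottleneck).
overheads_ms = {
    "command-stream submission": 0.4,
    "buffer placement decisions": 0.3,
    "less efficient shader code": 0.6,
    "CP stalls": 0.5,
    "suboptimal memory clocks": 0.7,
}

def fps(frame_ms):
    """Convert a per-frame cost in milliseconds to frames per second."""
    return 1000.0 / frame_ms

base_ms = 10.0  # assumed baseline frame time for the workload
total_ms = base_ms + sum(overheads_ms.values())
print(f"{fps(base_ms):.1f} fps -> {fps(total_ms):.1f} fps")  # 100.0 fps -> 80.0 fps
```

Five overheads of well under a millisecond each, none worth calling a bottleneck on its own, still cost 20% of the frame rate in this model.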
-
This makes me wonder... It's just a wild guess, but are you making sure that the GPU is rendering only your texture, and nothing else? If the test runs like glxgears, the difference in performance could very well be due to the GPU having to render all the windows in the background, as well as the test object. And even if it runs fullscreen, is there any guarantee that the GPU is not rendering something in the background or offscreen before drawing the test texture on top?