Marek Threads RadeonSI Gallium3D, Big Performance Gains For Many Games

  • #31
    Originally posted by Tomin View Post
    Couldn't he change the refresh rate using xrandr without restarting X? I think it should allow that, but I'm not sure, since I've never used it for that purpose.
    I suppose after adding the 30Hz refresh rate to the xorg.conf, you could use xrandr or any GUI for it. I haven't tried it myself.

    • #32
      Originally posted by schmidtbag View Post
      I've had the impression that CPU was a frequent bottleneck to the AMD drivers for a while now, so this should be a very welcome boost and help close the performance gap. Makes me wonder if this actually removes the CPU bottleneck or just reduces it.
      Just a reduction, of course, as other parts of the system also create CPU bottlenecks... I'm sure that Composite eats some resources even when not used; disabling it might provide about +15% here and there, and when heavily used it can cause severe slowdowns...

      The better you do threading, the more cans of worms you open up, I guess.
      Last edited by dungeon; 11 May 2017, 11:14 AM.

      • #33
        Originally posted by pal666 View Post
        incorrect. you just said that when your program is
        dostuff1
        dostuff2
        it is more advantageous to speed up dostuff1. in reality you have to speed up both
        You're wrong.

        As I said before, it's not about doing one thing after another but about layers. If you split your work between synchronous execution and asynchronous (threaded) execution, everything after the split will happen asynchronously. If you do the split inside the OpenGL layer, everything after it (the OpenGL state tracker and the gallium pipe-driver) will happen in another thread. If you do the split in gallium, then only the pipe-driver work will be offloaded.

        In other words, by doing the split earlier you can offload some of the inherent OpenGL validation and state tracking in addition to the pipe-driver work.

        • #34
          Originally posted by Zan Lynx View Post
          My gaming laptop has an unplugged mode that limits itself to 30 fps. If I turn that off it is quite likely to hit the system power limit. It already tries to burn itself up using about 60 W. That's in Windows.

          It'd be nice if Linux had a way to set a system wide FPS limit for battery savings. All the way down to 20.
          I think you could set the monitor refresh rate that low manually using xorg hacks and then use a compositing WM.

          • #35
            Originally posted by schmidtbag View Post
            It should be possible, though maybe not totally convenient. In your xorg.conf, you can try setting your display refresh rate to 30Hz and under the "Device" section, use:
            Option "TearFree" "on"
            This should, in theory, force system-wide vsync at 30FPS. The inconvenience is that you'd have to reboot or restart the X server every time you want to switch frame rates.
            You can set the refresh rate per display on the fly using xrandr. Same with tear free (with a new enough amdgpu ddx).

            • #36
              Originally posted by dungeon View Post
              Just a reduction, of course, as other parts of the system also create CPU bottlenecks... I'm sure that Composite eats some resources even when not used; disabling it might provide about +15% here and there, and when heavily used it can cause severe slowdowns...

              The better you do threading, the more cans of worms you open up, I guess.
              I agree and understand your statements, though I don't think I made my point very clear:
              When looking at many of the benchmarks Michael does, you'll often find some AMD GPUs weirdly under-performing, or capping out at a frame rate regardless of how much better one chip is than another. He tests with pretty high-end CPUs, so this leads me to believe that one of the main performance flaws in the AMD drivers is CPU bottlenecking. I'm personally wondering whether this multi-threading divides enough of the workload to eliminate the driver as a bottleneck in these particular tests. Obviously there's only so much that can be done to prove this (some games could just be poorly designed), but what I'm wondering is whether the reduction will be enough that we get to see the true potential of the hardware and/or drivers in tests where a CPU core was maxed out.

              • #37
                Originally posted by schmidtbag View Post
                I agree and understand your statements, though I don't think I made my point very clear:
                When looking at many of the benchmarks Michael does, you'll often find some AMD GPUs weirdly under-performing, or capping out at a frame rate regardless of how much better one chip is than another. He tests with pretty high-end CPUs, so this leads me to believe that one of the main performance flaws in the AMD drivers is CPU bottlenecking. I'm personally wondering whether this multi-threading divides enough of the workload to eliminate the driver as a bottleneck in these particular tests. Obviously there's only so much that can be done to prove this (some games could just be poorly designed), but what I'm wondering is whether the reduction will be enough that we get to see the true potential of the hardware and/or drivers in tests where a CPU core was maxed out.
                There are 3 factors: the CPU, the GPU, and the application. We can talk about whether performance has improved in terms of that triple. I don't think it can be answered generally.

                • #38
                  Originally posted by phoronix View Post
                  and 27% better performance in OpenArena
                  Sweet! Sweet! Sweet! So if I got 300fps, I could now get 381fps in OpenArena? So awesome! Currently I play at 125fps with nouveau, but I bet a nice Vega could get me to 333fps.

                  And in case you're about to say "you can't see any more than 60fps", see:
                  Commonly used values for com_maxfps include 43, 76, 125 and even 333 because these represent sweet spots where the in-game physics (on "frame rate dependent physics" servers) are most advantageous to the player: the player can make higher jumps and consequently can build up speed more quickly by strafing.

                  • #39
                    Originally posted by Wielkie G View Post

                    You're wrong.

                    As I said before, it's not about doing one thing after another but about layers.
                    i am right. one layer does its thing, then the other layer does its thing
                    Originally posted by Wielkie G View Post
                    If you split your work between synchronous execution and asynchronous (threaded) execution, everything after the split will happen asynchronously.
                    no. if you split then you have two parts and you can continue splitting them
                    Originally posted by Wielkie G View Post
                    If you do the split inside the OpenGL layer, everything after it (the OpenGL state tracker and the gallium pipe-driver) will happen in another thread. If you do the split in gallium, then only the pipe-driver work will be offloaded.
                    if you split, everything before the split will be in one thread and everything after the split will be in another thread. after another split you will get three threads, and so on. it is called pipelining, and for example every cpu does it many (some dozens of) times
                    Originally posted by Wielkie G View Post
                    In other words, by doing the split earlier you can offload some of the inherent OpenGL validation and state tracking in addition to the pipe-driver work.
                    by doing it earlier you get less work for the first thread and more work for the second thread. while your real goal is to have many threads with little work in each
                    Last edited by pal666; 12 May 2017, 06:18 PM.

                    • #40
                      Originally posted by pal666 View Post
                      i am right. one layer does its thing then other layer does its thing
                      The layers do not exist in a vacuum. Each one depends on another. You cannot really think of them as separate beings.

                      Originally posted by pal666 View Post
                      no. if you split then you have two parts and you can continue splitting them
                      Sure you can, though I don't see much of a value in that.

                      Originally posted by pal666 View Post
                      if you split, everything before split will be in one thread and everything after split will be in other thread. after another split you will get three threads and so on. it is called pipelining and for example every cpu does it many(some dozens) times
                      The synchronization costs between these separate threads will murder you. That is one of the reasons software pipelining implemented with separate threads is not the best idea: you just burn one thread after another. If you really want to make use of more threads, you would need to feed a thread pool, though that requires the workload to be parallelizable. The threaded gallium/OpenGL work exists to hide the synchronous API overhead (synchronous validation, state tracking, etc.). Due to the synchronous nature of the API, you cannot really do much more (and that is one of the reasons Vulkan applications benefit more from additional CPU cores).

                      Sure, if you have a lot of threads to spare, then under very specific circumstances you could gain a little bit of performance from a second split. But I would assume the inverse happens more frequently, especially if you don't have a many-core CPU with weak per-core performance and the application itself is heavily threaded (in which case your pipeline steals CPU cores that the application could use more wisely).

                      Originally posted by pal666 View Post
                      by doing it earlier you get less work for first thread and more work for second thread. while your real goal is to have many threads with little work in each
                      Doing less work in the first thread (i.e. the application's OpenGL thread) is mainly what threaded gallium/OpenGL is about. In many applications this thread is already overloaded by the work needed to issue the OpenGL API calls. Making those calls as cheap as possible is what provides the gains we see.
