Marek Threads RadeonSI Gallium3D, Big Performance Gains For Many Games
-
Originally posted by schmidtbag:
I've had the impression that the CPU was a frequent bottleneck for the AMD drivers for a while now, so this should be a very welcome boost and help close the performance gap. Makes me wonder if this actually removes the CPU bottleneck or just reduces it.
The more you do better threading, the more cans of worms you open up, I guess.
Last edited by dungeon; 11 May 2017, 11:14 AM.
-
Originally posted by pal666:
incorrect. you just said that when your program is
dostuff1
dostuff2
it is more advantageous to speed up dostuff1. in reality you have to speed up both
As I said before, it's not about doing one thing after another but about layers. If you split your work between synchronous execution and asynchronous (threaded) execution, everything after the split will happen asynchronously. If you do the split inside the OpenGL layer, everything after it (the OpenGL state tracker and the gallium pipe driver) will happen in another thread. If you do the split in gallium, then only the pipe-driver stuff will be offloaded.
In other words, by doing it earlier you can offload some of the inherent OpenGL validation and state tracking, in addition to offloading the pipe-driver operation.
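The split described here can be sketched as a producer/consumer pair. This is a toy model with hypothetical names, not Mesa's actual code: the application thread records commands into a queue and returns immediately, while a driver thread replays them, so everything below the split point runs asynchronously.

```python
import queue
import threading

class ThreadedContext:
    """Toy sketch of a threaded GL-like context: the application thread
    records commands into a queue and a driver thread replays them."""

    def __init__(self):
        self.commands = queue.Queue()
        self.results = []
        self.worker = threading.Thread(target=self._drain, daemon=True)
        self.worker.start()

    def _drain(self):
        # Driver thread: the state-tracker/pipe-driver work below the
        # split would run here, off the application thread.
        while True:
            cmd = self.commands.get()
            if cmd is None:
                break
            cmd(self.results)

    def draw(self, n):
        # Application thread returns immediately; the work happens later.
        self.commands.put(lambda out: out.append(f"draw {n}"))

    def finish(self):
        # Synchronization point, analogous to glFinish().
        self.commands.put(None)
        self.worker.join()
        return self.results

ctx = ThreadedContext()
for i in range(3):
    ctx.draw(i)
print(ctx.finish())  # → ['draw 0', 'draw 1', 'draw 2']
```

A real implementation (such as Mesa's threaded gallium context) batches commands to avoid paying a lock per call, but the shape of the split is the same: the earlier in the stack the queue sits, the more work moves to the second thread.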
-
Originally posted by Zan Lynx:
My gaming laptop has an unplugged mode that limits itself to 30 fps. If I turn that off, it is quite likely to hit the system power limit. It already tries to burn itself up using about 60 W. That's in Windows.
It'd be nice if Linux had a way to set a system wide FPS limit for battery savings. All the way down to 20.
-
Originally posted by schmidtbag:
It should be possible, though maybe not totally convenient. In your xorg.conf, you can try setting your display refresh rate to 30 Hz and, under the "Device" section, use:
Option "TearFree" "on"
This should, in theory, force system-wide vsync at 30 FPS. The inconvenience is that you'd have to reboot or restart the X server every time you want to change frame rates.
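For reference, a sketch of what that could look like as a full config fragment. The identifiers are hypothetical and this is untested; "TearFree" is an option of the xf86-video-amdgpu/ati DDX drivers, and a 30 Hz mode must actually exist for (or be added to) your display:

```
# Hypothetical xorg.conf fragment -- an untested sketch
Section "Monitor"
    Identifier "InternalPanel"
    # A 30 Hz mode must be supported by the panel or defined manually,
    # e.g. generated with `cvt 1920 1080 30` and added via Modeline.
    Option     "PreferredMode" "1920x1080_30.00"
EndSection

Section "Device"
    Identifier "AMDGPU"
    Driver     "amdgpu"
    Option     "TearFree" "on"
EndSection
```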
-
Originally posted by dungeon:
Just a reduction of course, as other parts of the system also cause CPU bottlenecks... I am sure that Composite eats some resources even when not used; disabling it might provide about +15% or something here and there, and if heavily used it can cause severe slowdowns here and there...
The more you do better threading, the more cans of worms you open up, I guess.
When looking at many of the benchmarks Michael does, you'll often find some AMD GPUs weirdly under-performing, or sometimes capping out at a frame rate regardless of how much better one chip is than another. He tests with pretty high-end CPUs, so this often leads me to believe that one of the main performance flaws in the AMD drivers is CPU bottlenecking. I'm personally wondering whether this multi-threading would divide enough of the workload to eliminate the driver as a bottleneck in these particular tests. Obviously there's only so much that can be done to prove this (for example, some games could just be poorly designed), but what I'm wondering is whether this reduction will be enough that we get to see the true potential of the hardware and/or drivers in tests where a CPU core was maxed out.
-
Originally posted by schmidtbag:
I agree and understand your statements, though I don't think I made my point very clear:
When looking at many of the benchmarks Michael does, you'll often find some AMD GPUs weirdly under-performing, or sometimes capping out at a frame rate regardless of how much better one chip is than another. He tests with pretty high-end CPUs, so this often leads me to believe that one of the main performance flaws in the AMD drivers is CPU bottlenecking. I'm personally wondering whether this multi-threading would divide enough of the workload to eliminate the driver as a bottleneck in these particular tests. Obviously there's only so much that can be done to prove this (for example, some games could just be poorly designed), but what I'm wondering is whether this reduction will be enough that we get to see the true potential of the hardware and/or drivers in tests where a CPU core was maxed out.
-
Originally posted by phoronix:
and 27% better performance in OpenArena
And in case you're about to say "you can't see any more than 60fps", see:
Commonly used values for com_maxfps include 43, 76, 125, and even 333, because these represent sweet spots where the in-game physics (in the case of "frame-rate-dependent physics" servers) are most advantageous to the player: the player can make higher jumps and consequently can build up speed more quickly by strafing.
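Those sweet spots come from integer math: id Tech 3-era engines advance the physics in whole milliseconds, so only frame rates of the form 1000/msec exist internally. A quick sketch of that quantization (a simplification of the actual engine logic):

```python
# com_maxfps sweet spots: the physics step is a whole number of
# milliseconds, so the engine effectively rounds the frame rate to
# 1000 / int(1000 / fps). The classic values map to clean steps:
for fps in (43, 76, 125, 333):
    msec = 1000 // fps            # integer frame time the physics sees
    effective = 1000 / msec       # frame rate the physics actually runs at
    print(f"com_maxfps {fps} -> {msec} ms steps, {effective:.1f} fps physics")
```

The commonly repeated explanation is that at these step sizes the rounding error in the integer physics update works in the player's favour on jump height, which is where the higher jumps and faster strafe acceleration come from.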
-
Originally posted by Wielkie G:
You're wrong. As I said before, it's not about doing one thing after another but about layers.
i am right. one layer does its thing then other layer does its thing
Originally posted by Wielkie G:
If you split your work between synchronous execution and asynchronous (threaded) execution, everything after the split will happen asynchronously.
no. if you split then you have two parts and you can continue splitting them
Originally posted by Wielkie G:
If you do the split inside the OpenGL layer, everything after it (the OpenGL state tracker and the gallium pipe driver) will happen in another thread. If you do the split in gallium, then only the pipe-driver stuff will be offloaded.
if you split, everything before split will be in one thread and everything after split will be in other thread. after another split you will get three threads and so on. it is called pipelining and for example every cpu does it many (some dozens of) times
Originally posted by Wielkie G:
In other words, by doing it earlier you can offload some of the inherent OpenGL validation and state tracking, in addition to offloading the pipe-driver operation.
by doing it earlier you get less work for first thread and more work for second thread. while your real goal is to have many threads with little work in each
Last edited by pal666; 12 May 2017, 06:18 PM.
-
Originally posted by pal666:
i am right. one layer does its thing then other layer does its thing
Originally posted by pal666:
no. if you split then you have two parts and you can continue splitting them
Originally posted by pal666:
if you split, everything before split will be in one thread and everything after split will be in other thread. after another split you will get three threads and so on. it is called pipelining and for example every cpu does it many (some dozens of) times
Sure, if you have a lot of threads to spare, then under very specific circumstances you could gain a little performance from doing a second split. But I would assume that the inverse happens more frequently, especially if you don't have a many-core CPU with weak per-core performance and the application is not lightly threaded (in which case your pipeline steals CPUs that the application could put to better use).
Originally posted by pal666:
by doing it earlier you get less work for first thread and more work for second thread. while your real goal is to have many threads with little work in each
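The pipelining described above can be sketched with a queue between each pair of stages: every additional split adds one more thread, ordering is preserved, and throughput is limited by the slowest stage. A toy three-stage pipeline (the stage roles in the comments are hypothetical):

```python
import queue
import threading

def stage(func, inbox, outbox):
    # Each pipeline stage runs in its own thread, pulling from the
    # previous stage and pushing to the next; None shuts the stage down.
    def run():
        while True:
            item = inbox.get()
            if item is None:
                outbox.put(None)
                break
            outbox.put(func(item))
    t = threading.Thread(target=run)
    t.start()
    return t

# Three stages -> three threads, i.e. the result of splitting twice.
q0, q1, q2, q3 = (queue.Queue() for _ in range(4))
threads = [
    stage(lambda x: x + 1, q0, q1),   # e.g. API-level validation
    stage(lambda x: x * 2, q1, q2),   # e.g. state tracker
    stage(lambda x: x - 3, q2, q3),   # e.g. pipe driver
]

for item in (1, 2, 3):
    q0.put(item)
q0.put(None)

results = []
while (out := q3.get()) is not None:
    results.append(out)
for t in threads:
    t.join()
print(results)  # → [1, 3, 5]
```

Each item still passes through every stage in order, so latency per item doesn't drop; what the extra threads buy is overlap, with all three stages working on different items at once.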