RADV vs. NVIDIA Vulkan/OpenGL Performance For Serious Sam 2017


  • #51
    Originally posted by indepe View Post
    Recommended where?
    Anywhere, and it is borne out statistically by any general-purpose x86 Linux distro with a wide range of apps and users. Pick any of them and you will see at least two thirds or three quarters of the packages built with -O2.

    It is the standard one, and everything above or below it is considered a special case. -O0 is always king, but because he is slow he is considered useful only during development; -O1 is better for security, and only because you still want some optimization; -O2 is the standard; -O3 is "safe" overclocking; -Ofast, are you crazy?

    Not to mention that clever apps combine these options, compiling one piece of code with one set of optimizations and another piece with something else, and so on.
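    For illustration (not something from the original post), GCC lets you mix optimization levels within a single file via the optimize function attribute, which is one way an app can apply different optimizations to different pieces of code; a minimal sketch, assuming GCC and an otherwise default -O2 build:

    /* hot_path.c - sketch of mixing optimization levels in one file.
     * Build the file with the distro default, e.g.:  gcc -O2 -c hot_path.c
     * The attribute asks GCC to compile just this function as if -O3
     * (more aggressive inlining/vectorization) were in effect. */
    #include <stddef.h>

    __attribute__((optimize("O3")))
    void scale_buffer(float *dst, const float *src, size_t n, float k)
    {
        /* Tight numeric loop: the kind of code that may benefit from -O3. */
        for (size_t i = 0; i < n; i++)
            dst[i] = src[i] * k;
    }

    /* Everything else in the file keeps the safer default (-O2). */
    void copy_buffer(float *dst, const float *src, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            dst[i] = src[i];
    }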
    Last edited by dungeon; 25 March 2017, 03:11 AM.

    Comment


    • #52
      Originally posted by kwahoo View Post

      Where? Fusion is not standalone, and The First Encounter HD is Windows only. Putting a Linux logo on TFE HD would be false advertising.
      I just installed a copy of Manjaro, fired up Steam, and these new versions of Serious Samwise popped up as new options for me. I promptly installed one and it ran beautifully. Then I noticed the Vulkan option. A Radeon 6850 on the Catalyst/fglrx drivers (I wasn't paying attention to the boot option when I installed) yielded a sudden view of my desktop =D

      Comment


      • #53
        Michael

        Why is OpenGL only tested at 4K? Everybody else tries to eliminate GPU limits in these comparisons, but that way you hit them.

        Comment


        • #54


          So pleasant to see the R9 Fury living up to its pre-launch hype of being designed to beat the GTX 980, and even more impressive to see it doing so on an open driver which, not long ago, AMD claimed the community could never possibly write because it was just too difficult, lol. 21 months after launch, but we got there!

          Comment


          • #55
            Originally posted by Kano View Post
            Michael

            Why is OpenGL only tested at 4K? Everybody else tries to eliminate GPU limits in these comparisons, but that way you hit them.
            Because at high resolutions the frame rate drops and the CPU bottleneck becomes less of an issue, so threaded dispatch brings less of a benefit and you get better framerates with OpenGL... at least for now.
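            A rough way to see this (a toy model with invented millisecond figures, not measured data): the delivered frame time is roughly the larger of the CPU submission time and the GPU render time, so once 4K pushes the GPU time well past the CPU time, the API's CPU overhead stops mattering.

            /* bottleneck.c - toy model: frame time ~ max(cpu_time, gpu_time).
             * All millisecond figures below are invented for illustration. */
            #include <stdio.h>

            static double fps(double cpu_ms, double gpu_ms)
            {
                double frame_ms = cpu_ms > gpu_ms ? cpu_ms : gpu_ms;
                return 1000.0 / frame_ms;
            }

            int main(void)
            {
                /* Hypothetical: GL needs 8 ms of CPU work per frame, Vulkan 3 ms. */
                printf("1080p (GPU ~5 ms):  GL %.0f fps, VK %.0f fps\n",
                       fps(8.0, 5.0), fps(3.0, 5.0));   /* CPU-bound: Vulkan ahead */
                printf("4K    (GPU ~20 ms): GL %.0f fps, VK %.0f fps\n",
                       fps(8.0, 20.0), fps(3.0, 20.0)); /* GPU-bound: both ~50 fps */
                return 0;
            }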

            Comment


            • #56
              And where are the results showing this better Vulkan speed compared to OpenGL that you speak of???

              Comment


              • #57
                Originally posted by efikkan View Post
                What kind of overhead?
                (Mostly) CPU overhead.

                Originally posted by efikkan View Post
                "Draw calls" has nothing to do with which cluster will do the computation on the GPU.
                But it has something to do with how many clusters can be kept busy, and when they are not fully utilized one might think about how to utilize them more, perhaps by dividing the work into more but smaller parts, or by moving CPU calculations such as physics to the GPU to keep the remaining shaders busy.

                Originally posted by efikkan View Post
                Theoretical computational performance is just a number to measure the calcualtion throughput of a GPU when it's fully saturated with floating point calculations
                And if that's not always the case while a frame is being produced, then something isn't working optimally. That is exactly the point where things have to improve.

                Originally posted by bridgman View Post
                it's never been clear to me why anyone would expect graphics performance to be determined by shader core throughput but not by any of the other subsystems that contribute to graphics rendering.
                Originally posted by artyom.h31 View Post
                RX 480 has a slower ROP. That's why it shows less FPS than GTX 1060 in many games IMHO. [...] I've heard that Vega will obtain a revamped ROP module.
                That might be right, but in my opinion it only explains some specific scenarios. It would show up as lower FPS with very low details but very high resolutions (when the shaders produce many frames with many pixels/vertices but the ROP throughput can't keep up). So when a Titan X runs at 500 FPS and a GTX 1060 at 250 in an older OpenGL title, that is pretty surely the described scenario. Compared with past cards, a ROP count of 32 at ~1300 MHz looks reasonable to me for 6 TFLOPs, unless you want to spend resources on achieving high FPS in older games that no one needs. If the ROPs were the problem for the RX 480, the Fury with its 64 ROPs should be faster in the situations where we see limited performance. That's usually not the case in OpenGL benchmarks.
                Even if the ROPs had an impact, in a 1080p scenario with Ultra details in a AAA title at 60-100 FPS the RX 480 should still be 25% faster than a GTX 1060, because there the ROPs shouldn't really matter.

                As I wrote before, there are also other factors that basically come down to internal load management, but as long as the implementation isn't extremely out of balance I'm quite sure one could do more optimization to relieve them. Perhaps it will not be possible to keep a Fury X near its maximum performance on average even with nearly optimal drivers, but the RX 480 looks pretty reasonably designed to me.
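                To put a rough number on that (a back-of-the-envelope sketch using the 32 ROPs / ~1300 MHz figures above; the overdraw factor is an assumed illustrative value, and blending, MSAA and depth traffic are ignored):

                /* fillrate.c - back-of-the-envelope ROP fill-rate check.
                 * ROP count and clock are taken from the discussion above;
                 * the overdraw factor is an assumption for illustration. */
                #include <stdio.h>

                int main(void)
                {
                    double rops      = 32.0;
                    double clock_ghz = 1.3;               /* ~1300 MHz           */
                    double gpix_s    = rops * clock_ghz;  /* 1 pixel/ROP/clock   */

                    double pixels_4k = 3840.0 * 2160.0;   /* ~8.3 Mpix per frame */
                    double overdraw  = 4.0;               /* assumed average     */
                    double fps_limit = gpix_s * 1e9 / (pixels_4k * overdraw);

                    printf("Peak fill rate: %.1f Gpix/s\n", gpix_s);
                    printf("ROP-limited FPS at 4K, %.0fx overdraw: ~%.0f fps\n",
                           overdraw, fps_limit);
                    return 0;
                }

                The resulting ceiling sits far above typical AAA frame rates, which is why the ROPs only become the limit when chasing the very high FPS numbers mentioned above.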
                Last edited by oooverclocker; 25 March 2017, 06:24 AM.

                Comment


                • #58
                  Originally posted by dungeon View Post
                  GCC's -O3 is known to both improve and break performance, but some people even take it as a rule of thumb that only better performance comes with it. But no optimization is safe; -O2 consists of a collection of optimizations which are on average considered safe, but even that might break something.
                  The scope of a compiler is pretty narrow; as we all know it's unable to understand your code, so it all works by finding patterns and trying to replace them with better ones. If you end up with problems, it has one of two causes: stupidity in your code, or compiler confusion; usually it's the first. In the event you find a bug in the compiler, you should submit a bug report. I've yet to run across any such problem in my own code.

                  If you are unable to use -O3, you can still selectively enable some of its optimizations, e.g. -finline-functions, which will help reduce code cache misses.
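                  A minimal sketch of what that looks like in practice (the file and function names are invented): the extra flag lets GCC apply its more aggressive inlining heuristics on top of the normal -O2 pipeline.

                  /* inline_demo.c - build with:
                   *   gcc -O2 -finline-functions -c inline_demo.c
                   * With the extra flag GCC is more likely to inline helpers
                   * such as small_helper() into their callers even though
                   * they are not declared inline. */
                  static int small_helper(int x)
                  {
                      return x * x + 1;
                  }

                  int sum_transformed(const int *v, int n)
                  {
                      int acc = 0;
                      for (int i = 0; i < n; i++)
                          acc += small_helper(v[i]); /* call overhead goes away when inlined */
                      return acc;
                  }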

                  Originally posted by oooverclocker View Post
                  This is a very artificial benchmark, but the RX 480 still does pretty well here.

                  Originally posted by oooverclocker View Post
                  But it has something to do with how many clusters can be kept busy, and when they are not fully utilized one might think about how to utilize them more, perhaps by dividing the work into more but smaller parts, or by moving CPU calculations such as physics to the GPU to keep the remaining shaders busy.
                  As a matter of fact, fewer API calls with larger batches of data are better, since the GPU can more easily determine data dependencies.
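                  As an illustration of that point (a hedged OpenGL sketch, not code from this game; it assumes a GL 3.3+ context and a loader such as glad providing the prototypes): drawing many copies of a mesh with one instanced call instead of one draw call per copy cuts the per-call CPU cost dramatically.

                  /* Contrast: many small draw calls vs. one batched, instanced call.
                   * VAO/shader setup and per-instance attributes are omitted. */
                  #include <glad/glad.h>   /* assumed loader */

                  #define NUM_OBJECTS 10000

                  void draw_naive(GLsizei index_count)
                  {
                      /* 10,000 API calls: the driver pays validation and
                       * submission cost on every single one. */
                      for (int i = 0; i < NUM_OBJECTS; i++) {
                          /* per-object uniform updates would go here */
                          glDrawElements(GL_TRIANGLES, index_count, GL_UNSIGNED_INT, 0);
                      }
                  }

                  void draw_batched(GLsizei index_count)
                  {
                      /* One API call; per-object data lives in an instanced
                       * vertex buffer instead of per-draw uniforms. */
                      glDrawElementsInstanced(GL_TRIANGLES, index_count,
                                              GL_UNSIGNED_INT, 0, NUM_OBJECTS);
                  }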

                  You are mixing a lot of different things here. Moving physics calculations from CPU to GPU is of course good, but as you can clearly see, the same games fully stress a GTX 1070, GTX 1080 and GTX 1080 Ti, so there is no CPU bottleneck here.

                  Originally posted by oooverclocker View Post
                  And if that's not always the case while a frame is being produced, then something isn't working optimally. That is exactly the point where things have to improve.
                  That doesn't make any sense at all.
                  Back to your point: the ratio between gaming performance and theoretical computational performance tells us more about how well balanced the hardware is for gaming than about how well the driver is written, since such gaps are usually caused by hardware limitations. If you look at Fury X vs. GTX 980 Ti, for example, the GTX 980 Ti is clearly better at gaming, yet the Fury X has vastly more computational performance. And there are certain workloads that fit this configuration; rendering just isn't one of them.
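                  For reference, a small sketch of where those theoretical numbers come from (shader counts and clocks are the commonly cited specs; treat the exact boost clocks as approximate): FP32 throughput is shader count x 2 ops per clock (FMA) x clock.

                  /* tflops.c - theoretical FP32 throughput = shaders * 2 (FMA) * clock. */
                  #include <stdio.h>

                  static double tflops(double shaders, double clock_ghz)
                  {
                      return shaders * 2.0 * clock_ghz / 1000.0;
                  }

                  int main(void)
                  {
                      /* Commonly cited specs; clocks are approximate. */
                      printf("Fury X     (4096 @ ~1.05 GHz): %.1f TFLOPS\n",
                             tflops(4096, 1.05));
                      printf("GTX 980 Ti (2816 @ ~1.08 GHz): %.1f TFLOPS\n",
                             tflops(2816, 1.08));
                      return 0;
                  }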

                  Originally posted by oooverclocker View Post
                  If the ROPs were the problem for the RX 480, the Fury with its 64 ROPs should be faster in the situations where we see limited performance. That's usually not the case in OpenGL benchmarks.
                  Even if the ROPs had an impact, in a 1080p scenario with Ultra details in a AAA title at 60-100 FPS the RX 480 should still be 25% faster than a GTX 1060, because there the ROPs shouldn't really matter.
                  How can you even say something like that? Rasterization in Pascal and Maxwell works quite differently than in Polaris, and even the cache structure is different, so the load on ROPs, FLOPS, memory bandwidth, etc. is not directly comparable.

                  Comment


                  • #59
                    Originally posted by tomtomme View Post

                    If you click on that, you see the missing Linux icons.
                    It's not really a standalone game as such. All the current bundles include one of the "Windows only" variants, so it displays only that icon. It probably would have made more sense to just push these out as an update to the base games, but I suppose they had some good reason for doing it this way.

                    Comment


                    • #60
                      Originally posted by Michael View Post

                      AMDGPU-PRO doesn't work yet on Ubuntu 16.10+....
                      Man, I love Ubuntu, my first GNU/Linux system, used it for so long, but for your purpose (reviews, tests, etc.) I think Arch would do a much better job. I might be wrong, but as a regular user I'm not even thinking of coming back to Ubuntu. I still love it and recommend it though, and I even use it on a second machine.

                      Comment
