NVIDIA/Radeon Windows 10 vs. Ubuntu Linux Relative Gaming Performance
Again, not terrible results. Talos and Civ seem to be the games suffering the most; others, such as Deus Ex, mostly come down to a poorly optimized game. I think Linux as a gaming platform is becoming a real option, and perhaps in another 12 months we will get there. It will help when VR becomes available (it's in beta now), however I'm still waiting for VR helmets to mature.
I think the graphs could have just been normalized with the REAL frame rates instead; changing it to a 1.0 normalization cuts out a bit of information, while using FPS as the normalization would also give us a perspective on what those platforms actually achieve.
Last edited by theriddick; 21 February 2017, 08:56 PM.
Originally posted by gamerk2
Windows handles this case *better* as of Vista; Windows at least checks the CPUID flags to determine if the processor uses HTT, and if so tries to put kernel threads on HTT cores to avoid bumping user-mode threads. Granted, this doesn't always work well, but it's handled a lot better than it was in the Pentium 4 days [where Windows WOULD act in exactly the way you describe].
This is partly why AMD's CMT stank out of the box; Windows didn't see a CPUID flag for HTT, so it treated all the cores in AMD's CPUs equally. As we now know, there's about a 20-25% performance loss as part of CMT, which ate into performance at launch. This was eventually mitigated via a kernel patch, which essentially treated Bulldozer-based CPUs and their descendants like hyperthreaded CPUs for the purposes of thread scheduling.
All I'm trying to say is that if you take a software load that issues 4 threads and run it on a 4C/8T processor, you will get better performance on Linux than you will on Windows even today. This is where it gets fucked up: on SMT architectures, threads 5-8 are not real processors, and Windows even today doesn't give a single little shit about that. It's provable right now by simply benchmarking 4 threads on a 4C/4T (SMT disabled) configuration and a 4C/8T (SMT enabled) configuration.
EDIT: What I'm saying is that benchmarking 4 threads on 4C/4T vs 4 threads on 4C/8T, aka SMT disabled vs SMT enabled, on Windows and then again on Linux will show exactly what I'm talking about.
EDIT: It may be possible to get additional performance out of an SMT pipeline in cases where there are available integer units, which can increase overall performance. However, that second thread then pulls down the performance of the first thread in cases where the total demand for integer units is larger than the number available. Today, most x86 processors can extract between 2 and 3 integer operations per cycle, but they only have 4 integer units per pipeline, which means that on -every- cycle 1-2 issue slots go unused. If you want the best possible per-thread performance, then you really do need to make sure the number of threads issued is -less- than the number of -actual- cores your processor has. If you are on Linux, that's it; but if you are on Windows, you additionally need to make sure SMT is disabled.
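A minimal sketch of the 4-worker benchmark described above, in Python. This is illustrative only: `spin`, `run_benchmark`, and the iteration counts are my own names and numbers, and which CPU numbers map to physical cores vs. SMT siblings depends on your topology (check `lscpu -e`). Run it once pinned to physical cores and once unpinned (or with SMT off in firmware), then compare elapsed times.

```python
import time
from multiprocessing import Pool

def spin(n: int) -> int:
    # CPU-bound integer work, so the workers compete for integer execution units.
    total = 0
    for i in range(n):
        total += i * i
    return total

def run_benchmark(workers: int = 4, iterations: int = 2_000_000) -> float:
    # Time `workers` processes each doing the same fixed amount of work.
    start = time.perf_counter()
    with Pool(processes=workers) as pool:
        pool.map(spin, [iterations] * workers)
    return time.perf_counter() - start

if __name__ == "__main__":
    # e.g.:  taskset -c 0-3 python bench.py   (pin to 4 physical cores)
    #        python bench.py                  (let the scheduler use all 8 logical CPUs)
    print(f"elapsed: {run_benchmark():.3f} s")
```

Processes are used instead of threads here only to sidestep the CPython GIL; the scheduling question being tested is the same.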
EDIT: Also, Windows doesn't exactly "schedule" threads per se. This is what I mean.
The system treats all threads with the same priority as equal. The system assigns time slices in a round-robin fashion to all threads with the highest priority. If none of these threads are ready to run, the system assigns time slices in a round-robin fashion to all threads with the next highest priority. If a higher-priority thread becomes available to run, the system ceases to execute the lower-priority thread (without allowing it to finish using its time slice), and assigns a full time slice to the higher-priority thread. For more information, see Context Switches.
Last edited by duby229; 22 February 2017, 12:58 AM.
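The quoted policy can be modeled in a few lines. This is purely a toy illustration: the thread names, priorities, and slice counts are made up, and real Windows additionally boosts and decays priorities dynamically, which this sketch ignores.

```python
from collections import deque

def schedule(threads: dict) -> list:
    """threads maps name -> (priority, time slices needed).
    Returns the order in which time slices are granted."""
    remaining = {name: slices for name, (prio, slices) in threads.items()}
    priority = {name: prio for name, (prio, slices) in threads.items()}
    order = []
    while any(remaining.values()):
        # Only the highest-priority runnable threads get the CPU...
        top = max(priority[n] for n in remaining if remaining[n] > 0)
        ready = deque(n for n in threads if priority[n] == top and remaining[n] > 0)
        # ...sharing it round-robin, one time slice at a time.
        while ready:
            name = ready.popleft()
            order.append(name)
            remaining[name] -= 1
            if remaining[name] > 0:
                ready.append(name)
    return order

# Two priority-2 threads round-robin; the priority-1 thread runs only afterwards.
assert schedule({"A": (2, 2), "B": (2, 2), "C": (1, 1)}) == ["A", "B", "A", "B", "C"]
```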
I think, currently, the best comparative benchmark done is this one: https://www.reddit.com/r/linux_gamin...s_and_windows/
It compares Doom on Windows and Linux using the Nvidia driver in both OpenGL and Vulkan modes (no DX at all). As Wine does a direct passthrough for OpenGL and Vulkan, the performance impact is pretty much nonexistent. And the result is:
- "native" Vulkan performance difference between Windows and Linux is pretty much nil.
- "optimized" OpenGL and "basic" Vulkan can perform similarly, provided the developer knows OpenGL inside and out (id does) and the driver author profiles their driver for the application like crazy...
- "native" OpenGL sucks: ... when the driver isn't profiled for the app (I don't think Nvidia included the Windows profile for Doom in their Linux driver), performance takes a 20-25% hit.
If you add a wrapper that simply translates DX11 calls to OpenGL ones without looking any further, you can add an extra 10% performance hit.
As such, Feral ports getting ratios close to 0.8-0.9 are pretty damn good; any wrapper that goes past a 0.7 ratio is not doing a half-assed job.
Note: I mention "basic" Vulkan because Nvidia hardware cannot make use of Vulkan-specific features like async compute. On AMD hardware with async compute enabled, the OpenGL results would have been skewed.
Last edited by mitch074; 22 February 2017, 04:39 AM.
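The ratios quoted above stack multiplicatively, which is easy to sanity-check. The percentages here are the post's rough estimates, not measurements:

```python
def combined_ratio(driver_hit: float, wrapper_hit: float) -> float:
    # Two independent performance hits compound multiplicatively.
    return (1.0 - driver_hit) * (1.0 - wrapper_hit)

# Unprofiled OpenGL driver (~20-25% hit) plus a naive DX11->OpenGL wrapper (~10% hit):
worst = combined_ratio(0.25, 0.10)  # 0.675
best = combined_ratio(0.20, 0.10)   # 0.72
```

So a straight DX11-to-OpenGL translation on an unprofiled driver lands right around the 0.7 mark, which is why a wrapper that stays above that ratio is doing respectably.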
Originally posted by phoronix
Phoronix: NVIDIA/Radeon Windows 10 vs. Ubuntu Linux Relative Gaming Performance
Last week I published some Windows 10 vs. Ubuntu Linux Radeon benchmarks and Windows vs. Linux NVIDIA Pascal tests. Those results were published separately, while for this article the AMD and NVIDIA numbers are merged together and normalized to get a look at the relative Windows vs. Linux gaming performance.
http://www.phoronix.com/vr.php?view=24166
And will you add Hitman benchmarks, too?
Great article! Will send you a tip
Originally posted by duby229
All I'm trying to say is that if you take a software load that issues 4 threads and run it on a 4C/8T processor, you will get better performance on Linux than you will on Windows even today. This is where it gets fucked up: on SMT architectures, threads 5-8 are not real processors, and Windows even today doesn't give a single little shit about that. It's provable right now by simply benchmarking 4 threads on a 4C/4T (SMT disabled) configuration and a 4C/8T (SMT enabled) configuration.
@pal666:
Apologists like you are why Linux continues to lag behind in areas like this: rather than admit there's a problem that needs to be addressed, you push the blame onto everyone else, be it the people who wrote the original application, the people who ported it to Linux, the driver software, the physical GPUs, or even the user. The fact is: Linux has a problem. Stop trying to ignore it, and fix it already.
My argument is provable: Just compare application thread runtimes between Windows and Linux.
Originally posted by gamerk2
I disagree. On Windows, there is the possibility that a kernel thread gets a physical core and forces another thread onto a logical core, but this should only occur for a very limited amount of time. Negative SMT effects are downright rare these days.
The key you missed: "The system treats all threads with the same priority as equal." With 32 levels of priority, plus Windows constantly modifying thread priorities behind the scenes, you shouldn't have too many instances where two threads that are both ready to run collide in this fashion, especially within a single application. When they do, Windows defaults to round-robin (there's no other real way to tie-break) until the priorities change again and Windows goes back to "the highest-priority threads run".
@pal666:
Apologists like you are why Linux continues to lag behind in areas like this: rather than admit there's a problem that needs to be addressed, you push the blame onto everyone else, be it the people who wrote the original application, the people who ported it to Linux, the driver software, the physical GPUs, or even the user. The fact is: Linux has a problem. Stop trying to ignore it, and fix it already.
My argument is provable: Just compare application thread runtimes between Windows and Linux.
Last edited by duby229; 23 February 2017, 10:26 AM.
Originally posted by gamerk2
Apologists like you are why Linux continues to lag behind in areas like this:
Originally posted by gamerk2
Rather than admit there's a problem that needs to be addressed, you push the blame onto everyone else,
Originally posted by gamerk2
The fact is: Linux has a problem.
Originally posted by gamerk2
My argument is provable: Just compare application thread runtimes between Windows and Linux.
And no, you can't draw conclusions based on thread runtimes, because a thread can be not ready to run. To benchmark the scheduler you need to measure time spent in the runnable state without actually running, and time spent on migration. Did you do that?
Last edited by pal666; 24 February 2017, 11:05 AM.
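On Linux, the first two quantities pal666 asks about are exposed per task in `/proc/<pid>/schedstat`: nanoseconds spent running on a CPU, nanoseconds spent runnable but waiting on a runqueue, and the number of timeslices run. (This assumes a kernel with scheduler statistics compiled in, which mainstream distros enable; the function name below is my own.) A small sketch of reading it:

```python
import os

def schedstat(pid: int = 0) -> dict:
    """Read run time, runqueue wait time, and timeslice count for a task
    from /proc/<pid>/schedstat (Linux-only)."""
    pid = pid or os.getpid()
    with open(f"/proc/{pid}/schedstat") as f:
        run_ns, wait_ns, slices = (int(x) for x in f.read().split())
    return {"run_ns": run_ns, "runqueue_wait_ns": wait_ns, "timeslices": slices}
```

Sampling `runqueue_wait_ns` before and after a workload on each OS (Windows exposes comparable counters through its performance-monitoring APIs) would measure scheduler behavior directly, rather than inferring it from wall-clock runtimes.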
Originally posted by pal666
And no, you can't draw conclusions based on thread runtimes, because a thread can be not ready to run. To benchmark the scheduler you need to measure time spent in the runnable state without actually running, and time spent on migration. Did you do that?
EDIT:
Based on the info I've seen, I think that a [ DX11->OpenGL->GPU-commands ] process is bound to produce less efficient GPU-commands than a [ DX11->GPU-commands ] process or a [ native-OpenGL->GPU-commands ] process.
A [ DX11->Vulkan->GPU-commands ] process should do much better, if the DX11->Vulkan conversion is done with enough effort to always call the most effective Vulkan functions, and the Vulkan API is actually close enough to the GPU hardware to allow generating the most effective GPU-commands.
Last edited by indepe; 25 February 2017, 03:55 AM.