NVIDIA/Radeon Windows 10 vs. Ubuntu Linux Relative Gaming Performance


  • #51
    Originally posted by philips View Post

    There are no differences between Windows and SteamOS, but rather between benchmark runs. Benchmarks are rarely identical in every detail; most of the time they are more like scenarios where a certain number of actions take place at random.
    I just replied to darkbasic, who claimed that "The video is just to show if it renders the same or not...", and I know that isn't the case. There are benchmark details in the video.



    • #52
      Again, not terrible results. Talos and Civ seem to be the games suffering the most; others, such as Deus Ex, mostly come down to a poorly optimized game. I think Linux as a gaming platform is becoming a very real possibility, and perhaps in another 12 months we will get there. It will help when VR becomes available (it's in beta now), though I'm still waiting for VR headsets to mature.

      I think the graphs could have just used the real frame rates instead; normalizing everything to 1.0 throws away information, whereas showing FPS would also give a sense of what the two platforms actually achieve (a 0.9 ratio could mean 27 vs. 30 FPS just as easily as 270 vs. 300 FPS).
      Last edited by theriddick; 21 February 2017, 08:56 PM.



      • #53
        Originally posted by gamerk2 View Post

        Windows handles this case *better* as of Vista; Windows at least checks the CPUID flags to determine if the processor uses HTT, and if so, tries to put kernel threads on HTT cores to avoid bumping user-mode threads. Granted, this doesn't always work well, but it's handled a lot better than it was in the Pentium 4 days [where Windows WOULD act in exactly the way you describe].

        This is partly why AMD's CMT stank out of the box; Windows didn't see a CPUID flag for HTT, so it treated all the cores in AMD's CPUs equally. As we now know, there's about a 20-25% performance loss as part of CMT, which ate into performance at launch. This was eventually mitigated via a kernel patch, which essentially treated Bulldozer-based CPUs and their descendants like hyperthreaded CPUs for the purposes of thread scheduling.

        All I'm trying to say is that if you take a software load that issues 4 threads and run it on a 4C/8T processor, you will get better performance on Linux than you will on Windows even today. This is where it gets fucked up: on SMT architectures, threads 5-8 are not real processors, and Windows even today doesn't give a single little shit about that. It's provable right now by simply benchmarking 4 threads on a 4C/4T SMT-disabled configuration and on a 4C/8T SMT-enabled configuration.

        EDIT: What I'm saying is that benchmarking 4 threads on 4C/4T vs 4 threads on 4C/8T, aka SMT disabled vs SMT enabled, on Windows and then again on Linux will show exactly what I'm talking about.

        EDIT: It may be possible to get additional performance out of an SMT pipeline in cases where there are available integer units, which can increase overall performance. However, that second thread then pulls down the performance of the first thread in cases where the total demand for integer units is larger than the number available. These days most x86 processors can extract 2-3 integer operations per cycle but only have 4 integer units per pipeline, which means that on -every- cycle there are 1-2 instruction slots going to waste. If you want the best possible per-thread performance, you really do need to make sure the number of threads issued is -less- than the number of -actual- cores your processor has. If you are on Linux, that's it; if you are on Windows, you additionally need to make sure SMT is disabled.
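        Something like this rough sketch is the kind of test I mean (assuming Linux with gcc and pthreads; it also assumes logical CPUs 0-3 map to the four physical cores, which varies by machine, so check lscpu -e before trusting the pinned run):

        /* Rough sketch: time 4 integer-heavy threads, optionally pinned to what
           are assumed to be the physical cores (logical CPUs 0-3).
           Build: gcc -O2 -pthread smt_test.c -o smt_test */
        #define _GNU_SOURCE
        #include <pthread.h>
        #include <sched.h>
        #include <stdio.h>
        #include <time.h>

        #define NTHREADS 4
        #define ITERS    400000000UL

        static void *worker(void *arg)
        {
            unsigned long x = (unsigned long)(size_t)arg + 1;
            for (unsigned long i = 0; i < ITERS; i++)    /* integer-only busy work */
                x = x * 2654435761UL + i;
            return (void *)(size_t)x;                    /* keep the result "used" */
        }

        int main(int argc, char **argv)
        {
            int pin = (argc > 1 && argv[1][0] == 'p');   /* "./smt_test p" pins threads */
            pthread_t t[NTHREADS];
            struct timespec a, b;

            clock_gettime(CLOCK_MONOTONIC, &a);
            for (size_t i = 0; i < NTHREADS; i++) {
                pthread_create(&t[i], NULL, worker, (void *)i);
                if (pin) {                               /* pin thread i to logical CPU i */
                    cpu_set_t set;
                    CPU_ZERO(&set);
                    CPU_SET((int)i, &set);
                    pthread_setaffinity_np(t[i], sizeof(set), &set);
                }
            }
            for (size_t i = 0; i < NTHREADS; i++)
                pthread_join(t[i], NULL);
            clock_gettime(CLOCK_MONOTONIC, &b);

            printf("%s: %.2f s\n", pin ? "pinned to CPUs 0-3" : "unpinned",
                   (b.tv_sec - a.tv_sec) + (b.tv_nsec - a.tv_nsec) / 1e9);
            return 0;
        }

        Run it once plain and once with the argument p, on the SMT-enabled box and then again with SMT turned off in the BIOS, and compare the times.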

        EDIT: Also, Windows doesn't exactly "schedule" threads per se. This is what I mean.


        The system treats all threads with the same priority as equal. The system assigns time slices in a round-robin fashion to all threads with the highest priority. If none of these threads are ready to run, the system assigns time slices in a round-robin fashion to all threads with the next highest priority. If a higher-priority thread becomes available to run, the system ceases to execute the lower-priority thread (without allowing it to finish using its time slice), and assigns a full time slice to the higher-priority thread. For more information, see Context Switches.
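        For what it's worth, all an application controls is where its threads sit in that priority scheme; a minimal Win32 sketch (the priority level here is just an example value) looks like this:

        /* Minimal Win32 sketch: the app only assigns priorities; the round-robin
           within each priority level is entirely up to the kernel, as quoted above.
           Build (MinGW): gcc thread_prio.c -o thread_prio.exe */
        #include <windows.h>
        #include <stdio.h>

        static DWORD WINAPI worker(LPVOID arg)
        {
            volatile unsigned long x = 0;
            (void)arg;
            for (unsigned long i = 0; i < 100000000UL; i++)
                x += i;                                  /* busy work */
            return 0;
        }

        int main(void)
        {
            HANDLE t = CreateThread(NULL, 0, worker, NULL, 0, NULL);
            if (t == NULL)
                return 1;
            /* Bump the worker one level; threads of equal priority still share
               time slices round-robin, exactly as described in the quote. */
            SetThreadPriority(t, THREAD_PRIORITY_ABOVE_NORMAL);
            printf("worker priority: %d\n", GetThreadPriority(t));
            WaitForSingleObject(t, INFINITE);
            CloseHandle(t);
            return 0;
        }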
        Last edited by duby229; 22 February 2017, 12:58 AM.



        • #54
          I think the best comparative benchmark done so far is this one: https://www.reddit.com/r/linux_gamin...s_and_windows/
          It compares Doom on Windows and Linux using the Nvidia driver in both OpenGL and Vulkan modes (no DX at all). As Wine does a direct passthrough for OpenGL and Vulkan, the performance impact is pretty much nonexistent. And the result is:
          • "native" Vulkan performance difference between Windows and Linux is pretty much nil.
          • "optimized" OpenGL and "basic" Vulkan can perform similarly, provided the developer knows OpenGL inside and out (id does) and the driver author profiles his driver for the application like crazy...
          • "native" OpenGL sucks: ... when the driver isn't profiled for the app (I don't think Nvidia included the Windows profile for Doom in their Linux driver), performance gets a 20-25% hit.


          If you add a wrapper that simply translates DX11 calls to OpenGL ones without looking any further, you can add an extra 10% performance hit.

          As such, Feral ports getting ratios close to 0.8-0.9 are pretty damn good; any wrapper that goes past a 0.7 ratio is not doing a half-assed job.

          Note: I mention "basic" Vulkan because Nvidia hardware cannot make use of Vulkan-specific features like async compute. On AMD hardware with async compute enabled, the OpenGL results would have been skewed.
          Last edited by mitch074; 22 February 2017, 04:39 AM.



          • #55
            Originally posted by phoronix View Post
            Phoronix: NVIDIA/Radeon Windows 10 vs. Ubuntu Linux Relative Gaming Performance

            Last week I published some Windows 10 vs. Ubuntu Linux Radeon benchmarks and Windows vs. Linux NVIDIA Pascal tests. Those results were published by themselves, while for this article the AMD and NVIDIA numbers are merged together and normalized to get a look at the relative Windows vs. Linux gaming performance.

            http://www.phoronix.com/vr.php?view=24166
            Michael, could you add a toggle, an export, or extra graphs that also show the absolute frame rates instead of just relative performance?
            And will you add Hitman benchmarks, too?

            Great article! Will send you a tip



            • #56
              Originally posted by duby229 View Post

              All I'm trying to say is that if you take a software load that issues 4 threads and run it on a 4C/8T processor, you will get better performance on Linux than you will on Windows even today. This is where it gets fucked up: on SMT architectures, threads 5-8 are not real processors, and Windows even today doesn't give a single little shit about that. It's provable right now by simply benchmarking 4 threads on a 4C/4T SMT-disabled configuration and on a 4C/8T SMT-enabled configuration.
              I disagree. On Windows, there is the possibility that a kernel thread gets a physical core and forces another thread onto a logical core, but this should only occur for a very limited amount of time. Negative SMT effects are downright rare these days.


              EDIT: Also, Windows doesn't exactly "schedule" threads per se. This is what I mean.

              The key you missed: "The system treats all threads with the same priority as equal." With 32 levels of priority, plus Windows constantly modifying thread priorities behind the scenes, you shouldn't have too many instances where two threads that are both ready to run collide in this fashion, especially within a single application. When it does happen, Windows defaults to round-robin (there's no other real way to tie-break) until the priorities change again and Windows goes back to "the highest-priority threads run".


              @pal666:

              Apologists like you are why Linux continues to lag behind in areas like this: rather than admit there's a problem that needs to be addressed, you push the blame onto everyone else, be it the people who wrote the original application, the people who ported it to Linux, the driver software, the physical GPUs, or even the user. The fact is: Linux has a problem. Stop trying to ignore it, and fix it already.

              My argument is provable: Just compare application thread runtimes between Windows and Linux.



              • #57
                Originally posted by gamerk2 View Post

                I disagree. On Windows, there is the possibility that a kernel thread gets a physical core and forces another thread onto a logical core, but this should only occur for a very limited amount of time. Negative SMT effects are downright rare these days.
                It's not rare; it happens literally every single time every single thread runs. If you run 2 threads there is a 25% chance one thread will be on a logical core, at 4 threads there's a 50% chance at least 1 or even 2 of those threads will be on a logical core, and at 8 threads it's a 100% chance that 4 of those threads will be on a logical core. And any time -any- thread runs on a logical core, that thread is running on a pipeline that could potentially be maxed out, which results in 1 or 2 instructions per cycle lost on both threads. Every time.
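                If anyone wants to play with those odds, here's a toy Monte Carlo under the crude assumption that the scheduler drops runnable threads onto logical CPUs uniformly at random (4 cores, 2 hardware threads each); real schedulers are not random, so treat it as a sanity check of the model rather than a measurement:

                /* Toy model: how often do at least two of N runnable threads end up on
                   SMT siblings of the same physical core, if placement were uniformly
                   random across 4 cores x 2 hardware threads? Illustrative only. */
                #include <stdio.h>
                #include <stdlib.h>

                #define CORES  4
                #define CPUS   (CORES * 2)      /* logical CPU i belongs to core i/2 */
                #define TRIALS 1000000L

                static int trial(int nthreads)
                {
                    int cpu[CPUS], percore[CORES] = {0};
                    for (int i = 0; i < CPUS; i++)
                        cpu[i] = i;
                    for (int i = 0; i < nthreads; i++) {          /* pick distinct CPUs */
                        int j = i + rand() % (CPUS - i);
                        int tmp = cpu[i]; cpu[i] = cpu[j]; cpu[j] = tmp;
                        if (++percore[cpu[i] / 2] == 2)           /* both siblings busy */
                            return 1;
                    }
                    return 0;
                }

                int main(void)
                {
                    srand(12345);                                 /* fixed seed, it's a toy */
                    for (int n = 2; n <= 8; n += 2) {
                        long hits = 0;
                        for (long t = 0; t < TRIALS; t++)
                            hits += trial(n);
                        printf("%d threads: %.1f%% of trials put two on one core\n",
                               n, 100.0 * hits / TRIALS);
                    }
                    return 0;
                }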

                The key you missed: "The system treats all threads with the same priority as equal." With 32 levels of priority, plus Windows constantly modifying thread priorities behind the scenes, you shouldn't have too many instances where two threads that are both ready to run collide in this fashion, especially within a single application. When it does happen, Windows defaults to round-robin (there's no other real way to tie-break) until the priorities change again and Windows goes back to "the highest-priority threads run".
                Which I'm saying isn't exactly scheduling. Let's face the facts here and admit that, while Linux may have a bit more overhead, it actually does this the correct way.

                @pal666:

                Apologists like you are why Linux continues to lag behind in areas like this: rather than admit there's a problem that needs to be addressed, you push the blame onto everyone else, be it the people who wrote the original application, the people who ported it to Linux, the driver software, the physical GPUs, or even the user. The fact is: Linux has a problem. Stop trying to ignore it, and fix it already.

                My argument is provable: Just compare application thread runtimes between Windows and Linux.
                On single-threaded loads it'll be identical, and on multithreaded loads Linux scales better depending on the number of physical processors, where adding more processors increases its lead in scalability. Its scalability lead is even more pronounced on SMT architectures due to the aforementioned failure of MS to do things the right way.
                Last edited by duby229; 23 February 2017, 10:26 AM.



                • #58
                  Originally posted by duby229 View Post
                  EDIT: It may be possible to get additional performance out of an SMT pipeline in cases where there are available integer units
                  Unit sharing has nothing to do with SMT; unit sharing is a Bulldozer thing, while SMT is about masking cache misses.



                  • #59
                    Originally posted by gamerk2 View Post
                    Apologists like you are why Linux continues to lag behind in areas like this:
                    I am putting the blame on idiots like you.
                    Originally posted by gamerk2 View Post
                    Rather than admit there's a problem that needs to be addressed, you push the blame onto everyone else,
                    Yes, that is what you are doing, while I told you exactly where the problem is: not enough manpower to redesign the DirectX ports.
                    Originally posted by gamerk2 View Post
                    The fact is: Linux has a problem.
                    The fact is: you are an idiot. Linux is the most used operating system (Android is Linux).
                    Originally posted by gamerk2 View Post
                    My argument is provable: Just compare application thread runtimes between Windows and Linux.
                    Your argument is broken: you are comparing different applications, and you can't derive anything useful from such a comparison. To compare the same application you need a Windows OpenGL app, or at least an app with a full-fledged OpenGL port, and then you will be in for a surprise: http://blogs.valvesoftware.com/linux/faster-zombies/

                    And no, you can't draw conclusions based on thread runtimes, because a thread can be not ready to run. To benchmark the scheduler you need to measure the time spent in the runnable state without actually running, plus the time spent on migrations. Did you do that?
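                    On Linux the first of those numbers is easy to get at; a rough sketch like the one below reads /proc/self/schedstat (on-CPU nanoseconds, run-queue wait nanoseconds, timeslice count; it needs a kernel with schedstats enabled) around a workload. Migrations would need something like perf sched on top of this:

                    /* Sketch: how long did this thread spend runnable but waiting?
                       Assumes a kernel with schedstats, which exposes /proc/self/schedstat
                       as "<ns on cpu> <ns waiting on runqueue> <timeslices>". */
                    #include <stdio.h>

                    static int read_schedstat(unsigned long long *on_cpu, unsigned long long *wait)
                    {
                        unsigned long long slices;
                        FILE *f = fopen("/proc/self/schedstat", "r");
                        if (!f)
                            return -1;
                        int ok = fscanf(f, "%llu %llu %llu", on_cpu, wait, &slices) == 3;
                        fclose(f);
                        return ok ? 0 : -1;
                    }

                    int main(void)
                    {
                        unsigned long long cpu0, wait0, cpu1, wait1;
                        if (read_schedstat(&cpu0, &wait0)) { perror("schedstat"); return 1; }

                        volatile unsigned long x = 0;             /* workload under test */
                        for (unsigned long i = 0; i < 500000000UL; i++)
                            x += i;

                        if (read_schedstat(&cpu1, &wait1)) { perror("schedstat"); return 1; }
                        printf("on-CPU: %.3f s, runnable-but-waiting: %.3f s\n",
                               (cpu1 - cpu0) / 1e9, (wait1 - wait0) / 1e9);
                        return 0;
                    }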
                    Last edited by pal666; 24 February 2017, 11:05 AM.



                    • #60
                      Originally posted by pal666 View Post
                      And no, you can't draw conclusions based on thread runtimes, because a thread can be not ready to run. To benchmark the scheduler you need to measure the time spent in the runnable state without actually running, plus the time spent on migrations. Did you do that?
                      Agreed, and I don't think gamerk2 did. However, as mentioned in another thread, I did the first measurement (albeit in a context that might not be typical), and there wasn't anything even close to explaining the differences between the Linux ports of Windows DX11/DX12 games and the originals. Maybe Feral itself would be able to run the OpenGL conversion on Windows instead of Linux; I'd expect they would see, more or less, the same loss of performance on Windows.

                      EDIT:
                      Based on the info I've seen, I think that a [ DX11->OpenGL->GPU-commands ] process is bound to produce less efficient GPU-commands than a [ DX11->GPU-commands ] process or a [ native-OpenGL->GPU-commands ] process.

                      A [ DX11->Vulkan->GPU-commands ] process should do much better, if the DX11->Vulkan conversion is done with enough effort to always call the most effective Vulkan functions, and the Vulkan API is actually close enough to the GPU hardware to allow generating the most effective GPU-commands.
                      Last edited by indepe; 25 February 2017, 03:55 AM.

