NVIDIA vs. AMD Linux Gaming Performance For End Of May 2021 Drivers


  • cynical
    replied
    Originally posted by smitty3268
    I'm fairly certain the 5900X has cores split 6 and 6 between the CCX's, so that both are equal to each other.
    Oh yes you are correct about this.

    Originally posted by Linuxxx
    I only brought up the RPCS3 example because you claimed that inter-CCX latency can't be that bad, while you now stand corrected by acknowledging yourself that the 5900X can have a worsening effect on gaming results.
    Well it’s not even the latency that is really the problem in gaming, although it might add something. It’s that the 5800X can boost higher than the 5900X on all cores because fewer cores means less heat when you do it, so you can sustain higher boost clocks for longer.

    I’m not sure why the 6800 result is off, but could it be a regression in the driver code?


  • smitty3268
    replied
    Originally posted by Linuxxx

    Still doesn't answer the question why the 6800 performed the way it did.

    If the "amdgpu" kernel module keeps bouncing around between the two CCXs of the 5900X because of the unnecessary interrupt distribution introduced by the "irqbalance" daemon, do you really think that this won't have any impact on the frame-time & FPS numbers?
    It wouldn't surprise me if there is a bug with the 6800 driver code. It certainly wouldn't be the first time. That said, sure, I don't know why it's slow. Maybe it's the irqbalance thing you mentioned.

    I was only responding that in general, Zen 2 had known performance issues with games when going past 8 cores, and that has largely been mitigated with Zen 3 designs.


  • Linuxxx
    replied
    Originally posted by smitty3268

    I'm fairly certain the 5900X has cores split 6 and 6 between the CCX's, so that both are equal to each other.

    Agree with the rest of what you said though. Most of the latency issues Zen 2 had are solved or at least mitigated with Zen 3. That's one of the ways they improved gaming performance by so much this generation. And if RPCS3 was ok with Zen 2 8 core CPUs they are certainly going to be fine with Zen3 up to 16 cores due to the doubled size of each CCX.
    Still doesn't answer the question why the 6800 performed the way it did.

    If the "amdgpu" kernel module keeps bouncing around between the two CCXs of the 5900X because of the unnecessary interrupt distribution introduced by the "irqbalance" daemon, do you really think that this won't have any impact on the frame-time & FPS numbers?
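One way to test the irqbalance theory is to look at how the amdgpu interrupt counts are spread across CPUs in /proc/interrupts. Below is a minimal Python sketch, assuming the usual layout of that file (a header row of CPU names, then one row per IRQ with the driver name among the trailing fields). The function name `irq_counts` is made up for illustration.

```python
from typing import Dict, List


def irq_counts(proc_interrupts: str, device: str = "amdgpu") -> Dict[str, List[int]]:
    """Return per-CPU interrupt counts for each IRQ row that names `device`.

    `proc_interrupts` is the text of /proc/interrupts. The layout assumed
    here (header row, then "<irq>: <count per CPU> <description...>") is an
    assumption about the file format, not something taken from the thread.
    """
    lines = proc_interrupts.splitlines()
    if not lines:
        return {}
    n_cpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    result: Dict[str, List[int]] = {}
    for line in lines[1:]:
        fields = line.split()
        # Skip short summary rows (ERR, MIS) and rows for other devices.
        if len(fields) < 1 + n_cpus or device not in fields[1 + n_cpus:]:
            continue
        counts = fields[1:1 + n_cpus]
        if all(c.isdigit() for c in counts):
            result[fields[0].rstrip(":")] = [int(c) for c in counts]
    return result
```

Feeding it `open("/proc/interrupts").read()` before and after a benchmark run, with and without irqbalance running, would show whether the counts actually move between CPUs. Which CPU numbers sit on which CCX depends on the machine, so that mapping has to be checked separately.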


  • Linuxxx
    replied
    Originally posted by cynical
    If you take a look at any benchmark comparing the 5900X to the 5800X you would see that while in gaming the 5800X is on par or better due to having everything on a single CCX, in any scenario where multiple cores is valuable (compilation, rendering, etc), the 5900X is significantly better at the task.
    And what was the on-topic confusion all about? (Hint: 6800)

    I only brought up the RPCS3 example because you claimed that inter-CCX latency can't be that bad, while you now stand corrected by acknowledging yourself that the 5900X can have a worsening effect on gaming results.

    And in my original post I named what I believed to be the culprit for those unreliable benchmarks; namely, "irqbalance".

    Anyhow, have any better idea for the odd output by the AMD Radeon 6800?


  • smitty3268
    replied
    Originally posted by cynical
    So on a 5900X, you are dealing with a configuration of eight cores on one CCX and four additional cores on the second CCX. That means you can have 16 threads on a single CCX, and from the site you are talking about...
    I'm fairly certain the 5900X has cores split 6 and 6 between the CCX's, so that both are equal to each other.

    Agree with the rest of what you said though. Most of the latency issues Zen 2 had are solved or at least mitigated with Zen 3. That's one of the ways they improved gaming performance by so much this generation. And if RPCS3 was ok with Zen 2 8 core CPUs they are certainly going to be fine with Zen3 up to 16 cores due to the doubled size of each CCX.
    Last edited by smitty3268; 31 May 2021, 06:28 PM.


  • cynical
    replied
    Originally posted by Linuxxx

    Nerf this:

    One of the first questions one may ask after seeing the graph is how a 3800X is performing better than a 3950X even though it has twice the cores and cache? The answer to that is due to increased latency from the 3950X’s multi-chiplet design. While the 3800X only has to communicate across two 4-core CCXes, the 3950X takes it a step further, and has two chiplets each with two 4-core CCXes it has to communicate across.
    I thought we were talking about Zen 3? Sure the above is true, because you are communicating across four different CCXs, and frequently if you are taking advantage of the thread count. Zen 3 only has two CCXs. From Anandtech:

    (talking about the 3950x) Nevertheless, in the result we can clearly see the low-latencies of the four CCXs, with inter-core latencies between CPUs of differing CCXs suffering to a greater degree in the 82ns range, which remains one of the key disadvantages of AMD’s core complex and chiplet architecture.

    On the new Zen3-based Ryzen 9 5950X, what immediately is obvious is that instead of four low-latency CPU clusters, there are now only two of them. This corresponds to AMD’s switch from four CCX’s for their 16-core predecessor, to only two such units on the new part, with the new CCX basically being the whole CCD this time around.
    So on a 5900X, you are dealing with a configuration of eight cores on one CCX and four additional cores on the second CCX. That means you can have 16 threads on a single CCX, and from the site you are talking about...

    The first thing that you should consider is that RPCS3 can heavily utilize up to 16 CPU threads, and once you go past that it’s very likely that you won’t see improvements. What this means is that once you have a CPU with 16 threads, you should invest in a faster single core performance instead. Keep in mind that you definitely won’t need 16 threads for all the titles, in RDR and a few other titles for instance won’t care if you go from 8C/8T to 8C/16T.
    So you won't benefit from more threads anyway. You would not even encounter a latency issue unless the Linux kernel decided to split the workload between CCXs for some reason. And this is all about a very specific use case: this emulation software. Even in your quote it says "unlike other software", because most programs do not have this same requirement of constant communication between threads.

    I think it's nuts to take this small use case on an older generation of Zen, and conclude that Zen 3 sucks and isn't worth buying lol. If you take a look at any benchmark comparing the 5900X to the 5800X you would see that while in gaming the 5800X is on par or better due to having everything on a single CCX, in any scenario where multiple cores is valuable (compilation, rendering, etc), the 5900X is significantly better at the task.


  • Linuxxx
    replied
    Originally posted by cynical

    Does CCX latency really matter? Either you aren’t using that many cores, in which case you never run into the latency issue, or you are using all 12, in which case the performance of the extra cores is going to outweigh the latency involved in communicating with them.

    In an ideal world, sure, you wouldn’t want the extra latency, but it’s a small price to pay if you actually need the cores.
    Nerf this:

    One of the first questions one may ask after seeing the graph is how a 3800X is performing better than a 3950X even though it has twice the cores and cache? The answer to that is due to increased latency from the 3950X’s multi-chiplet design. While the 3800X only has to communicate across two 4-core CCXes, the 3950X takes it a step further, and has two chiplets each with two 4-core CCXes it has to communicate across.

    Unlike other software, RPCS3’s PPU & SPU threads need to communicate constantly which results in a major bottleneck if these threads are split across multiple CCXes / chiplets. That ends up with the CPU hitting this bottleneck constantly with all the data moving around. This is why we do not recommend Ryzen CPUs unless they have a 3 or 4 core CCX design (6-8 core Ryzen CPUs, or a 4 core Ryzen APU). A 4 core CCX design is ideal as RPCS3 can fit all the PPU & SPU threads onto a single CCX, allowing users to bypass inter-CCX latency bottleneck entirely, provided the PPU & SPU threads are being scheduled properly to be placed on a single CCX.
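The "placed on a single CCX" part can be checked by hand on Linux. Below is a rough Python sketch, assuming that `cache/index3` in sysfs is the L3 cache (typical on x86, though the `level` file next to it is the authoritative check) and that CPUs sharing an L3 form one CCX. `parse_cpu_list` and `l3_groups` are illustrative names, not part of any tool mentioned in this thread.

```python
from pathlib import Path
from typing import List, Set


def parse_cpu_list(text: str) -> Set[int]:
    """Parse a kernel CPU list such as '0-5,12-17' into a set of CPU ids."""
    cpus: Set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def l3_groups(sysfs_root: str = "/sys/devices/system/cpu") -> List[List[int]]:
    """Group logical CPUs by the L3 cache they share.

    Assumes index3 is the L3 cache; on a Zen 3 part each group would then
    correspond to one CCX.
    """
    groups = set()
    pattern = "cpu[0-9]*/cache/index3/shared_cpu_list"
    for entry in Path(sysfs_root).glob(pattern):
        groups.add(frozenset(parse_cpu_list(entry.read_text())))
    return sorted(sorted(g) for g in groups)
```

Calling `os.sched_setaffinity(0, group)` with one of the returned groups before launching a program restricts the calling process, and the children it spawns, to that CCX. Whether that helps a given game or emulator is something to measure rather than assume.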


  • aufkrawall
    replied
    Originally posted by CochainComplex
    p.s.: it is helpful to clean the shader cache - I have heard it is not necessary but I fear it sometimes affects the outcome if not cleared.
    Mesa shader cache should be quite trustworthy, I don't think you'd ever need to clear it manually (unless you want to desperately free a few hundred megabytes of disk space).
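To see how much disk space is actually at stake, here is a small Python sketch that sums the file sizes under the cache directory. It assumes the default location of `$XDG_CACHE_HOME/mesa_shader_cache`, falling back to `~/.cache/mesa_shader_cache`. A cache directory overridden through a Mesa environment variable is not handled, and both function names are made up for illustration.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def default_cache_dir(env: Optional[Mapping[str, str]] = None) -> Path:
    """Guess the Mesa shader cache directory (assumed default layout)."""
    env = os.environ if env is None else env
    base = env.get("XDG_CACHE_HOME") or os.path.join(env.get("HOME", ""), ".cache")
    return Path(base) / "mesa_shader_cache"


def dir_size_bytes(path) -> int:
    """Sum the sizes of all regular files below `path`."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                pass  # file vanished or is unreadable; skip it
    return total
```

`dir_size_bytes(default_cache_dir()) / 2**20` gives the size in MiB, which makes it easy to decide whether clearing the cache is worth the recompilation stutter that follows.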


  • CochainComplex
    replied
    Originally posted by kokoko3k
    could you be a little more specific?
    what hardware? what mesa version? what benchmark?
    any references?

    thanks!
    Hardware: Ryzen 5 3600XT + RX 5700 XT
    Pop!_OS 20.10 - XanMod CacULE kernel 5.12.8, built with -march=native and -O3
    Mesa and drm: latest git pulls
    Built with the Clear Linux spec flags** and -march=native -flto* or -march=znver2 -mtune=znver2 -flto; same for libdrm
    https://github.com/clearlinux-pkgs/m...ster/mesa.spec
    ...only the compiler flags, not the config flags.
    Compiler: GCC 11.1.1 + binutils 2.36.1, pulled via ppa:netext/netext73

    Benchmarks: the built-in benchmarks of AC Odyssey (Lutris), FC New Dawn (Lutris) and DX MD (Steam, DX11).

    But be aware it can break very easily and sometimes regressions occur. E.g. in AC:O at the moment I have [email protected] (custom settings close to ultra) after building the latest mesa+drm...before, with stock Pop!_OS Mesa, 65-68fps...but it does not reboot into Pop Shell again - I have installed some new libs as well, so I am still bisecting the issue. But my approach worked for almost a year. Maybe regressions, but no "gamestopper" like now.
    Well, I did a clean reinstall after upgrading to the Pop!_OS 21.04 beta, which was a disappointment performance-wise before, and usually I have the oibaf PPA installed too to get the latest dependency libs. But I upgraded on a modified system, so that could have contributed to the worse performance on Pop!_OS 21.04...a lot of possible causes...I need time to figure it out.
    However, this will depend heavily on your hardware, so tinkering is the way to go.

    *-flto or -ffat-lto-objects, whichever works better
    ** btw "-O3 -falign-functions=32 -fno-math-errno -fno-semantic-interposition -fno-trapping-math" is a quite good flag string which is extensively used by almost all clear linux pkgs and works quite well for a lot of situations. I start with this one and then I try if march=native or march=znver2 mtune=znver2 breaks it ... if it does I will simply add mtune=znver2 but I'm crosschecking if it hurts performance.

    p.s.: it is helpful to clean the shader cache - I have heard it is not necessary but I fear it sometimes affects the outcome if not cleared.
    Last edited by CochainComplex; 31 May 2021, 03:35 AM.


  • smitty3268
    replied
    RT support for radv is coming along. Apparently this partial support is enough for many demos: https://gitlab.freedesktop.org/mesa/...requests/11078

