AMD R600g Performance Patches Yield Mixed Results


  • AMD R600g Performance Patches Yield Mixed Results

    Phoronix: AMD R600g Performance Patches Yield Mixed Results

    Following the performance benchmark results I published earlier this week comparing the open-source Radeon and AMD Catalyst driver performance under Ubuntu 12.10, Marek, the well-known independent open-source graphics driver developer, set out to explore some of the performance issues in the open-source driver. One day later, he published a patch that could quadruple the frame-rate of the Radeon Gallium3D driver. He went on to push another performance-focused patch for this R600g driver as well. In this article is a fresh round of benchmarks of the open-source driver, looking at the wins and losses attributed to the new code.

    http://www.phoronix.com/vr.php?view=18093

  • #2
    That's not what I'd call mixed results. It looks like an improvement across the board with one regression.



    • #3
      Wonderful article title again Michael

      How the heck is this mixed results?!

      It seems to me that the _only_ game where it didn't pay off is Xonotic, and looking at the frame rates there I'm guessing that either the game does something weird or something else is going on (like hitting a software fallback path somewhere, for example).

      Michael, you need an attitude change.
      How about being positive for a change instead of outrageously negative...



      • #4
        The only game that had an issue was Xonotic on High. That seems more like a bug in the game than a regression.



        • #5
          How can we help?

          What can we do to accelerate getting the continuous integration testing done? What do you need help with?



          • #6
            It wasn't just Xonotic. From the article: https://bugs.freedesktop.org/show_bug.cgi?id=56634.
            Marek also said that changing a heuristic is a mixed bag and can cause regressions. Hopefully they'll be mostly ironed out soon...



            • #7
              Originally posted by Rigaldo:
              It wasn't just Xonotic. From the article: https://bugs.freedesktop.org/show_bug.cgi?id=56634.
              Marek also said that changing a heuristic is a mixed bag and can cause regressions. Hopefully they'll be mostly ironed out soon...
              Right, exactly, it seems to be bad for the more demanding cases... Xonotic high, ETQW, Unigine... Most of the tests in the article are just of ioquake3.
              Michael Larabel
              http://www.michaellarabel.com/



              • #8
                The closed blob's performance advantage is not due to tweaks like this. I have an extremely simple test case which renders a single static rectangular texture, and the open source driver is half the speed. This is the simplest, most fundamental operation, and alas we are slower than we should be. Until fundamental problems like that are addressed, tweaks here and there for this or that game are not likely to have the expected result.
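
                For reference, a minimal sketch of that kind of test case: one static textured quad redrawn as fast as possible with an FPS counter. This is an illustration built on GLUT, not tmikov's actual program; the window size, texture size, and build line are assumptions.

                /* quad.c - hypothetical minimal textured-quad benchmark (not tmikov's code).
                 * Build (assumption): cc quad.c -lglut -lGL -o quad
                 * Run with vblank_mode=0 on Mesa so vsync does not cap the frame rate. */
                #include <GL/glut.h>
                #include <stdio.h>
                #include <stdlib.h>

                static int frames;
                static int last_ms;

                static void draw(void)
                {
                    glClear(GL_COLOR_BUFFER_BIT);
                    glBegin(GL_QUADS);                     /* one static rectangular texture */
                    glTexCoord2f(0, 0); glVertex2f(-1, -1);
                    glTexCoord2f(1, 0); glVertex2f( 1, -1);
                    glTexCoord2f(1, 1); glVertex2f( 1,  1);
                    glTexCoord2f(0, 1); glVertex2f(-1,  1);
                    glEnd();
                    glutSwapBuffers();

                    frames++;
                    int now = glutGet(GLUT_ELAPSED_TIME);
                    if (now - last_ms >= 1000) {           /* print FPS once per second */
                        printf("%d FPS\n", frames);
                        frames = 0;
                        last_ms = now;
                    }
                }

                int main(int argc, char **argv)
                {
                    glutInit(&argc, argv);
                    glutInitDisplayMode(GLUT_DOUBLE | GLUT_RGBA);
                    glutInitWindowSize(1024, 768);
                    glutCreateWindow("textured quad");

                    /* Upload a single static 512x512 RGBA texture once; the contents
                     * do not matter for measuring raw draw throughput. */
                    unsigned char *pixels = calloc(512 * 512 * 4, 1);
                    GLuint tex;
                    glGenTextures(1, &tex);
                    glBindTexture(GL_TEXTURE_2D, tex);
                    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
                    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, 512, 512, 0,
                                 GL_RGBA, GL_UNSIGNED_BYTE, pixels);
                    free(pixels);
                    glEnable(GL_TEXTURE_2D);

                    glutDisplayFunc(draw);
                    glutIdleFunc(glutPostRedisplay);       /* redraw continuously */
                    glutMainLoop();
                    return 0;
                }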



                • #9
                  Based on the way only some games are being affected, it looks like something very specific is broken. Especially since according to that bug report, ETQW's performance is usually better, and only sometimes worse. My intuition is that this should be possible to fix without reverting the optimization, although I'm not familiar with the r600g code.

                  If it turns out that the regressions can't be fixed without reverting, well, the blobs have game-specific hacks, why shouldn't r600g?



                  • #10
                    Can we have Nouveau tests of GPUs without reclocking support vs. the Nvidia blob **reclocked** to the frequencies used by Nouveau?

                    Right now we know how good Nvidia is, and how good Nouveau is without actually using 100% of the GPU's computational power. If we could see how the Nvidia driver behaves at lower frequencies, we could compare the relative capabilities of the two drivers with better accuracy (and "somehow" scale up Nouveau's performance for the "would be" scenario where Nouveau knows how to reclock).


                    As for the article: is Marek around to comment?

                    PS: While crowdfunding Mesa/driver work is not practical right now, maybe crowdfunding an X.Org effort for continuous integration could be doable? Michael, what do you think? Maybe you can talk about it with the X.Org Foundation?



                    • #11
                      Even with the Xonotic results, it's clear that he is on the right track; he "only" needs to figure out what is happening with Xonotic and fix it...

                      I bet that solving the Xonotic problem in the driver itself will also solve the problem with several other games that are affected by this patch...



                      • #12
                        Originally posted by tmikov:
                        The closed blob's performance advantage is not due to tweaks like this. I have an extremely simple test case which renders a single static rectangular texture, and the open source driver is half the speed. This is the simplest, most fundamental operation, and alas we are slower than we should be. Until fundamental problems like that are addressed, tweaks here and there for this or that game are not likely to have the expected result.
                        Make sure you have 2D tiling enabled, otherwise you won't be fully utilizing your memory bandwidth; it's been made the default as of Mesa 9.0 and xf86-video-ati git master. Note that the EGL paths do not properly handle tiling yet.
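
                        For anyone on an older stack who wants to check this, tiling can also be forced on via xorg.conf. This is a sketch based on the ColorTiling/ColorTiling2D options in xf86-video-ati; whether 2D tiling actually takes effect still depends on the DDX, kernel, and Mesa versions in use.

                        Section "Device"
                            Identifier "Radeon"
                            Driver     "radeon"
                            Option     "ColorTiling"   "on"    # 1D color tiling
                            Option     "ColorTiling2D" "on"    # 2D color tiling where supported
                        EndSection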



                        • #13
                          Originally posted by agd5f:
                          Make sure you have 2D tiling enabled, otherwise you won't be fully utilizing your memory bandwidth; it's been made the default as of Mesa 9.0 and xf86-video-ati git master. Note that the EGL paths do not properly handle tiling yet.
                          Tiling is enabled, though I don't see a difference when I enable 2D tiling vs 1D.

                          About EGL: I have applied a simple patch which enables tiling of the frame buffer. With that, it matches the performance of running under X11. In both cases I get 130 FPS, while the blob is at about 220. (It is not actually double, sorry, but it is significantly faster.)



                          • #14
                            Well, guessing from what was changed, I'd say the huge difference shows up when the game is complex enough to run out of memory on the graphics card. Since they changed "VRAM|GTT" to just "VRAM", it seems quite likely that is the issue. There should be some code to detect when the card gets close to running out of VRAM and switch from "VRAM" back to "VRAM|GTT" relocations for those workloads, while keeping VRAM-only for workloads which need less video memory. Another solution would be to somehow monitor which resources are accessed more often and which less often, and place the high-access-count resources in VRAM and the rest in GTT.
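
                            Roughly what that first suggestion could look like, sketched in C. The domain flags mimic the radeon winsys naming, but the tracker structure, the budget margin, and choose_placement() are hypothetical illustrations, not the actual r600g code.

                            /* Sketch of a VRAM-budget fallback, not real driver code. */
                            #include <stdint.h>
                            #include <stdio.h>

                            #define DOMAIN_VRAM (1u << 0)
                            #define DOMAIN_GTT  (1u << 1)

                            struct vram_tracker {
                                uint64_t vram_size;       /* total VRAM reported by the kernel */
                                uint64_t vram_committed;  /* bytes already placed in VRAM */
                            };

                            /* Prefer VRAM-only placement (the new heuristic), but fall back to
                             * VRAM|GTT once committed buffers approach the VRAM budget, so big
                             * workloads are allowed to spill instead of thrashing. */
                            static unsigned choose_placement(struct vram_tracker *t, uint64_t buf_size)
                            {
                                uint64_t budget = t->vram_size - t->vram_size / 8;  /* keep ~12% headroom */

                                if (t->vram_committed + buf_size <= budget) {
                                    t->vram_committed += buf_size;
                                    return DOMAIN_VRAM;                  /* small working set: keep it in VRAM */
                                }
                                return DOMAIN_VRAM | DOMAIN_GTT;         /* let the kernel place/evict as needed */
                            }

                            int main(void)
                            {
                                struct vram_tracker t = { .vram_size = 1024ull << 20, .vram_committed = 0 };
                                /* A 256 MiB buffer fits the budget; a second large one does not. */
                                printf("first:  0x%x\n", choose_placement(&t, 256ull << 20));
                                printf("second: 0x%x\n", choose_placement(&t, 700ull << 20));
                                return 0;
                            }

                            The second idea in the post, placing buffers by how often they are accessed, would need per-buffer usage counters rather than a single committed-bytes total; the sketch above only covers the simpler fallback.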



                            • #15
                              Originally posted by xception:
                              Well, guessing from what was changed, I'd say the huge difference shows up when the game is complex enough to run out of memory on the graphics card. Since they changed "VRAM|GTT" to just "VRAM", it seems quite likely that is the issue. There should be some code to detect when the card gets close to running out of VRAM and switch from "VRAM" back to "VRAM|GTT" relocations for those workloads, while keeping VRAM-only for workloads which need less video memory. Another solution would be to somehow monitor which resources are accessed more often and which less often, and place the high-access-count resources in VRAM and the rest in GTT.
                              Wouldn't it be better to always keep it VRAM|GTT but distribute between the two more intelligently?

