AMD Fusion On Gallium3D Leaves A Lot To Be Desired


  • #16
    Originally posted by log0 View Post
    I see. What I am wondering about is why graphics card drivers require this amount of manpower. Is it the hardware or the API? How is it possible to write driver code that could be more than ten times faster? Or is it that the hardware doesn't map to the exposed API (OpenGL) and requires complex translation? Sorry if the questions sound silly; I've never worked with bare hardware.
    Yes, the translation is really complex as far as OpenGL is concerned. Implementing a performant shader compiler isn't easy either. Then there are the hardware optimizations you can use, like texture tiling, hierarchical Z/stencil buffers, colorbuffer compression, etc.

    We need a driver which:
    1) doesn't starve the GPU by doing too much CPU work
    2) doesn't synchronize the CPU with the GPU, so that the two can operate asynchronously
    3) takes advantage of every hardware feature which improves performance
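
    To make point 2 concrete, here's a minimal sketch of the classic stall-versus-orphan pattern the driver has to cope with. This is ordinary application-side OpenGL, not driver internals; GLEW and the buffer size are my assumptions for the example. Mapping a buffer the GPU is still using forces a CPU-GPU sync, while orphaning it lets both sides keep working.

    Code:
    /* Sketch: avoiding CPU<->GPU synchronization when streaming vertex data.
     * Assumes a current GL context, GLEW for the GL 3.0 entry points, and an
     * already-created VBO; names and sizes are made up for the example. */
    #include <GL/glew.h>
    #include <string.h>

    #define STREAM_SIZE (1024 * 1024)

    void upload_stalling(GLuint vbo, const void *data, size_t len)
    {
        glBindBuffer(GL_ARRAY_BUFFER, vbo);
        /* If the GPU is still reading this buffer, the driver has to wait for
         * it to finish before it can hand a pointer back: a pipeline stall. */
        void *ptr = glMapBuffer(GL_ARRAY_BUFFER, GL_WRITE_ONLY);
        memcpy(ptr, data, len);
        glUnmapBuffer(GL_ARRAY_BUFFER);
    }

    void upload_async(GLuint vbo, const void *data, size_t len)
    {
        glBindBuffer(GL_ARRAY_BUFFER, vbo);
        /* "Orphan" the old storage: the in-flight copy stays with the GPU and
         * the CPU gets fresh memory, so neither side waits for the other. */
        glBufferData(GL_ARRAY_BUFFER, STREAM_SIZE, NULL, GL_STREAM_DRAW);
        void *ptr = glMapBufferRange(GL_ARRAY_BUFFER, 0, len,
                                     GL_MAP_WRITE_BIT | GL_MAP_UNSYNCHRONIZED_BIT);
        memcpy(ptr, data, len);
        glUnmapBuffer(GL_ARRAY_BUFFER);
    }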

    FYI, I was told by some NVIDIA guy face-to-face a few years ago that their Vista GPU drivers had 20 million lines of code. The entire Linux kernel has only 14.3M.



    • #17
      FWIW, we have hundreds of developers working on the closed-source AMD driver, and the closed driver was ~40 million LOC the last time I checked, which was a while back.



      • #18
        Originally posted by pingufunkybeat View Post
        N.B. I'm not a driver developer, just an interested observer

        In this case, the open source driver was forced into a low-power mode, while the proprietary driver was going at full blast. Also, it's possible that not all functionality (such as tiling) was enabled on the open source driver. When there's an order-of-magnitude difference, either something is wrong or the driver is too new and there's still a lot of work needed.

        The problem with OpenGL drivers (and GPU drivers in general) is that they drive amazingly complex hardware and take incredible amounts of code (especially for full OpenGL support). It's much more complex than a network card driver or a mouse driver. With most chips, the Gallium3D drivers for Radeons reach around 60-70% of the proprietary driver's performance, which is as close as you can get with "regular" effort.

        Then things get complicated. A GPU driver runs on the CPU and often has to do many things before it can prepare a frame for rendering. If it is not optimised, the time adds up: lots of little delays all over the stack, which need to be optimised one by one, hundreds of them. This is very time-intensive and takes a lot of manpower. If you are running something at 100 frames per second, this quickly adds up and makes a huge difference. Even a small delay multiplied by 100 becomes a long wait. That's why the developers focus first on getting the driver working correctly, and only then try to optimise it.
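
        (Purely illustrative numbers, not measured from any real driver, to show how the small delays compound and why keeping the CPU and GPU asynchronous matters:)

        Code:
        /* Shows how small per-frame CPU costs in the driver stack add up.
         * All numbers are invented for the illustration. */
        #include <stdio.h>

        int main(void)
        {
            double gpu_ms = 6.0;          /* time the GPU needs per frame     */
            int little_delays = 300;      /* small driver costs per frame     */
            double delay_ms = 0.02;       /* 20 microseconds each             */

            double cpu_ms = little_delays * delay_ms;             /* 6 ms       */
            double async_ms = gpu_ms > cpu_ms ? gpu_ms : cpu_ms;  /* overlapped */
            double sync_ms = gpu_ms + cpu_ms;                     /* serialized */

            printf("asynchronous: %.0f fps, synchronized: %.0f fps\n",
                   1000.0 / async_ms, 1000.0 / sync_ms);
            return 0;
        }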

        With some work, plus Tom's VLIW packetiser, the new shader compiler and Hyper-Z support, things should come to more than 80% of the proprietary performance, perhaps even more (rough guess). That's really good, and the additional work after that becomes too complex for very little gain.
        I don't understand why people believe shader optimization is a big issue. In all the benchmarks in this article, a better shader compiler most likely wouldn't make a measurable difference. Marek has far better points to explain the gap. Oh, and if you want to convince yourself that shaders are not a big issue: take a big shader from Doom 3, write a sample GL program that uses that shader to draw a quad covering the biggest FBO possible on your hardware generation, draw it a thousand times, then hand-optimize the shader and hack r600g to use your hand-optimized version. Compare; the last time I did this the difference wasn't that big.
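
        For anyone who wants to reproduce that experiment, a rough sketch of the harness is below. GLFW, GLEW and the trivial placeholder shader are my assumptions, not something from this thread; you would paste the big game shader into FRAG_SRC, time a run, then repeat with a hand-optimized copy.

        Code:
        /* Draw one quad covering the whole render target a thousand times with
         * the fragment shader under test, and time it with the GPU drained. */
        #include <GL/glew.h>
        #include <GLFW/glfw3.h>
        #include <stdio.h>

        static const char *VERT_SRC =
            "#version 120\n"
            "attribute vec2 pos;\n"
            "void main() { gl_Position = vec4(pos, 0.0, 1.0); }\n";

        /* Placeholder: substitute the big shader you actually care about. */
        static const char *FRAG_SRC =
            "#version 120\n"
            "void main() { gl_FragColor = vec4(0.5); }\n";

        static GLuint compile(GLenum type, const char *src)
        {
            GLuint s = glCreateShader(type);
            glShaderSource(s, 1, &src, NULL);
            glCompileShader(s);
            return s;
        }

        int main(void)
        {
            glfwInit();
            GLFWwindow *win = glfwCreateWindow(1920, 1080, "shader-bench", NULL, NULL);
            glfwMakeContextCurrent(win);
            glewInit();

            GLuint prog = glCreateProgram();
            glAttachShader(prog, compile(GL_VERTEX_SHADER, VERT_SRC));
            glAttachShader(prog, compile(GL_FRAGMENT_SHADER, FRAG_SRC));
            glBindAttribLocation(prog, 0, "pos");
            glLinkProgram(prog);
            glUseProgram(prog);

            /* One quad covering the whole target. */
            static const GLfloat quad[] = { -1,-1,  1,-1,  -1,1,  1,1 };
            GLuint vbo;
            glGenBuffers(1, &vbo);
            glBindBuffer(GL_ARRAY_BUFFER, vbo);
            glBufferData(GL_ARRAY_BUFFER, sizeof(quad), quad, GL_STATIC_DRAW);
            glEnableVertexAttribArray(0);
            glVertexAttribPointer(0, 2, GL_FLOAT, GL_FALSE, 0, NULL);

            glFinish();                     /* don't time the setup work        */
            double t0 = glfwGetTime();
            for (int i = 0; i < 1000; i++)
                glDrawArrays(GL_TRIANGLE_STRIP, 0, 4);
            glFinish();                     /* wait until the GPU is really done */
            printf("1000 fullscreen quads: %.3f s\n", glfwGetTime() - t0);

            glfwTerminate();
            return 0;
        }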



        • #19
          glisse, it's because nouveau is faster than radeon: considering that nouveau isn't backed by NVIDIA and there isn't any documentation, that's quite strange, and people started searching for a culprit.
          ## VGA ##
          AMD: X1950XTX, HD3870, HD5870
          Intel: GMA45, HD3000 (Core i5 2500K)



          • #20
            @glisse

            Do you mean TGSI or r600 asm?

            My GSoC shader (TGSI) was 20% faster when hand-optimized, compared to Mesa's GLSL compiler output. But that's only at the TGSI level; I believe it would be much faster if properly compiled (maybe the wrong word) down to r600g asm, instead of the simple replacement that I understand is the current status.



            • #21
              The 2x difference doesn't surprise me, although some of the 10x differences do, since IIRC the corresponding number on discrete GPUs is more like 3-5x than 10x. The difference may be memory bandwidth sensitivity (optimizing to reduce bandwidth consumption is one of the more complex aspects of driver optimization) or just that the default clock state for APUs is even lower than I remembered.
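
              To see why bandwidth sensitivity is plausible here, a back-of-the-envelope sketch (every number in it is a placeholder, not a measurement):

              Code:
              /* Rough idea of how much of an APU's shared memory bandwidth plain
               * framebuffer and texture traffic can eat at 1080p/60. */
              #include <stdio.h>

              int main(void)
              {
                  double pixels_per_s = 1920.0 * 1080.0 * 60.0;
                  double color_bytes = 4.0 * 3.0;   /* RGBA8 writes, ~3x overdraw     */
                  double depth_bytes = 8.0;         /* rough Z read + write per pixel */
                  double tex_bytes   = 32.0 * 3.0;  /* texture fetches per fragment   */

                  double gbps = pixels_per_s * (color_bytes + depth_bytes + tex_bytes) / 1e9;

                  /* Dual-channel DDR3-1333 is on the order of ~21 GB/s, and on Fusion
                   * the CPU is competing for the same bus. */
                  printf("estimated traffic: ~%.1f GB/s\n", gbps);
                  return 0;
              }

              Which is also why features like tiling, Hyper-Z and colorbuffer compression matter even more on an APU than on a discrete card.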

              As others have said, it would be good to keep in mind which performance features in the HW are enabled and which are still being worked on. It looks like the article ran with driver defaults - Michael, is that correct? Any idea what clocks were being used (i.e. what the VBIOS power state called for)?

              Testing with defaults (i.e. ignoring some WIP improvements which are still off by default) seems reasonable to me, as long as you aren't drawing "it's gonna be years" conclusions from the results.



              • #22
                Originally posted by curaga View Post
                @glisse

                Do you mean TGSI or r600 asm?

                My GSoC shader (TGSI) was 20% faster when hand-optimized, compared to Mesa's GLSL compiler output. But that's only at the TGSI level; I believe it would be much faster if properly compiled (maybe the wrong word) down to r600g asm, instead of the simple replacement that I understand is the current status.
                I mean at the r600 asm level. What was your shader? My experiment showed almost no win, but I didn't use a big shader (even Doom 3 doesn't have that big a shader).



                • #23
                  Originally posted by Drago View Post
                  Don't be so sure. Tom Stellard is integrating an LLVM backend for r600g as we speak, and once it is done, and the LLVM->VLIW packetizer is finished (it has been started), we can all enjoy faster shaders, both graphics and compute. 3-4 years is awfully pessimistic.
                  If 3-4 years is pessimistic, then 1-2 years is optimistic.

                  Well, yes. The moment they finish this, I'll buy a new graphics card with the GCN architecture, just to make sure the driver is alpha/beta again. LOL



                  • #24
                    Originally posted by darkbasic View Post
                    glisse, it's because nouveau is faster than radeon: considering that nouveau isn't backed by NVIDIA and there isn't any documentation, that's quite strange, and people started searching for a culprit.
                    Where is it shown that nouveau would be faster? I have had the impression that, when they both work, they are about the same.



                    • #25
                      Originally posted by bridgman View Post
                      As others have said, it would be good to keep in mind which performance features in the HW are enabled and which are still being worked on. It looks like the article ran with driver defaults - Michael, is that correct? Any idea what clocks were being used (i.e. what the VBIOS power state called for)?
                      Bridgman, AFAIK checking the Radeon clocks still requires a debug feature (debugfs), even though it is a decidedly non-debug operation. Could you lobby for a stable interface for it in /sys, so that all users have access to the clocks without having to enable debug features?
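
                      For reference, this is roughly what reading the clocks looks like today; the debugfs path below is my assumption of the usual location, and it needs root plus a mounted debugfs, which is exactly the problem:

                      Code:
                      /* Dump the current Radeon power state (engine/memory clock, voltage). */
                      #include <stdio.h>

                      int main(void)
                      {
                          const char *path = "/sys/kernel/debug/dri/0/radeon_pm_info";
                          FILE *f = fopen(path, "r");
                          if (!f) {
                              perror(path);   /* usual failure: debugfs not mounted, or not root */
                              return 1;
                          }
                          char line[256];
                          while (fgets(line, sizeof(line), f))
                              fputs(line, stdout);
                          fclose(f);
                          return 0;
                      }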

                      What was your shader? My experiment showed almost no win, but I didn't use a big shader (even Doom 3 doesn't have that big a shader).
                      MLAA. The hand-optimized TGSI can be found in any current Mesa. And yes, it's a big shader (three passes, the second pass is the biggest).



                      • #26
                        Originally posted by curaga View Post
                        Bridgman, AFAIK checking the Radeon clocks still requires a debug feature (debugfs), even though it is a decidedly non-debug operation. Could you lobby for a stable interface for it in /sys, so that all users have access to the clocks without having to enable debug features?
                        +1

                        (stupid character limit)
                        ## VGA ##
                        AMD: X1950XTX, HD3870, HD5870
                        Intel: GMA45, HD3000 (Core i5 2500K)



                        • #27
                          Originally posted by darkbasic View Post
                          +1
                          Even better if the interface is GPU vendor neutral. That way the user-space app would be simple, and GPU neutral as well.



                          • #28
                            So we all agree that the very poor benchmark results were due to the clocks being low by default on Fusion?

                            Then what's left to be desired is proper 'idiot-proof' dynamic power management, enabled by default, so Phoronix won't draw the wrong conclusions after benchmarking.

