AMD Fusion On Gallium3D Leaves A Lot To Be Desired


  • #11
    The open source driver seems to be at roughly the same level of performance as the open source driver for Intel's SB graphics. I guess one should really use Catalyst, if your card is supported, that is.



    • #12
      Originally posted by log0 View Post
      I see. What I am wondering about is why do graphics card drivers require this amount of manpower? Is it the hardware or the API? How is it possible to write driver code that could be more than ten times faster? Or is it that the hardware doesn't map to the exposed API (OpenGL) and requires complex translation? Sorry if the questions sound silly, I've never worked with bare hardware.
      N.B. I'm not a driver developer, just an interested observer

      In this case, the open source driver was forced into a low-power mode, while the proprietary driver was going on at full blast. Also, it's possible that not all functionality (such as tiling) was enabled on the open source driver. When there's an order-of-magnitude difference, then either something is wrong, or the driver is too new and there's lots of work needed still.

      The problem with OpenGL drivers (and GPU drivers in general) is that they drive amazingly complex hardware and take incredible amounts of code, especially for full OpenGL support. It's much more complex than a network card driver or a mouse driver. With most chips, the Gallium3D drivers for Radeons reach around 60-70% of the proprietary driver's performance, which is as close as you can get with "regular" effort.

      Then things get complicated. A GPU driver runs on the CPU and often has to do many things before it can prepare a frame for rendering. If it is not optimised, the time adds up: lots of little delays all over the stack, which need to be optimised one by one, hundreds of them. This is very time-intensive and takes a lot of manpower. If you are running something at 100 frames per second, each frame has only a 10 ms budget, so even a small delay, multiplied by 100 frames every second, quickly adds up and makes a huge difference. That's why the developers first focus on getting a driver working correctly, and only then try to optimise it.

      With some more work (Tom's VLIW packetiser, the new shader compiler, and Hyper-Z support), things should come to more than 80% of the proprietary performance, perhaps even more (rough guess). That's really good, and the additional work after that becomes too complex, with very little gain.



      • #13
        Originally posted by log0 View Post
        I see. What I am wondering about is why do graphics card drivers require this amount of manpower?
        I think AMD & Nvidia easily have > 100 people working on their (closed source) drivers, so "5 extra developers" isn't a big amount of manpower...



        • #14
          Originally posted by pingufunkybeat View Post
          In this case, the open source driver was forced into a low-power mode, while the proprietary driver was going on at full blast. Also, it's possible that not all functionality (such as tiling) was enabled on the open source driver. When there's an order-of-magnitude difference, then either something is wrong, or the driver is too new and there's lots of work needed still.
          From my A6-3500 A-Series APU (via ssh; the machine is currently idle, sitting at a MythTV frontend screen):

          me@mybox:/sys/class/drm/card0/device# cat /sys/class/drm/card0/device/power_method
          profile

          me@mybox:/sys/class/drm/card0/device# cat /sys/class/drm/card0/device/power_profile
          default

          me@mybox:/sys/kernel/debug/dri/0# cat /sys/kernel/debug/dri/0/radeon_pm_info
          default engine clock: 200000 kHz
          current engine clock: 11880 kHz
          default memory clock: 667000 kHz


          So an order-of-magnitude difference between Catalyst and r600g is to be expected if Michael left the power management in its default state. If he had forced the APU under Gallium3D into the high-performance profile (or maybe the dynpm method), things would probably have been different.

          I'm not positive about how the default clocking on the APUs works, but I'm seeing some variation in the GPU clock on my machine. It goes as low as 7 MHz and as high as 30 MHz when idling, and I'm not sure how conservative the reclocking (which seems to be enabled by the EFI/BIOS by default) actually is. So forcing the APU into high-performance mode might help things.
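
          For anyone who wants to try it, forcing the card to its highest fixed clocks should just be a matter of the following (a rough sketch, assuming the stock radeon KMS sysfs knobs and that the APU shows up as card0; run as root):

          # select the profile-based method and pick the "high" profile
          echo profile > /sys/class/drm/card0/device/power_method
          echo high > /sys/class/drm/card0/device/power_profile

          # or let the driver reclock dynamically based on GPU load instead
          echo dynpm > /sys/class/drm/card0/device/power_method

          # then re-check what clocks are actually being used
          cat /sys/kernel/debug/dri/0/radeon_pm_info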



          • #15
            Originally posted by log0 View Post
            I see. What I am wondering about is why do graphics card drivers require this amount of manpower? Is it the hardware or the API? How is it possible to write driver code that could be more than ten times faster? Or is it that the hardware doesn't map to the exposed API (OpenGL) and requires complex translation? Sorry if the questions sound silly, I've never worked with bare hardware.
            Yes, the translation is really complex as far as OpenGL is concerned. Implementing a performant shader compiler is also not easy. Then, there are hardware optimizations which you can use, like texture tiling, hierarchical Z-Stencil buffers, colorbuffer compression, etc.

            We need a driver which:
            1) doesn't starve the GPU by doing too much CPU work
            2) doesn't synchronize the CPU with the GPU, so that the two can operate asynchronously
            3) takes advantage of every hardware feature which improves performance

            FYI, I was told by some NVIDIA guy face-to-face a few years ago that their Vista GPU drivers had 20 million lines of code. The entire Linux kernel has only 14.3M.



            • #16
              FWIW, we have hundreds of developers working on the closed source AMD driver, and the closed driver was ~40 million LOC last time I checked, which was a while back.



              • #17
                Originally posted by pingufunkybeat View Post
                N.B. I'm not a driver developer, just an interested observer

                In this case, the open source driver was forced into a low-power mode, while the proprietary driver was going on at full blast. Also, it's possible that not all functionality (such as tiling) was enabled on the open source driver. When there's an order-of-magnitude difference, then either something is wrong, or the driver is too new and there's lots of work needed still.

                The problem with OpenGL drivers (and GPU drivers in general) is that they drive amazingly complex hardware and take incredible amounts of code, especially for full OpenGL support. It's much more complex than a network card driver or a mouse driver. With most chips, the Gallium3D drivers for Radeons reach around 60-70% of the proprietary driver's performance, which is as close as you can get with "regular" effort.

                Then things get complicated. A GPU driver runs on the CPU and often has to do many things before it can prepare a frame for rendering. If it is not optimised, the time adds up: lots of little delays all over the stack, which need to be optimised one by one, hundreds of them. This is very time-intensive and takes a lot of manpower. If you are running something at 100 frames per second, each frame has only a 10 ms budget, so even a small delay, multiplied by 100 frames every second, quickly adds up and makes a huge difference. That's why the developers first focus on getting a driver working correctly, and only then try to optimise it.

                With some more work (Tom's VLIW packetiser, the new shader compiler, and Hyper-Z support), things should come to more than 80% of the proprietary performance, perhaps even more (rough guess). That's really good, and the additional work after that becomes too complex, with very little gain.
                I don't understand why people believe shader optimization is a big issue. In all the benchmarks in this article, a better-optimized shader most likely wouldn't make a measurable difference. Marek has far better points to explain the gap. Oh, and if you want to convince yourself that shaders are not a big issue: take a big Doom 3 shader, write a sample GL program that uses that shader to draw a quad covering the biggest FBO possible on your generation of hardware, draw it thousands of times, then hand-optimize the shader and hack r600g to use your hand-optimized version. Compare: the last time I did such a thing, the difference wasn't that big.



                • #18
                  glisse, it's because nouveau is faster than radeon: considering nouveau isn't backed by NVIDIA and there isn't any documentation, that's quite strange, and people started searching for a culprit.



                  • #19
                    @glisse

                    Do you mean TGSI or r600 asm?

                    My GSoC shader (TGSI) was 20% faster when hand-optimized, compared to the output of Mesa's GLSL compiler. But that's only at the TGSI level; I believe it would be much faster if properly compiled (maybe the wrong word) down to r600 asm, instead of the simple replacement that I understand is the current status.



                    • #20
                      The 2x difference doesn't surprise me, although some of the 10x differences do, since IIRC the corresponding number on discrete GPUs is more like 3-5x than 10x. The difference may be memory bandwidth sensitivity (optimizing to reduce bandwidth consumption is one of the more complex aspects of driver optimization), or just that the default clock state for APUs is even lower than I remembered.

                      As others have said, it would be good to keep in mind which performance features in the HW are enabled and which are still being worked on. It looks like the article ran with driver defaults; Michael, is that correct? Any idea what clocks were being used (i.e. what the VBIOS power state called for)?

                      Testing with defaults (i.e. ignoring some WIP improvements which are still off by default) seems reasonable to me as long as you aren't drawing "it's gonna be years" conclusions from the results.
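
                      For anyone wanting to double-check on their own box, the power method/profile and the clocks actually in use can be read back while a benchmark is running (assuming the radeon KMS driver and card0; the debugfs file needs root), using the same paths shown earlier in the thread:

                      cat /sys/class/drm/card0/device/power_method
                      cat /sys/class/drm/card0/device/power_profile
                      cat /sys/kernel/debug/dri/0/radeon_pm_info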

