AMD Fusion On Gallium3D Leaves A Lot To Be Desired


  • AMD Fusion On Gallium3D Leaves A Lot To Be Desired

    Phoronix: AMD Fusion On Gallium3D Leaves A Lot To Be Desired

    It's been a few months since last running any AMD Fusion tests under Linux, so here's a look at the AMD A8-3870K "Llano" APU performance under both the latest Catalyst driver and the open-source Radeon Gallium3D stack with Ubuntu 12.04. Besides the open-source driver being handily beaten by the Catalyst binary driver, the power efficiency is also a disappointment.

    http://www.phoronix.com/vr.php?view=17255

  • #2
    VLIW... Without a better shader compiler, the radeon driver doesn't stand a chance.

    And AMD is not planning to build a shader compiler around an obsolete technique.

    The HD 7970 will get a proper shader compiler.

    In other words, the open-source drivers need another 3-4 years to catch up.

    RIP, VLIW...



    • #3
      OK, I don't mean this as criticism of the driver developers. I am sure they are doing their best, and I've got no idea about writing device drivers. But I am wondering how it is possible that one implementation is an order of magnitude slower than another. Is it the complex hardware interface? Or is OpenGL so broken that it is difficult to write fast, efficient drivers? Is the nouveau approach of reverse-engineering a well-performing driver maybe the better one (assuming that a faster driver exists)?



      • #4
        Originally posted by Qaridarium View Post
        VLIW... Without a better shader compiler, the radeon driver doesn't stand a chance.

        And AMD is not planning to build a shader compiler around an obsolete technique.

        The HD 7970 will get a proper shader compiler.

        In other words, the open-source drivers need another 3-4 years to catch up.

        RIP, VLIW...
        Don't be so sure. Tom Stellard is integrating an LLVM backend into r600g as we speak, and once that is done and the LLVM->VLIW packetizer is finished (it has been started), we can all enjoy faster shaders, for both graphics and compute. 3-4 years is awfully pessimistic.



        • #5
          Originally posted by log0 View Post
          OK, I don't mean this as criticism of the driver developers. I am sure they are doing their best, and I've got no idea about writing device drivers. But I am wondering how it is possible that one implementation is an order of magnitude slower than another. Is it the complex hardware interface? Or is OpenGL so broken that it is difficult to write fast, efficient drivers? Is the nouveau approach of reverse-engineering a well-performing driver maybe the better one (assuming that a faster driver exists)?
          The thing is that NVIDIA, up until Kepler, had a scheduler in hardware, so the GPU schedules the shaders itself rather than relying on driver code to do that (as is the case with AMD's VLIW). With GCN, AMD integrated a hardware scheduler, so the performance gap will shrink.



          • #6
            The shader compiler *isn't* the culprit: desktop cards get ~50-60% of Catalyst performance, while this is two orders of magnitude slower.
            ## VGA ##
            AMD: X1950XTX, HD3870, HD5870
            Intel: GMA45, HD3000 (Core i5 2500K)



            • #7
              Aaah, Phoronix forgot to set the GPU clock to "low", which is AMD's advice for PM issues on the open-source stack!
              Look at this thread as well...



              • #8
                Originally posted by Death Knight View Post
                Aaah, Phoronix forgot to set the GPU clock to "low", which is AMD's advice for PM issues on the open-source stack!
                Look at this thread as well...
                I guess it's the opposite case here. Phoronix probably used the default state, which is usually the low one on APUs. Tip: take a look at the power usage chart.

                I suggest re-doing all the tests, forcing Catalyst to low or radeon to high.
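
                For what it's worth, a minimal sketch of forcing the open-source side to its highest clocks (assuming the radeon KMS "profile" power method and that card0 is the APU; run as root):

                echo profile > /sys/class/drm/card0/device/power_method
                echo high > /sys/class/drm/card0/device/power_profile

                # then re-check the reported engine clock
                cat /sys/kernel/debug/dri/0/radeon_pm_info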



                • #9
                  Michael, what is the USB watt-meter that you use? I would like to buy one in order to do some tests, because I think fps-per-watt is a very interesting way to measure progress in the git drivers. Thank you.



                  • #10
                    Originally posted by log0 View Post
                    OK, I don't mean this as criticism of the driver developers. I am sure they are doing their best, and I've got no idea about writing device drivers. But I am wondering how it is possible that one implementation is an order of magnitude slower than another. Is it the complex hardware interface? Or is OpenGL so broken that it is difficult to write fast, efficient drivers? Is the nouveau approach of reverse-engineering a well-performing driver maybe the better one (assuming that a faster driver exists)?
                    The problem is a lack of manpower. r600g needs something like another 5 developers working full-time to make the driver work at its best: adding new features, fixing bugs, profiling to identify the bottlenecks, and optimizing the driver. So far the developers have mostly been adding new features and fixing bugs when they had time. Optimizations must be done across the entire stack, including shared components like core Mesa.

                    I wonder if Michael enabled 2D tiling.
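
                    For reference, 2D color tiling is toggled through the radeon DDX rather than through Mesa; a rough xorg.conf sketch (option names as in xf86-video-ati, and "ColorTiling2D" may well not be available in the DDX shipped with Ubuntu 12.04):

                    Section "Device"
                        Identifier "Llano APU"
                        Driver     "radeon"
                        Option     "ColorTiling"   "on"    # 1D color tiling
                        Option     "ColorTiling2D" "on"    # 2D tiling, needs a new enough kernel and DDX
                    EndSection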
                    Last edited by marek; 04-16-2012, 07:33 AM.



                    • #11
                      I see. What I am wondering about is why graphics card drivers require this amount of manpower. Is it the hardware or the API? How is it possible to write driver code that is more than ten times faster? Or is it that the hardware doesn't map well to the exposed API (OpenGL) and requires complex translation? Sorry if the questions sound silly; I've never worked with bare hardware.
                      Last edited by log0; 04-16-2012, 09:56 AM.



                      • #12
                        The open-source driver seems to offer roughly the same level of performance as the open-source driver for Intel's Sandy Bridge graphics. I guess one should really use Catalyst. If your card is supported, that is.



                        • #13
                          Originally posted by log0 View Post
                          I see. What I am wondering about is why graphics card drivers require this amount of manpower. Is it the hardware or the API? How is it possible to write driver code that is more than ten times faster? Or is it that the hardware doesn't map well to the exposed API (OpenGL) and requires complex translation? Sorry if the questions sound silly; I've never worked with bare hardware.
                          N.B. I'm not a driver developer, just an interested observer.

                          In this case, the open-source driver was forced into a low-power mode, while the proprietary driver was running at full blast. Also, it's possible that not all functionality (such as tiling) was enabled in the open-source driver. When there's an order-of-magnitude difference, either something is wrong or the driver is too new and still needs a lot of work.

                          The problem with OpenGL drivers (and GPU drivers in general) is that they target amazingly complex hardware and take incredible amounts of code (especially for full OpenGL support). They are much more complex than a network card driver or a mouse driver. With most chips, the Gallium3D drivers for Radeons reach around 60-70% of the proprietary driver's performance, which is about as close as you can get with "regular" effort.

                          Then things get complicated. A GPU driver runs on the CPU and often has to do a lot of work before it can hand a frame off for rendering. If it is not optimised, the time adds up: lots of little delays all over the stack, which need to be optimised one by one, hundreds of them. That is very time-intensive and takes a lot of manpower. At 100 frames per second each frame only has a 10 ms budget, so even a small per-frame delay, multiplied by 100 frames every second, quickly becomes a huge difference. That's why the developers focus first on getting a driver working correctly, and only then try to optimise it.

                          With some more work, plus Tom's VLIW packetiser, the new shader compiler, and Hyper-Z support, things should reach around 80% of the proprietary performance, perhaps even more (rough guess). That's really good, and the additional work after that becomes too complex for very little gain.



                          • #14
                            Originally posted by log0 View Post
                            I see. What I am wondering about is why graphics card drivers require this amount of manpower.
                            I think AMD & NVIDIA easily have > 100 people working on their (closed-source) drivers, so "5 extra developers" isn't a big amount of manpower...



                            • #15
                              Originally posted by pingufunkybeat View Post
                              In this case, the open-source driver was forced into a low-power mode, while the proprietary driver was running at full blast. Also, it's possible that not all functionality (such as tiling) was enabled in the open-source driver. When there's an order-of-magnitude difference, either something is wrong or the driver is too new and still needs a lot of work.
                              From my A6-3500 (via ssh; the machine is currently idle, sitting at a MythTV front-end screen):

                              me@mybox:/sys/class/drm/card0/device# cat /sys/class/drm/card0/device/power_method
                              profile

                              me@mybox:/sys/class/drm/card0/device# cat /sys/class/drm/card0/device/power_profile
                              default

                              me@mybox:/sys/kernel/debug/dri/0# cat /sys/kernel/debug/dri/0/radeon_pm_info
                              default engine clock: 200000 kHz
                              current engine clock: 11880 kHz
                              default memory clock: 667000 kHz


                              So an order-of-magnitude difference between Catalyst and r600g is to be expected if Michael left the power management in its default state. If he had forced the APU under Gallium3D into high-performance mode (or maybe the dynpm method), things would probably have been different.

                              I'm not positive about how the default clocking on these APUs works, but I'm seeing some variation in the GPU clock on my machine. It goes as low as 7 MHz and as high as 30 MHz when idling, and I'm not sure how conservative the reclocking (which seems to be enabled by the EFI/BIOS by default) actually is. So forcing the APU into high-performance mode might help things.
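
                              A quick way to watch how aggressively it reclocks, and to try dynpm instead (same sysfs paths as in the output above; run as root, and forcing the "high" profile works as shown earlier in the thread):

                              # sample the reported engine clock once a second while the machine idles
                              watch -n1 cat /sys/kernel/debug/dri/0/radeon_pm_info

                              # switch to dynamic reclocking based on GPU load
                              echo dynpm > /sys/class/drm/card0/device/power_method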

