Gallium3D's Freedreno Driver Gets A New Compiler


  • Gallium3D's Freedreno Driver Gets A New Compiler

    Phoronix: Gallium3D's Freedreno Driver Gets A New Compiler

    Rob Clark has landed a new shader compiler into his Freedreno Gallium3D open-source graphics driver for Qualcomm's Adreno A3xx hardware...

    http://www.phoronix.com/vr.php?view=MTU5MTM

  • #2
    Does it help with perf. as well as with compliance?

    And is there some work going on to adapt it to Android? (I know that the author is not interested in it, but I have seen some old XDA topics about this, and wonder if it's still alive.)

    Thx for work



    • #3
      Originally posted by przemoli View Post
      Does it help with perf. as well as with compliance?
      yup, should help with performance as well.. at least for things that are not already CPU limited. If you look at the shaders it generates, it quite commonly generates shaders w/ 1/3rd # of instructions and using half # of registers. (It is really quite a wide range, but it varies from not making much difference on very trivial shaders, to pretty substantial difference on more complex shaders.)

      NOTE: gpus typically schedule as many threads in parallel as they can, up until the point they hit a limit of physical registers or execution units.. depending on the ratio of physical registers to execution units, if you are hitting the register limit, then reducing the # of registers used by the shader is a pretty big deal.
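
As a rough illustration of the note above, here is a minimal sketch of how register use can cap the number of threads in flight. The function name and all of the numbers are hypothetical and chosen only to make the arithmetic easy to follow; they are not real a3xx figures, and real hardware allocates registers with more constraints than this.

```python
def threads_in_flight(regs_per_thread, physical_regs, max_threads):
    """Return how many threads fit at once in a simplified model.

    The count is capped either by the register file (physical_regs) or by
    the number of thread slots the execution units offer (max_threads).
    """
    if regs_per_thread <= 0:
        raise ValueError("regs_per_thread must be positive")
    by_registers = physical_regs // regs_per_thread
    return min(by_registers, max_threads)


# Hypothetical hardware limits, for illustration only.
PHYSICAL_REGS = 256
MAX_THREADS = 32

old_compiler = threads_in_flight(16, PHYSICAL_REGS, MAX_THREADS)   # register limited
new_compiler = threads_in_flight(8, PHYSICAL_REGS, MAX_THREADS)    # half the registers
even_smaller = threads_in_flight(4, PHYSICAL_REGS, MAX_THREADS)    # now slot limited

print(old_compiler, new_compiler, even_smaller)  # prints "16 32 32"
```

In this toy model, halving the registers per thread doubles the threads in flight while the shader is register limited, and further reductions stop helping once the thread-slot limit is reached.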

      What that translates into in the real world.. well, it varies. Supertuxkart is already cpu limited (once hw binning landed), so not much difference. Xonotic it seems to be worth ~1fps or so.. we might be getting to the point where we are cpu limited there too (but otoh it isn't very complex shaders).

      I probably need to find something a bit more demanding on the shader side of things. So far the a3xx shader core seems to mostly just yawn with what I throw at it. Even with the original very unoptimal compiler it was surprisingly fast.

      I'll try to find some time to write up a blog post w/ more details.. I was too tired last night, but I wanted to land the initial new-compiler due to the large # of things it fixed.

      Originally posted by przemoli View Post
      And is there some work going on to adapt it to Android? (I know that the author is not interested in it, but I have seen some old XDA topics about this, and wonder if it's still alive.)
      There was someone on #freedreno who was playing around w/ getting stuff to build for android.. not sure if that includes mesa yet (iirc, he was starting w/ libdrm_freedreno and the fdre tests apps).



      • #4
        Will lay my hands on a Nexus 5 this month (hopefully).

        It should not be so CPU limited (compared to your dev board), and I will be able to test some more.



        • #5
          Impressive work, Rob! Have you made any comparisons how this new compiler fares against the binary blob one?



          • #6
            Originally posted by przemoli View Post
            Will lay my hands on a Nexus 5 this month (hopefully).

            It should not be so CPU limited (compared to your dev board), and I will be able to test some more.
            heheh, well I have a 8074 dragonboard too.. which is basically the no-modem version of the chip in the n5. It is a bit faster, but not night-and-day difference. It may be more cpu limited (faster gpu).. I need to build a kernel for dragonboard w/ my trace events to profile more closely.

            And to be honest, I need to tweak the thermal throttling on my dragonboard kernel so it doesn't kick in so soon, in order to have a good comparison between the two.

            fwiw, I've noticed some messages, for example in xonotic startup, about not using SSE instructions.. so perhaps there is something in the game (or game engine) which could benefit from some NEON porting..



            • #7
              Originally posted by Ancurio View Post
              Impressive work, Rob! Have you made any comparisons how this new compiler fares against the binary blob one?
              well, I don't actually have blob drivers for linux, so I don't have a good way to make an apples-to-apples comparison to blob.

              I can manually inspect output of blob compiler for same shader, and compare to what I generate. Very hand-wavey answer is blob is still better at some things, but we are now in the right ballpark. Ie. it is a matter of some percent, not a matter of 2x or 3x difference.

              (The bigger difference is that there are still a lot of features I have not implemented yet, like non-unrolled loops and non-inlined functions)



              • #8
                Originally posted by robclark View Post
                (The bigger difference is that there are still a lot of features I have not implemented yet, like non-unrolled loops and non-inlined functions)
                Power management?
                I'm one of those interested in Freedreno on Android... but at least as much from the perspective of battery powered mobile devices as stationary wall-plugged devices. The main thing stopping me from moving forward with Freedreno on Android is how it would affect battery life. If that can be made *sane*, it would be a no-brainer to move forward. The qualcomm blobs are really really terrible. They've even completely stopped supporting A2xx, both in blobs, and in their code. CAF only supports A2xx up to kernel 3.4, which is getting a little long in the tooth.



                • #9
                  Originally posted by robclark View Post
                  well, I don't actually have blob drivers for linux, so I don't have a good way to make an apples-to-apples comparison to blob.

                  I can manually inspect output of blob compiler for same shader, and compare to what I generate. Very hand-wavey answer is blob is still better at some things, but we are now in the right ballpark. Ie. it is a matter of some percent, not a matter of 2x or 3x difference.

                  (The bigger difference is that there are still a lot of features I have not implemented yet, like non-unrolled loops and non-inlined functions)
                  Thanks, that's what I wanted to know (how the generated byte codes compare).



                  • #10
                    Originally posted by droidhacker View Post
                    Power management?
                    well.. not so much a compiler feature.. other than perhaps a better compiler lets the gpu finish sooner and shut down.

                    That said (unrelated to compiler), I do have a patch floating around that at least powers off the gpu when it is inactive. That should make it good enough for battery powered devices.. I don't have any good way to make power measurements, but I do not think the GPU frequencies are high enough yet to get *that* much benefit from something more elaborate than a 'hurry up and wait' power mgmt scheme for gpu.
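
To make the 'hurry up and wait' idea a bit more concrete, here is a back-of-the-envelope energy model, not a measurement. The function name and every power and timing figure are made-up assumptions for illustration; they do not describe any real Adreno part or the patch mentioned above.

```python
import math


def frame_energy(busy_s, frame_s, active_w, idle_w):
    """Energy in joules for one frame in a simplified two-state model.

    The gpu draws active_w while busy and idle_w for the rest of the frame.
    Setting idle_w to 0.0 models powering the gpu off when it is inactive.
    """
    if not 0.0 <= busy_s <= frame_s:
        raise ValueError("busy_s must be between 0 and frame_s")
    return active_w * busy_s + idle_w * (frame_s - busy_s)


# Hypothetical figures: 16 ms frame, 1 W active, 0.2 W idle-but-powered.
FRAME_S = 0.016
ACTIVE_W = 1.0
IDLE_ON_W = 0.2

always_on = frame_energy(0.004, FRAME_S, ACTIVE_W, IDLE_ON_W)
power_off = frame_energy(0.004, FRAME_S, ACTIVE_W, 0.0)
# A better compiler that halves the busy time, combined with powering off.
faster_shader = frame_energy(0.002, FRAME_S, ACTIVE_W, 0.0)

assert faster_shader < power_off < always_on
assert math.isclose(power_off, 0.004)
```

Under these assumptions, powering off while idle removes the idle term entirely, and finishing the work sooner shrinks the only term that is left, which is the sense in which a better compiler could indirectly help battery life.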

                    Originally posted by droidhacker View Post
                    I'm one of those interested in Freedreno on Android... but at least as much from the perspective of battery powered mobile devices as stationary wall-plugged devices. The main thing stopping me from moving forward with Freedreno on Android is how it would affect battery life. If that can be made *sane*, it would be a no-brainer to move forward. The qualcomm blobs are really really terrible. They've even completely stopped supporting A2xx, both in blobs, and in their code. CAF only supports A2xx up to kernel 3.4, which is getting a little long in the tooth.
                    <glass_half_full>well, qcom does better than some other mobile gpu vendors, in that one userspace blob can actually support more than 1 single device</glass_half_full>

                    but yeah, anything that is not very recent probably isn't likely to get any kernel newer than 3.4 from CAF. You can probably blame google/android for that, since they don't require kernel version bumps for existing device updates to new pastry.



                    • #11
                      Originally posted by robclark View Post
                      well.. not so much a compiler feature.. other than perhaps a better compiler lets the gpu finish sooner and shut down.

                      That said (unrelated to compiler), I do have a patch floating around that at least powers off the gpu when it is inactive. That should make it good enough for battery powered devices.. I don't have any good way to make power measurements, but I do not think the GPU frequencies are high enough yet to get *that* much benefit from something more elaborate than a 'hurry up and wait' power mgmt scheme for gpu.
                      Well that would certainly be helpful....

                      <glass_half_full>well, qcom does better than some other mobile gpu vendors, in that one userspace blob can actually support more than 1 single device</glass_half_full>

                      but yeah, anything that is not very recent probably isn't likely to get any kernel newer than 3.4 from CAF. You can probably blame google/android for that, since they don't require kernel version bumps for existing device updates to new pastry.
                      The kernel version is only half the problem. CAF initially had code in 3.10 for A2xx, but deliberately stripped it out. When you add to that the fact that they haven't even provided a userspace blob for A2xx since June, **despite** moving forward (incompatible) in the A2xx support for their 3.4 kernel... it gets pretty ugly.



                      • #12
                        Hats off to Rob, terrific job!



                        • #13
                          Originally posted by droidhacker View Post
                          The kernel version is only half the problem. CAF initially had code in 3.10 for A2xx, but deliberately stripped it out. When you add to that the fact that they haven't even provided a userspace blob for A2xx since June, **despite** moving forward (incompatible) in the A2xx support for their 3.4 kernel... it gets pretty ugly.
                          oh, whoops, I didn't realize they stripped support out of the userspace blob too. Since the 3.10 kernel branch doesn't actually support any devices which have a2xx, I guess I shouldn't be too surprised.

                          Well, the only real solution is to get kernel driver upstream, so kernel ABI is locked and backwards compatibility maintained for (more or less) eternity ;-)



                          • #14
                            If you can get the open source driver to meet or beat the blob in every way, do you think they might begin packaging their products with the open source driver instead? Or do they have NIH Syndrome?



                            • #15
                              Originally posted by Prescience500 View Post
                              If you can get the open source driver to meet or beat the blob in every way, do you think they might begin packaging their products with the open source driver instead? Or do they have NIH Syndrome?
                              well, "they" is a large set.. afaiu it is the handset/tablet/whatever maker who decides what sw ships on a device. I would assume that qcom's large customers (moto/samsung/etc) don't really have any problem getting support from qcom, which would be an incentive to stick with the blob.

                              But otoh, the small companies, and linux/hacker/opensrc community for that matter, will benefit from the availability (and hopefully the ease of working with) the free/open driver.

