XA Acceleration Coming To Freedreno Gallium3D

  • XA Acceleration Coming To Freedreno Gallium3D

    Phoronix: XA Acceleration Coming To Freedreno Gallium3D

    Rob Clark has sent out a set of patches enabling the XA state tracker for the Freedreno Gallium3D graphics driver...

  • #2
    Good job, Rob.

    Just to be sure: you haven't compared against the Qualcomm blobs?



    • #3
      Why doesn't radeonsi use XA? Gallium3D is more low-level and flexible than OpenGL, IMHO.



      • #4
        Originally posted by przemoli
        Good job, Rob.

        Just to be sure: you haven't compared against the Qualcomm blobs?
        Thanks.

        Someone from #freedreno built glmark2 for Android, hacked to render at 800x600 (to match the default window size for glmark2 on Linux/X11), so I've been trying to compare glmark2-es scores. It's a bit tricky to get a valid apples-to-apples comparison, but here is a dump of the raw results I have so far. Note that the Android results are from a newer glmark2 which adds a few additional scenes. And at the moment, two scenes (terrain and loop) need to be excluded with freedreno due to unsupported features, etc. (Interestingly, there are a couple of tests which work with freedreno but not with the blob, thanks to extensions implemented in mesa/gallium.) Anyway, you shouldn't really compare the final summary number; compare individual tests instead.
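A minimal sketch of the per-test comparison described above, assuming the results are pasted as plain text in the format of the logs in this post (`[scene] options: FPS: N FrameTime: M ms`, optionally behind an Android logcat prefix). It keeps only tests that both runs completed, so scenes one side could not run (terrain and loop on freedreno, the `map` buffer tests on the blob) drop out automatically instead of skewing a summary score. The helper names `parse_results` and `compare` are hypothetical, not part of glmark2.

```python
import re

# Matches a glmark2 result line, with or without an Android logcat prefix, e.g.
# "I/glmark2 (15507): [build] use-vbo=false: FPS: 345 FrameTime: 2.899 ms".
# Options may themselves contain ':' so the options group is non-greedy and
# stops at the first ": FPS:".
RESULT_RE = re.compile(r"\[(?P<scene>[^\]]+)\]\s+(?P<opts>.*?):\s+FPS:\s+(?P<fps>\d+)")


def parse_results(text):
    """Return {(scene, options): fps} for every line that reports an FPS value.

    Lines ending in 'Unsupported' carry no FPS field and are skipped.
    """
    results = {}
    for line in text.splitlines():
        m = RESULT_RE.search(line)
        if m:
            results[(m.group("scene"), m.group("opts"))] = int(m.group("fps"))
    return results


def compare(base, other):
    """Per-test FPS ratio other/base, restricted to tests present in both runs."""
    common = sorted(base.keys() & other.keys())
    return {key: other[key] / base[key] for key in common}


freedreno = parse_results("""
[build] use-vbo=false: FPS: 499 FrameTime: 2.004 ms
[texture] texture-filter=nearest: FPS: 407 FrameTime: 2.457 ms
""")
blob = parse_results("""
I/glmark2 (15507): [build] use-vbo=false: FPS: 345 FrameTime: 2.899 ms
I/glmark2 (15507): [texture] texture-filter=nearest: FPS: 573 FrameTime: 1.745 ms
I/glmark2 (15507): [terrain] <default>: FPS: 55 FrameTime: 18.182 ms
""")
for (scene, opts), ratio in compare(freedreno, blob).items():
    print(f"{scene} {opts}: {ratio:.2f}")
# build use-vbo=false: 0.69
# texture texture-filter=nearest: 1.41
```

The terrain result is silently dropped because freedreno has no matching entry; a ratio below 1.0 means the blob run was slower on that test.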

        freedreno - a320:
        on bstem/apq8060a - adreno 320
        (same clocks as the apq8064 in the Nexus 4, but with a dual-core instead of a quad-core CPU; I think it's the same or a similar chip as in the Moto X)
        =======================================================
        glmark2 2012.12
        =======================================================
        OpenGL Information
        GL_VENDOR: freedreno
        GL_RENDERER: Gallium 0.4 on FD320
        GL_VERSION: OpenGL ES 2.0 Mesa 10.1.0-devel (git-f001571)
        =======================================================
        [build] use-vbo=false: FPS: 499 FrameTime: 2.004 ms
        [build] use-vbo=true: FPS: 553 FrameTime: 1.808 ms
        [texture] texture-filter=nearest: FPS: 407 FrameTime: 2.457 ms
        [texture] texture-filter=linear: FPS: 370 FrameTime: 2.703 ms
        [texture] texture-filter=mipmap: FPS: 400 FrameTime: 2.500 ms
        [shading] shading=gouraud: FPS: 487 FrameTime: 2.053 ms
        [shading] shading=blinn-phong-inf: FPS: 435 FrameTime: 2.299 ms
        [shading] shading=phong: FPS: 337 FrameTime: 2.967 ms
        [bump] bump-render=high-poly: FPS: 301 FrameTime: 3.322 ms
        [bump] bump-render=normals: FPS: 498 FrameTime: 2.008 ms
        [bump] bump-render=height: FPS: 397 FrameTime: 2.519 ms
        [effect2d] kernel=0,1,0;1,-4,1;0,1,0;: FPS: 287 FrameTime: 3.484 ms
        [effect2d] kernel=1,1,1,1,1;1,1,1,1,1;1,1,1,1,1;: FPS: 107 FrameTime: 9.346 ms
        [pulsar] light=false:quads=5:texture=false: FPS: 519 FrameTime: 1.927 ms
        [desktop] blur-radius=5:effect=blur:passes=1:separable=true:windows=4: FPS: 100 FrameTime: 10.000 ms
        [desktop] effect=shadow:windows=4: FPS: 318 FrameTime: 3.145 ms
        [buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 109 FrameTime: 9.174 ms
        [buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=subdata: FPS: 104 FrameTime: 9.615 ms
        [buffer] columns=200:interleave=true:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 194 FrameTime: 5.155 ms
        [ideas] speed=duration: FPS: 189 FrameTime: 5.291 ms
        [jellyfish] <default>: FPS: 221 FrameTime: 4.525 ms
        [conditionals] fragment-steps=0:vertex-steps=0: FPS: 540 FrameTime: 1.852 ms
        [conditionals] fragment-steps=5:vertex-steps=0: FPS: 325 FrameTime: 3.077 ms
        [conditionals] fragment-steps=0:vertex-steps=5: FPS: 524 FrameTime: 1.908 ms
        [function] fragment-complexity=low:fragment-steps=5: FPS: 406 FrameTime: 2.463 ms
        [function] fragment-complexity=medium:fragment-steps=5: FPS: 294 FrameTime: 3.401 ms
        =======================================================
        glmark2 Score: 343
        =======================================================
        blob - a330:
        nexus 5 / apq8974 - adreno 330
        These results come from the guy who built glmark2, with kgsl forced into performance mode (otherwise it is significantly slower), which makes for a fairer comparison with how drm/msm works. The a330 should be 25-40% faster than the a320.
        I/glmark2 (15507): glmark2 2013.08.07
        I/glmark2 (15507): OpenGL Information
        I/glmark2 (15507): GL_VENDOR: Qualcomm
        I/glmark2 (15507): GL_RENDERER: Adreno (TM) 330
        I/glmark2 (15507): GL_VERSION: OpenGL ES 3.0 [email protected] AU@ (CL@)
        I/glmark2 (15507): [build] use-vbo=false: FPS: 345 FrameTime: 2.899 ms
        I/glmark2 (15507): [build] use-vbo=true: FPS: 348 FrameTime: 2.874 ms
        I/glmark2 (15507): [texture] texture-filter=nearest: FPS: 573 FrameTime: 1.745 ms
        I/glmark2 (15507): [texture] texture-filter=linear: FPS: 554 FrameTime: 1.805 ms
        I/glmark2 (15507): [texture] texture-filter=mipmap: FPS: 567 FrameTime: 1.764 ms
        I/glmark2 (15507): [shading] shading=gouraud: FPS: 257 FrameTime: 3.891 ms
        I/glmark2 (15507): [shading] shading=blinn-phong-inf: FPS: 495 FrameTime: 2.020 ms
        I/glmark2 (15507): [shading] shading=phong: FPS: 514 FrameTime: 1.946 ms
        I/glmark2 (15507): [bump] bump-render=high-poly: FPS: 381 FrameTime: 2.625 ms
        I/glmark2 (15507): [bump] bump-render=normals: FPS: 608 FrameTime: 1.645 ms
        I/glmark2 (15507): [bump] bump-render=height: FPS: 572 FrameTime: 1.748 ms
        I/glmark2 (15507): [effect2d] kernel=0,1,0;1,-4,1;0,1,0;: FPS: 411 FrameTime: 2.433 ms
        I/glmark2 (15507): [effect2d] kernel=1,1,1,1,1;1,1,1,1,1;1,1,1,1,1;: FPS: 292 FrameTime: 3.425 ms
        I/glmark2 (15507): [pulsar] light=false:quads=5:texture=false: FPS: 545 FrameTime: 1.835 ms
        W/Adreno-ES20(15507): <core_glFramebufferTexture2D:2419>: GL_INVALID_OPERATION
        I/glmark2 (15507): [desktop] blur-radius=5:effect=blur:passes=1:separable=true:windows=4: FPS: 236 FrameTime: 4.237 ms
        I/glmark2 (15507): [desktop] effect=shadow:windows=4: FPS: 355 FrameTime: 2.817 ms
        E/glmark2 (15507): Requested MapBuffer VBO update method but GL_OES_mapbuffer is not supported!
        I/glmark2 (15507): [buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=map: Unsupported
        I/glmark2 (15507): [buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=subdata: FPS: 114 FrameTime: 8.772 ms
        E/glmark2 (15507): Requested MapBuffer VBO update method but GL_OES_mapbuffer is not supported!
        I/glmark2 (15507): [buffer] columns=200:interleave=true:update-dispersion=0.9:update-fraction=0.5:update-method=map: Unsupported
        I/glmark2 (15507): [ideas] speed=duration: FPS: 289 FrameTime: 3.460 ms
        I/glmark2 (15507): [jellyfish] <default>: FPS: 392 FrameTime: 2.551 ms
        I/glmark2 (15507): [terrain] <default>: FPS: 55 FrameTime: 18.182 ms
        I/glmark2 (15507): [shadow] <default>: FPS: 240 FrameTime: 4.167 ms
        I/glmark2 (15507): [refract] <default>: FPS: 92 FrameTime: 10.870 ms
        I/glmark2 (15507): [conditionals] fragment-steps=0:vertex-steps=0: FPS: 534 FrameTime: 1.873 ms
        I/glmark2 (15507): [conditionals] fragment-steps=5:vertex-steps=0: FPS: 513 FrameTime: 1.949 ms
        I/glmark2 (15507): [conditionals] fragment-steps=0:vertex-steps=5: FPS: 493 FrameTime: 2.028 ms
        I/glmark2 (15507): [function] fragment-complexity=low:fragment-steps=5: FPS: 513 FrameTime: 1.949 ms
        I/glmark2 (15507): [function] fragment-complexity=medium:fragment-steps=5: FPS: 487 FrameTime: 2.053 ms
        I/glmark2 (15507): [loop] fragment-loop=false:fragment-steps=5:vertex-steps=5: FPS: 510 FrameTime: 1.961 ms
        I/glmark2 (15507): [loop] fragment-steps=5:fragment-uniform=false:vertex-steps=5: FPS: 495 FrameTime: 2.020 ms
        I/glmark2 (15507): [loop] fragment-steps=5:fragment-uniform=true:vertex-steps=5: FPS: 533 FrameTime: 1.876 ms
        I/glmark2 (15507): glmark2 Score: 410
        With the GPU not forced into performance mode, the glmark2 score was 282.
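Because this run is on an a330 rather than an a320, one rough way to use it is to scale each a330 number down by the 25-40% speedup quoted above to get a band an a320 would be expected to land in. The sketch below assumes, as a simplification, that the speedup is uniform across every test, which real hardware will not strictly follow; `a320_equivalent_range` is a hypothetical helper name.

```python
def a320_equivalent_range(a330_fps, speedup_low=0.25, speedup_high=0.40):
    """Estimate the FPS band an a320 would be expected to reach, given an a330 result.

    Assumes the a330 is uniformly 25-40% faster than the a320 on every test
    (a rough simplification). Returns (low_estimate, high_estimate).
    """
    return a330_fps / (1 + speedup_high), a330_fps / (1 + speedup_low)


# Example: the blob a330 ran [build] use-vbo=false at 345 FPS in performance mode.
low, high = a320_equivalent_range(345)
print(f"expected a320 range: {low:.0f}-{high:.0f} FPS")
# expected a320 range: 246-276 FPS
```

A freedreno a320 result can then be checked against that band test by test, keeping in mind the other variables mentioned in this thread (glmark2 version, CPU, and power management).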

        blob - a320:
        nexus 4 / apq8064 - adreno 320
        This is on my own phone. From a hardware standpoint, it is the most comparable (quad-core CPU vs dual-core, but that shouldn't really matter for glmark2). But I don't have root on the device at the moment, so I can't force it into performance mode.
        I/glmark2 (11620): glmark2 2013.08.07
        I/glmark2 (11620): OpenGL Information
        I/glmark2 (11620): GL_VENDOR: Qualcomm
        I/glmark2 (11620): GL_RENDERER: Adreno (TM) 320
        I/glmark2 (11620): GL_VERSION: OpenGL ES 3.0 [email protected] AU@ (CL@)
        I/glmark2 (11620): [build] use-vbo=false: FPS: 154 FrameTime: 6.494 ms
        I/glmark2 (11620): [build] use-vbo=true: FPS: 140 FrameTime: 7.143 ms
        I/glmark2 (11620): [texture] texture-filter=nearest: FPS: 178 FrameTime: 5.618 ms
        I/glmark2 (11620): [texture] texture-filter=linear: FPS: 179 FrameTime: 5.587 ms
        I/glmark2 (11620): [texture] texture-filter=mipmap: FPS: 176 FrameTime: 5.682 ms
        I/glmark2 (11620): [shading] shading=gouraud: FPS: 103 FrameTime: 9.709 ms
        I/glmark2 (11620): [shading] shading=blinn-phong-inf: FPS: 178 FrameTime: 5.618 ms
        I/glmark2 (11620): [shading] shading=phong: FPS: 176 FrameTime: 5.682 ms
        I/glmark2 (11620): [bump] bump-render=high-poly: FPS: 178 FrameTime: 5.618 ms
        I/glmark2 (11620): [bump] bump-render=normals: FPS: 179 FrameTime: 5.587 ms
        I/glmark2 (11620): [bump] bump-render=height: FPS: 178 FrameTime: 5.618 ms
        I/glmark2 (11620): [effect2d] kernel=0,1,0;1,-4,1;0,1,0;: FPS: 178 FrameTime: 5.618 ms
        I/glmark2 (11620): [effect2d] kernel=1,1,1,1,1;1,1,1,1,1;1,1,1,1,1;: FPS: 179 FrameTime: 5.587 ms
        I/glmark2 (11620): [pulsar] light=false:quads=5:texture=false: FPS: 178 FrameTime: 5.618 ms
        I/glmark2 (11620): [desktop] blur-radius=5:effect=blur:passes=1:separable=true:windows=4: FPS: 156 FrameTime: 6.410 ms
        I/glmark2 (11620): [desktop] effect=shadow:windows=4: FPS: 148 FrameTime: 6.757 ms
        E/glmark2 (11620): Requested MapBuffer VBO update method but GL_OES_mapbuffer is not supported!
        I/glmark2 (11620): [buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=map: Unsupported
        I/glmark2 (11620): [buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=subdata: FPS: 75 FrameTime: 13.333 ms
        E/glmark2 (11620): Requested MapBuffer VBO update method but GL_OES_mapbuffer is not supported!
        I/glmark2 (11620): [buffer] columns=200:interleave=true:update-dispersion=0.9:update-fraction=0.5:update-method=map: Unsupported
        I/glmark2 (11620): [ideas] speed=duration: FPS: 147 FrameTime: 6.803 ms
        I/glmark2 (11620): [jellyfish] <default>: FPS: 179 FrameTime: 5.587 ms
        I/glmark2 (11620): [terrain] <default>: FPS: 37 FrameTime: 27.027 ms
        I/glmark2 (11620): [shadow] <default>: FPS: 155 FrameTime: 6.452 ms
        I/glmark2 (11620): [refract] <default>: FPS: 46 FrameTime: 21.739 ms
        I/glmark2 (11620): [conditionals] fragment-steps=0:vertex-steps=0: FPS: 177 FrameTime: 5.650 ms
        I/glmark2 (11620): [conditionals] fragment-steps=5:vertex-steps=0: FPS: 177 FrameTime: 5.650 ms
        I/glmark2 (11620): [conditionals] fragment-steps=0:vertex-steps=5: FPS: 178 FrameTime: 5.618 ms
        I/glmark2 (11620): [function] fragment-complexity=low:fragment-steps=5: FPS: 178 FrameTime: 5.618 ms
        I/glmark2 (11620): [function] fragment-complexity=medium:fragment-steps=5: FPS: 178 FrameTime: 5.618 ms
        I/glmark2 (11620): [loop] fragment-loop=false:fragment-steps=5:vertex-steps=5: FPS: 177 FrameTime: 5.650 ms
        I/glmark2 (11620): [loop] fragment-steps=5:fragment-uniform=false:vertex-steps=5: FPS: 177 FrameTime: 5.650 ms
        I/glmark2 (11620): [loop] fragment-steps=5:fragment-uniform=true:vertex-steps=5: FPS: 177 FrameTime: 5.650 ms
        I/glmark2 (11620): glmark2 Score: 157



        • #5
          Do I understand correctly that you managed to make the a320 as fast as it should be relative to the a330, or even faster, given that the a330 was in forced performance mode and has a quad-core CPU versus just two cores on the a320?

          PS: I've seen some GPU performance governors in Nexus 5 custom kernels (at least in franco kernel); maybe that would be a useful starting point for work on PM.



          • #6
            Originally posted by przemoli
            Do I understand correctly that you managed to make the a320 as fast as it should be relative to the a330, or even faster, given that the a330 was in forced performance mode and has a quad-core CPU versus just two cores on the a320?
            Well, I don't expect the number of CPU cores to make much difference, although single-core performance and the CPU governor can make a small difference (maybe ~10%).

            The big difference between kgsl and drm/msm kernel PM is that kgsl scales the GPU frequency according to load, etc., whereas (in the kernel that I was using on bstem) the drm driver puts the GPU at max clock and requests max interconnect bandwidth when the GPU is active, and then shuts it down when the GPU completes. Basically a "hurry up and wait/sleep" policy, which is the simplest possible and should be best for performance.

            What kgsl does should (I assume) improve battery life a bit more vs a simple "hurry up and wait" approach, for a (hopefully) negligible trade-off in perceived performance. But it runs the risk of interacting badly with benchmarks, i.e. not responding quickly enough when the GPU is rapidly transitioning between active and idle, etc. I don't really have a good way to profile this with the blob (of course, no access to qcom's perf tools), but I assume this is the problem.

            Forcing kgsl to performance mode should, afaiu, make it equivalent to the drm/msm kernel that I was using with freedreno.
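To make the trade-off above concrete, here is a toy simulation comparing a "hurry up and wait" policy with a utilization-driven governor. It is not a model of the real kgsl or drm/msm code: the frequency steps, thresholds, sampling window, and helper names are all hypothetical. Each frame is CPU setup time (GPU idle) followed by GPU work whose duration scales with clock.

```python
# Toy model only: frequency steps, thresholds and window size are
# hypothetical and do not reflect the real kgsl or drm/msm implementations.
FREQS_MHZ = [200, 300, 400]


def gpu_time_ms(gpu_ms_at_max, freq_mhz):
    """GPU time for one frame's work, scaled from its duration at max clock."""
    return gpu_ms_at_max * FREQS_MHZ[-1] / freq_mhz


def race_to_idle_fps(cpu_ms, gpu_ms_at_max):
    """'Hurry up and wait': the GPU always runs at max clock while it has work."""
    return 1000.0 / (cpu_ms + gpu_time_ms(gpu_ms_at_max, FREQS_MHZ[-1]))


def load_based_fps(cpu_ms, gpu_ms_at_max, frames=1000, window=10,
                   up=0.80, down=0.30):
    """Utilization-driven governor: starts at the lowest clock and re-evaluates
    every `window` frames, stepping up when the GPU busy fraction exceeds `up`
    and down when it falls below `down`."""
    level = 0
    total_ms = win_total = win_busy = 0.0
    for i in range(1, frames + 1):
        gpu_ms = gpu_time_ms(gpu_ms_at_max, FREQS_MHZ[level])
        frame_ms = cpu_ms + gpu_ms  # CPU prepares the frame, then the GPU renders it
        total_ms += frame_ms
        win_total += frame_ms
        win_busy += gpu_ms
        if i % window == 0:
            util = win_busy / win_total
            if util > up and level < len(FREQS_MHZ) - 1:
                level += 1
            elif util < down and level > 0:
                level -= 1
            win_total = win_busy = 0.0
    return frames * 1000.0 / total_ms


# Bursty workload: 2 ms of CPU work, then 1 ms of GPU work at max clock.
print(f"{race_to_idle_fps(2.0, 1.0):.1f}")  # 333.3
print(f"{load_based_fps(2.0, 1.0):.1f}")    # 250.0
```

In this toy workload the GPU sits idle while the CPU prepares each frame, so the measured load at the lowest clock is only 50%. That never crosses the step-up threshold, the clock stays low, and the result is 25% below race-to-idle. A steadily GPU-bound workload, by contrast, ramps up to the top step within a few windows.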



            • #7
              Originally posted by robclark
              Forcing kgsl to performance mode should, afaiu, make it equivalent to the drm/msm kernel that I was using with freedreno.
              Then yes, franco kernel has all those different strategies for setting GPU perf (the drm/msm-style "performance governor" too). Reading your post, I understand that PM on a3xx is OK? Nice to know. (Maybe because of radeon DPM I had low expectations here :P )



              • #8
                Originally posted by przemoli
                Then yes, franco kernel has all those different strategies for setting GPU perf (the drm/msm-style "performance governor" too). Reading your post, I understand that PM on a3xx is OK? Nice to know. (Maybe because of radeon DPM I had low expectations here :P )
                These little GPUs don't have their own memory controller (VRAM), and clocks/regulators/etc. are all controlled via the normal Linux frameworks in the kernel, so fortunately we don't have reclocking-type issues to worry about.



                • #9
                  Originally posted by stalkerg
                  Why doesn't radeonsi use XA? Gallium3D is more low-level and flexible than OpenGL, IMHO.
                  XA doesn't accelerate nearly as many X operations as glamor does. At the moment it's basically just like EXA. IMHO, it's a much easier task to fix the fallback handling in glamor than to add acceleration to XA for additional X ops.
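The coverage argument above can be pictured as a dispatch table. An EXA-style backend only provides hooks for a small fixed set of operations (EXA's driver hooks essentially cover solid fills, copies, and Render composites, plus upload/download), and anything outside that set falls back to software rendering on the CPU. The sketch below is purely illustrative: the op names and the trace are hypothetical, and it does not model XA's or glamor's actual coverage.

```python
# Hypothetical op names for illustration only; they are not XA's or glamor's
# real entry points, and the trace below is invented, not measured.
EXA_LIKE_ACCELERATED = {"solid_fill", "copy", "composite"}


def dispatch(op, accelerated):
    """Return which path an X rendering op would take."""
    return "gpu" if op in accelerated else "software_fallback"


def fallback_fraction(trace, accelerated):
    """Fraction of ops in a trace that would hit the software fallback."""
    if not trace:
        return 0.0
    misses = sum(1 for op in trace if dispatch(op, accelerated) == "software_fallback")
    return misses / len(trace)


trace = ["copy", "composite", "composite", "trapezoids", "solid_fill", "poly_line"]
for op in trace:
    print(op, "->", dispatch(op, EXA_LIKE_ACCELERATED))
print("fallback fraction:", fallback_fraction(trace, EXA_LIKE_ACCELERATED))
```

Seen this way, the two options in the post are either growing the accelerated set (more work in XA) or making the fallback path cheaper and rarer (fixing glamor's fallback handling).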

