GLAMOR Radeon Shows 2D Acceleration Promise


  • #41
    Originally posted by ssvb View Post
    Hmm, seems like this is really difficult for me to understand. Why would we even need to "update shadowfb in system memory with the results of the 3D rendering in GPU memory"? Doesn't it make a lot more sense to just set up something like periodic DMA transfers from shadowfb system memory to GPU memory (synchronized with screen refresh) and composite it with 3D or Xv buffers together on the GPU side before sending the combined pixel data to HDMI? Or is there some kind of hardware limitation here? Maybe I just need to check the hardware manuals first and not waste your time with silly questions.
    It depends on a lot of factors. Basically, the way shadowfb works is that the master copy of the pixmaps lives in system ram: the pixmaps are stored in system ram and operated on by the CPU in system ram. Then, periodically, the front buffer is updated with the results of all the operations in system ram. It's a one-way operation; the CPU copy is always written to the front buffer in vram. When the GPU renders to a buffer in vram (say Xv or OpenGL), the shadow copy no longer holds the master copy.

    Say you want to alpha blend a 3D window with the desktop and some other windows: now you have to get the GPU-rendered image into system ram so that the CPU can blend it with the other windows. But wait, can't you have the GPU do the alpha blend? Sure, but then you have to migrate all the CPU buffers that you want to blend into vram so the GPU can access them, so you still have to copy a lot of data around. Wait, can't I store everything in gart and get the best of both worlds? Sure, but the problem there is that gart pages have to be pinned so the kernel doesn't swap them out while the GPU is accessing them. Since the kernel can't swap those pages, this limits the amount of memory available to the rest of the system; I think the kernel limits the amount of pinned memory to avoid a driver DOSing the system, and graphics buffers can be huge.

    It's easier with a compositor. And you could even support the non-composited case by emulating a composited environment in the driver. In that case you could store the backing pixmaps wherever it makes the most sense (vram or system ram) and then use the GPU to composite the final image. For CPU rendered buffers you'd still need to migrate them to pinned memory for the GPU to access them, but you could keep the shadow front buffer in pinned memory. There are also some tricky corner cases to deal with. It's possible, but it basically comes down to writing a new acceleration architecture which would take time to write and mature. Before going down that road I think it makes sense to see what can be done with an existing acceleration architecture like glamor.
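
    To make that one-way flow concrete, here is a rough, self-contained sketch of the shadowfb idea (the names and layout are illustrative only, not the real DDX code):

    Code:
    /* shadowfb in miniature: the CPU renders into a copy in system ram,
     * and damaged rectangles are periodically copied to the front buffer
     * in vram. Nothing ever flows back, which is why GPU-rendered
     * buffers (Xv, GL) leave the shadow copy stale. */
    #include <stdint.h>
    #include <string.h>
    #include <stdlib.h>

    struct fb {
        uint32_t *pixels;          /* 32bpp pixel data */
        int width, height, stride; /* stride in pixels */
    };

    /* Copy one damaged rectangle, shadow -> front, never the reverse. */
    static void shadow_flush_rect(const struct fb *shadow, struct fb *front,
                                  int x, int y, int w, int h)
    {
        for (int row = y; row < y + h; row++)
            memcpy(front->pixels  + (size_t)row * front->stride  + x,
                   shadow->pixels + (size_t)row * shadow->stride + x,
                   (size_t)w * sizeof(uint32_t));
    }

    int main(void)
    {
        struct fb shadow = { calloc(1024 * 768, sizeof(uint32_t)), 1024, 768, 1024 };
        struct fb front  = { calloc(1024 * 768, sizeof(uint32_t)), 1024, 768, 1024 };

        shadow.pixels[0] = 0x00ff0000;                     /* CPU draws in system ram */
        shadow_flush_rect(&shadow, &front, 0, 0, 64, 64);  /* periodic flush to "vram" */

        free(shadow.pixels);
        free(front.pixels);
        return 0;
    }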



    • #42
      Originally posted by agd5f View Post
      It's possible, but it basically comes down to writing a new acceleration architecture which would take time to write and mature. Before going down that road I think it makes sense to see what can be done with an existing acceleration architecture like glamor.
      Must... create... new... architecture...



      • #43
        Originally posted by bridgman View Post
        Must... create... new... architecture...
        You miss the point. EXA is an internal implementation detail which is not relevant to anyone but DDX writers. The users could not care less if it goes away; more likely they will not even notice.

        Having multiple standards is bad. But a healthy competition between different implementations of the same standard is good.



        • #44
          I believe the intention was to do Xv as it used to be: an overlay that can't be read back. Sure, no video on your spinning cube, but that wouldn't need any readbacks to the CPU.

          So shadowfb + movies in an overlay that can't be read back or composited. Best of both worlds: accelerated color conversion/scaling, no tearing in movies, etc.



          • #45
            So now that it works with xorg-git, I tested it on an HD 6550M.

            Code:
            ~ % grep -i glamor /var/log/Xorg.0.log
            [   295.584] (II) LoadModule: "glamoregl"
            [   295.584] (II) Loading /usr/lib/xorg/modules/libglamoregl.so
            [   295.587] (II) Module glamoregl: vendor="X.Org Foundation"
            [   295.597] (**) RADEON(0): Option "AccelMethod" "glamor"
            [   295.597] (II) Loading sub module "glamoregl"
            [   295.597] (II) LoadModule: "glamoregl"
            [   295.597] (II) Loading /usr/lib/xorg/modules/libglamoregl.so
            [   295.597] (II) Module glamoregl: vendor="X.Org Foundation"
            [   295.597] (II) glamor: OpenGL accelerated X.org driver based.
            [   295.608] (II) glamor: EGL version 1.4 (DRI2):
            [   295.633] (II) RADEON(0): glamor detected, initialising EGL layer.
            [   296.112] (II) RADEON(0): Use GLAMOR acceleration.
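            For reference, glamor was enabled with something like this in my xorg.conf (Device section trimmed; the identifier is just an example):

            Code:
            Section "Device"
                Identifier "Radeon"
                Driver     "radeon"
                Option     "AccelMethod" "glamor"
            EndSection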
            It seems to be mostly fine, but some things are excruciatingly slow. No, the performance is not just "not good", it is really slow. gtkperf needs an extremely long time for the GtkDrawingArea Lines test and causes high CPU usage and slowdowns in X.
            I also tried ltris as a simple "real world" usage in a game, and it also causes high CPU usage in X; the little animations are very slow.

            Code:
            GtkEntry - time:  0,05
            GtkComboBox - time:  1,37
            GtkComboBoxEntry - time:  1,03
            GtkSpinButton - time:  0,10
            GtkProgressBar - time:  0,07
            GtkToggleButton - time:  0,10
            GtkCheckButton - time:  0,07
            GtkRadioButton - time:  0,09
            GtkTextView - Add text - time:  0,20
            GtkTextView - Scroll - time:  0,01
            GtkDrawingArea - Lines - time: 224,47
            GtkDrawingArea - Circles - time: 26,41
            GtkDrawingArea - Text - time:  0,80
            GtkDrawingArea - Pixbufs - time:  0,59
             --- 
            Total time: 255,36
            Even when completely done in software: 224 seconds for the line drawing? Is there something wrong or is this just the way it is for now?

            edit: exa for comparison
            Code:
            GtkEntry - time:  0,04
            GtkComboBox - time:  1,24
            GtkComboBoxEntry - time:  0,90
            GtkSpinButton - time:  0,11
            GtkProgressBar - time:  0,07
            GtkToggleButton - time:  0,11
            GtkCheckButton - time:  0,05
            GtkRadioButton - time:  0,09
            GtkTextView - Add text - time:  0,19
            GtkTextView - Scroll - time:  0,00
            GtkDrawingArea - Lines - time:  1,41
            GtkDrawingArea - Circles - time:  3,74
            GtkDrawingArea - Text - time:  0,92
            GtkDrawingArea - Pixbufs - time:  0,16
             --- 
            Total time:  9,05


            • #46
              There's lots of room left for optimization; they have just started to support this new acceleration path.

              However, this is what Chris Wilson said about glamor (some statements refer to the Intel driver) in another thread.

              Originally posted by ickle View Post
              There is a significant impedance mismatch between X and GL that is tricky to overcome and adds lots of extra complexity, and with the extra abstraction layer you cannot exploit hardware features not exposed through a GL extension. Also, you need to leak many details through that abstraction layer in order to allocate shared objects between multiple clients and your acceleration routines (which is quite, quite scary and hairy). And there is the tiny issue of having a critical system process relying on several hundred thousand lines of code that has not been written with robustness in mind, and having no failsafe method.

              With regards to performance, the current bottlenecks I see in glamor are due to the CPU overhead of the Intel mesa stack, and the many assumptions that interact extremely poorly with the 2D workload of glamor. Where you do find yourself mostly GPU bound (such as the fish-demo), glamor still falls short by 10-30% due to inefficiencies in the GPU programming (too many state changes and poor optimisation of shaders) and the multiple abstraction layers. However, being GPU bound is the exception; typically you end up being rate-limited by one of the paths that are orders of magnitude slower. And then there is the issue that glamor is an absolute resource hog, as the intel mesa driver's buffer management has never been used like that before...

              In a perfect world, glamor would equal the performance of a highly specialised driver like SNA; many of the routines used in SNA map directly onto the OpenGL API, and most have been copied over to glamor. Lots of work needs to be done to tune the entire mesa stack, a lot of which I suspect will only benefit glamor.

              And remember, RENDER acceleration is just one small part of the driver.



              • #47
                Yep, unfortunately every option (including starting a new DDX architecture or continuing the existing EXA-based DDX) has real, well-known challenges. That's what makes picking one so much fun.

                I think it's fair to say that any of the options will need work to get to a "happy place" -- the issue was that glamor seemed to offer the best combination of good return on short term work and not too many architectural obstacles for the longer term.


                • #48
                  Originally posted by ChrisXY View Post
                  So now that it works with xorg-git, I tested it on an HD 6550M.

                  Code:
                  GtkDrawingArea - Lines - time: 224,47
                  GtkDrawingArea - Circles - time: 26,41
                  GtkDrawingArea - Text - time: 0,80
                  GtkDrawingArea - Pixbufs - time: 0,59
                   --- 
                  Total time: 255,36

                  Even when completely done in software: 224 seconds for the line drawing? Is there something wrong or is this just the way it is for now?


                  Line drawing hasn't been fully optimized yet; glamor has been focusing on the compositing/rendering part so far. For example, drawing non-vertical/horizontal lines falls back to software rendering, which is very, very slow; that's why it took 224 seconds in your benchmark. But that is not difficult to accelerate; it just needs a simple shader.
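
                  To give an idea, something along these lines would be enough (a rough hypothetical sketch, not glamor's actual code; it assumes an EGL/GLES2 context is already current and a compile_program() helper that builds the program from the two sources):

                  Code:
                  #include <GLES2/gl2.h>

                  /* Trivial program for solid lines: pass the endpoints
                   * through, fill with one color (the GC foreground). */
                  static const char *line_vs =
                      "attribute vec2 pos;\n"
                      "void main() { gl_Position = vec4(pos, 0.0, 1.0); }\n";

                  static const char *line_fs =
                      "precision mediump float;\n"
                      "uniform vec4 color;\n"
                      "void main() { gl_FragColor = color; }\n";

                  /* Draw n segments: verts holds 2 endpoints per segment,
                   * 2 floats per endpoint, in normalized device coords. */
                  static void draw_lines(GLuint prog, const GLfloat *verts, int n)
                  {
                      GLint pos = glGetAttribLocation(prog, "pos");
                      glUseProgram(prog);
                      glEnableVertexAttribArray(pos);
                      glVertexAttribPointer(pos, 2, GL_FLOAT, GL_FALSE, 0, verts);
                      glDrawArrays(GL_LINES, 0, 2 * n);
                  }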



                  • #49
                    Originally posted by gongzg View Post
                    Line drawing hasn't been fully optimized yet; glamor has been focusing on the compositing/rendering part so far.
                    kwin with OpenGL effects did indeed work quite well already.

                    Originally posted by gongzg View Post
                    For example, drawing non-vertical/horizontal lines falls back to software rendering, which is very, very slow; that's why it took 224 seconds in your benchmark. But that is not difficult to accelerate; it just needs a simple shader.
                    Good to know, thanks.
