GLAMOR Radeon Shows 2D Acceleration Promise


  • #31
    Originally posted by bridgman View Post
    I guess I don't understand the pushback -- people are effectively saying "if you need something like a full 3D driver in the ddx to get decent performance then we expect you to do it" and "OMG why are they looking at Glamour (a ddx built over a full 3D driver) ??"
    I guess it's that glamor both tears and is slower in every benchmark published, on all hardware measured.

    Plus there's a belief in the air that "normal" 2D would already be running on HD7k.



    • #32
      Originally posted by curaga View Post
      I guess it's that glamor both tears and is slower in every benchmark published, on all hardware measured.
      Slower than what? For "2D", software rendering is almost always faster than hw acceleration (EXA, UXA, and even SNA in a lot of cases). RENDER semantics map poorly to hw. Glamor is roughly on par with EXA today and makes it a lot easier to improve support in the future.



      • #33
        Originally posted by agd5f View Post
        Slower than what? For "2D", software rendering is almost always faster than hw acceleration (EXA, UXA, and even SNA in a lot of cases). RENDER semantics map poorly to hw. Glamor is roughly on par with EXA today and makes it a lot easier to improve support in the future.
        Slower than either UXA or SNA in this article, for example: http://www.phoronix.com/scan.php?pag...y_glamor&num=4



        • #34
          Originally posted by curaga View Post
          Slower than either UXA or SNA in this article, for example: http://www.phoronix.com/scan.php?pag...y_glamor&num=4
          You guys are talking about totally different things. agd5f is talking about software rendering (e.g. shadowfb) on SI, which is what you get by default today; you're talking about Glamor on Intel hardware.

          Going forward, remember Chris W's comment that (paraphrasing a bit) Glamor is a good fit for big-ass GPU hardware (e.g. SI). Also note that some people are looking at initial patches for Glamor vs years of work on other architectures and extrapolating that "it's gonna be that way forever".
          Last edited by bridgman; 07-30-2012, 02:58 PM.



          • #35
            Originally posted by bridgman View Post
            Going forward, remember Chris W's comment that (paraphrasing a bit) Glamor is a good fit for big-ass GPU hardware (e.g. SI).
            These:
            http://ickle.wordpress.com/2012/07/12/glamorous-radeon/ ?

            I count two wins for glamor and the rest are losses (vs Radeon EXA).

            Also note that some people are looking at initial patches for Glamor vs years of work on other architectures and extrapolating that "it's gonna be that way forever".
            I'm aware, and you should be too, and yet you said above "I guess I don't understand the pushback".

            It's entirely because of that. Users live in the now; saying "in the future it will be bliss" is just talk, while the results now show that glamor performs worse and that it tears.



            • #36
              Originally posted by curaga View Post
              These:
              http://ickle.wordpress.com/2012/07/12/glamorous-radeon/ ?

              I count two wins for glamor and the rest are losses (vs Radeon EXA).



              I'm aware, and you should be too, and yet you said above "I guess I don't understand the pushback".

              It's entirely because of that. Users live in the now; saying "in the future it will be bliss" is just talk, while the results now show that glamor performs worse and that it tears.
              Software rendering is on par with or faster than any of the acceleration architectures in more cases than not. By that logic we should all be using shadowfb.



              • #37
                Originally posted by curaga View Post
                It's entirely because of that. Users live in the now; saying "in the future it will be bliss" is just talk, while the results now show that glamor performs worse and that it tears.
                Sure, but every architectural improvement sucked if you ran the early code, including SNA and EXA.

                I'm just seeing a lot more "extrapolating the future based on the initial code" and "proclaiming that extrapolation to be fact" than I ever saw in the past.



                • #38
                  Originally posted by agd5f View Post
                  Software rendering is on par with or faster than any of the acceleration architectures in more cases than not. By that logic we should all be using shadowfb.
                  There are some use cases where software rendering is currently slower (primarily the fancy HTML5 browser demos). But it would be really nice if the radeon DDX allowed using non-crippled software rendering for 2D graphics (so that it is at least as fast as fbdev) as an option, while still providing the rest of the features expected from a modern graphics card (hardware accelerated OpenGL for 3D games, the Xv extension for tear-free video, rotation, multi-monitor support, ...). Right now Option "RenderAccel" has some performance issues.

                  But it's not just performance. Reliability is also important. Do you remember the recent fallout of bugs in hardware accelerated RENDER implementations, triggered just by upgrading to cairo 1.12? Do you know that many Linux distros are still patching cairo to disable the use of server-side hardware accelerated gradients, because the gradients are screwed up in the drivers even now? Software rendering is not totally bug free, but it is doing a lot better on the reliability front.
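
                  For concreteness, the option I mean goes into the device section of xorg.conf, roughly like this (the Identifier line is whatever your config already uses):

                  Code:
                  Section "Device"
                      Identifier "Radeon"
                      Driver     "radeon"
                      # disable hardware RENDER acceleration, keep the rest
                      Option     "RenderAccel" "off"
                  EndSection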



                  • #39
                    Originally posted by ssvb View Post
                    There are some use cases where software rendering is currently slower (primarily the fancy HTML5 browser demos). But it would be really nice if the radeon DDX allowed using non-crippled software rendering for 2D graphics (so that it is at least as fast as fbdev) as an option, while still providing the rest of the features expected from a modern graphics card (hardware accelerated OpenGL for 3D games, the Xv extension for tear-free video, rotation, multi-monitor support, ...). Right now Option "RenderAccel" has some performance issues.

                    But it's not just performance. Reliability is also important. Do you remember the recent fallout of bugs in hardware accelerated RENDER implementations, triggered just by upgrading to cairo 1.12? Do you know that many Linux distros are still patching cairo to disable the use of server-side hardware accelerated gradients, because the gradients are screwed up in the drivers even now? Software rendering is not totally bug free, but it is doing a lot better on the reliability front.
                    As I've mentioned several times on all of these threads, for good performance you really need to be all software or all hardware rendering. Any time you mix the two, performance suffers. That's why disabling RENDER accel is slower than plain shadowfb; you can use the GPU for copies and fills, but for everything else, the buffer must be migrated between CPU and GPU domains when you want to switch who renders. The same is true of trying to mix shadowfb and hw 3D rendering or Xv. If you have a shadowfb in system memory and the CPU is doing the rendering, and then the GPU renders to an OpenGL or Xv buffer, you have to deal with updating your shadowfb in system memory with the results of the 3D rendering in GPU memory. You still end up ping-ponging.

                    EXA does not even provide the infrastructure to accelerate gradients or trapezoids at the moment. They weren't used much previously because very few drivers (if any) implemented acceleration for them; hence they weren't tested much outside of sw rendering. Software rendering works because it's the reference implementation; that's how features like gradients were added in the first place. Glamor already has the infrastructure to support gradients and trapezoids, which is another reason it is attractive.
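
                    To make the migration cost concrete, here is a toy sketch (hypothetical names, nothing like the real EXA code) of what happens when a workload alternates between an operation the GPU can accelerate (solid fills) and one that falls back to software rendering (trapezoids) on the same pixmap:

                    Code:
                    /* Toy model of pixmap migration between CPU and GPU domains.
                     * All names here are made up for illustration. */
                    #include <stdio.h>

                    typedef enum { DOMAIN_CPU, DOMAIN_GPU } domain_t;

                    typedef struct {
                        domain_t domain;   /* where the master copy of the pixels lives */
                        long     copies;   /* full-buffer transfers paid so far */
                    } pixmap_t;

                    /* Whoever wants to render must own the master copy; crossing
                     * domains costs a full upload or download of the buffer. */
                    static void migrate(pixmap_t *p, domain_t want)
                    {
                        if (p->domain != want) {
                            p->copies++;
                            p->domain = want;
                        }
                    }

                    static void gpu_fill(pixmap_t *p)       { migrate(p, DOMAIN_GPU); }
                    static void cpu_trapezoids(pixmap_t *p) { migrate(p, DOMAIN_CPU); }

                    int main(void)
                    {
                        pixmap_t pix = { DOMAIN_CPU, 0 };

                        /* Fills are accelerated, trapezoids fall back to software,
                         * so the buffer bounces on every single switch. */
                        for (int i = 0; i < 1000; i++) {
                            gpu_fill(&pix);
                            cpu_trapezoids(&pix);
                        }
                        printf("full buffer migrations: %ld\n", pix.copies); /* 2000 */
                        return 0;
                    }
                    Every switch of renderer pays a whole-buffer transfer; that is the ping-ponging in question.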



                    • #40
                      Originally posted by agd5f View Post
                      If you have a shadowfb in system memory and the CPU is doing the rendering, and then the GPU renders to an OpenGL or Xv buffer, you have to deal with updating your shadowfb in system memory with the results of the 3D rendering in GPU memory. You still end up ping-ponging.
                      Hmm, this is really difficult for me to understand. Why would we even need to "update the shadowfb in system memory with the results of the 3D rendering in GPU memory"? Doesn't it make a lot more sense to just set up something like periodic DMA transfers from shadowfb system memory to GPU memory (synchronized with the screen refresh) and composite it together with the 3D or Xv buffers on the GPU side, before sending the combined pixel data to HDMI? Or is there some kind of hardware limitation here? Maybe I just need to check the hardware manuals first and not waste your time with silly questions.

                      Glamor already has the infrastructure to support gradients and trapezoids, which is another reason it is attractive.
                      OK, in any case it's nice to have multiple alternatives, so you are of course free to try your luck with Glamor. With only one important condition: we need much better tools for automated correctness validation of RENDER implementations. Otherwise it is a real recipe for disaster.



                      • #41
                        Originally posted by ssvb View Post
                        Hmm, this is really difficult for me to understand. Why would we even need to "update the shadowfb in system memory with the results of the 3D rendering in GPU memory"? Doesn't it make a lot more sense to just set up something like periodic DMA transfers from shadowfb system memory to GPU memory (synchronized with the screen refresh) and composite it together with the 3D or Xv buffers on the GPU side, before sending the combined pixel data to HDMI? Or is there some kind of hardware limitation here? Maybe I just need to check the hardware manuals first and not waste your time with silly questions.
                        It depends on a lot of factors. Basically, the way shadowfb works is that the master copy of the pixmaps lives in system RAM. They are stored in system RAM and operated on by the CPU in system RAM. Then, periodically, the front buffer is updated with the results of all the operations in system RAM. It's a one-way operation: the CPU copy is always written to the front buffer in vram. When the GPU renders to a buffer in vram (say Xv or OpenGL), the shadow copy no longer holds the master copy.

                        Say you want to alpha blend a 3D window with the desktop and some other windows; now you have to get the GPU-rendered image into system RAM so that the CPU can blend it with the other windows. But wait, can't you have the GPU do the alpha blend? Sure, but then you have to migrate all the CPU buffers that you want to blend into vram so the GPU can access them, so you still have to copy a lot of data around. Wait, can't I store everything in gart and get the best of both worlds? Sure, but the problem there is that gart pages have to be pinned so the kernel doesn't swap them out while the GPU is accessing them. Since the kernel can't swap the pages, this limits the amount of memory available to the rest of the system. I think the kernel limits the amount of pinned memory to avoid a driver DoSing the system. And graphics buffers can be huge.

                        It's easier with a compositor. And you could even support the non-composited case by emulating a composited environment in the driver. In that case you could store the backing pixmaps wherever it makes the most sense (vram or system RAM) and then use the GPU to composite the final image. For CPU-rendered buffers you'd still need to migrate them to pinned memory for the GPU to access them, but you could keep the shadow front buffer in pinned memory. There are also some tricky corner cases to deal with. It's possible, but it basically comes down to writing a new acceleration architecture, which would take time to write and mature. Before going down that road I think it makes sense to see what can be done with an existing acceleration architecture like glamor.
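
                        For anyone trying to picture the one-way model, here is a stripped-down sketch (made-up names, not the real shadowfb code) of the flow described above, with the spot where GPU-rendered buffers break it noted in the comments:

                        Code:
                        /* Stripped-down model of shadowfb: the CPU renders into a master
                         * copy in system RAM, which is periodically pushed to vram. */
                        #include <stdint.h>
                        #include <string.h>

                        enum { WIDTH = 1920, HEIGHT = 1080 };

                        static uint32_t shadow[WIDTH * HEIGHT]; /* master copy, system RAM */
                        static uint32_t vram[WIDTH * HEIGHT];   /* front buffer, video RAM */

                        /* All 2D rendering is done by the CPU on the shadow copy. */
                        static void cpu_fill_rect(int x, int y, int w, int h, uint32_t color)
                        {
                            for (int j = y; j < y + h; j++)
                                for (int i = x; i < x + w; i++)
                                    shadow[j * WIDTH + i] = color;
                        }

                        /* Periodic, strictly one-way update: shadow -> vram.  Nothing
                         * ever reads vram back, which is what keeps this fast. */
                        static void flush_shadow(void)
                        {
                            memcpy(vram, shadow, sizeof(shadow));
                        }

                        int main(void)
                        {
                            cpu_fill_rect(0, 0, 640, 480, 0xFF336699u);
                            flush_shadow();

                            /* A GL or Xv buffer rendered by the GPU exists only in vram,
                             * so blending it with a CPU-drawn window means reading GPU
                             * memory back into shadow -- the expensive round trip
                             * described above. */
                            return 0;
                        }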



                        • #42
                          Originally posted by agd5f View Post
                          It's possible, but it basically comes down to writing a new acceleration architecture, which would take time to write and mature. Before going down that road I think it makes sense to see what can be done with an existing acceleration architecture like glamor.
                          Must... create... new... architecture...



                          • #43
                            Originally posted by bridgman View Post
                            Must... create... new... architecture...
                            You miss the point. EXA is an internal implementation detail which is not relevant to anyone but DDX writers. The users could not care less if it goes away; more likely they will not even notice.

                            Having multiple standards is bad. But a healthy competition between different implementations of the same standard is good.



                            • #44
                              I believe the intention was to do Xv as it used to be: an overlay that can't be read back. Sure, no video on your spinning cube, but it wouldn't need any read-backs to the CPU.

                              So shadowfb + movies in an overlay that can't be read back or composited. Best of both worlds: accelerated color conversion/scaling, no tearing in movies, etc.



                              • #45
                                So now that it works with xorg-git I tested it. HD 6550M.
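
                                For reference, the relevant bit of xorg.conf looks roughly like this (the Identifier and Driver lines are whatever your config already has; only the AccelMethod option matters, and the log below shows it was picked up):

                                Code:
                                Section "Device"
                                    Identifier "Radeon"
                                    Driver     "radeon"
                                    Option     "AccelMethod" "glamor"
                                EndSection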

                                Code:
                                ~ % grep -i glamor /var/log/Xorg.0.log
                                [   295.584] (II) LoadModule: "glamoregl"
                                [   295.584] (II) Loading /usr/lib/xorg/modules/libglamoregl.so
                                [   295.587] (II) Module glamoregl: vendor="X.Org Foundation"
                                [   295.597] (**) RADEON(0): Option "AccelMethod" "glamor"
                                [   295.597] (II) Loading sub module "glamoregl"
                                [   295.597] (II) LoadModule: "glamoregl"
                                [   295.597] (II) Loading /usr/lib/xorg/modules/libglamoregl.so
                                [   295.597] (II) Module glamoregl: vendor="X.Org Foundation"
                                [   295.597] (II) glamor: OpenGL accelerated X.org driver based.
                                [   295.608] (II) glamor: EGL version 1.4 (DRI2):
                                [   295.633] (II) RADEON(0): glamor detected, initialising EGL layer.
                                [   296.112] (II) RADEON(0): Use GLAMOR acceleration.
                                It seems to be mostly fine, but some things are excruciatingly slow. No, the performance is not just "not good", it is really slow. gtkperf takes extremely long for the GtkDrawingArea Lines test and causes high CPU usage and slowdowns in X.
                                I also tried ltris as a simple "real world" usage in a game; it also causes high CPU usage in X, and the little animations are very slow.

                                Code:
                                GtkEntry - time:  0,05
                                GtkComboBox - time:  1,37
                                GtkComboBoxEntry - time:  1,03
                                GtkSpinButton - time:  0,10
                                GtkProgressBar - time:  0,07
                                GtkToggleButton - time:  0,10
                                GtkCheckButton - time:  0,07
                                GtkRadioButton - time:  0,09
                                GtkTextView - Add text - time:  0,20
                                GtkTextView - Scroll - time:  0,01
                                GtkDrawingArea - Lines - time: 224,47
                                GtkDrawingArea - Circles - time: 26,41
                                GtkDrawingArea - Text - time:  0,80
                                GtkDrawingArea - Pixbufs - time:  0,59
                                 --- 
                                Total time: 255,36
                                Even when done completely in software: 224 seconds for the line drawing? Is there something wrong, or is this just the way it is for now?

                                edit: exa for comparison
                                Code:
                                GtkEntry - time:  0,04
                                GtkComboBox - time:  1,24
                                GtkComboBoxEntry - time:  0,90
                                GtkSpinButton - time:  0,11
                                GtkProgressBar - time:  0,07
                                GtkToggleButton - time:  0,11
                                GtkCheckButton - time:  0,05
                                GtkRadioButton - time:  0,09
                                GtkTextView - Add text - time:  0,19
                                GtkTextView - Scroll - time:  0,00
                                GtkDrawingArea - Lines - time:  1,41
                                GtkDrawingArea - Circles - time:  3,74
                                GtkDrawingArea - Text - time:  0,92
                                GtkDrawingArea - Pixbufs - time:  0,16
                                 --- 
                                Total time:  9,05
                                Last edited by ChrisXY; 08-08-2012, 08:34 AM.
