
There May Still Be Hope For R600g Supporting XvMC, VDPAU


  • #16
    Bingo.

    Many systems don't have *quite* enough CPU power to play the videos that users want to watch, so offloading even part of the decode work to the GPU can make a big difference in user experience. There are also formats which hardware acceleration won't handle today (VP8 is an obvious example but I'm sure there will be more in the future) so a general purpose, easily extensible solution will be useful in other ways as well.

    You might still be nuts but not in this specific case



    • #17
      There is also the obvious point that if we are not able to open up UVD programming information then shader-assisted decode is going to be a nice thing to have even if it only implements prediction/MC and filtering.



      • #18
        Originally posted by bridgman
        ...if we are not able to open up UVD programming information...
        Yes, I wanted to mention that you never said you won't, contrary to what Michael wrote in the article. You wrote in the other forum that you would look at it and that in 6 months you would be able to tell whether it will ever happen or not.

        So, for example, the vdpau state tracker would eat the vdpau calls from mplayer and turn them into tgsi instructions*, then the r600g driver would eventually display it.

        *: Well, I think tgsi will be turned back to glsl ir by the driver first, then it would get optimised and fed back to the driver to actually display something. Right?

        Originally posted by bridgman View Post
        You might still be nuts but not in this specific case
        Thanks. :P



        • #19
          Let's assume we have a working vdpau state tracker with r600g.
          Now how could we add vp8 decoding to this?


          1) The shader based version which was started eons ago needs to be done for all formats, right?
          2) An opencl based solution would require a working opencl state tracker and the ffmpeg guys to write decoders for each codec.
          3) The vdpau state tracker needs to be written only once, and then we've got to wait for the nvidia devs to implement support for new formats. Maybe some modifications are needed afterwards.

          Assuming the above points are correct the 3rd option requires the least amount of xorg work I think. Plus it would work immediately.
          The 1st option seems pointless.
          The 2nd option is interesting, because ffmpeg devs are interested in that and that state tracker could be used for several other projects, too.

          So I am actually confused; why haven't the xorg devs written a proper opencl state tracker with good driver support and said that:
          "Write opencl code and don't ever mention again uvd/xvba/vdpau/vaapi!"?
          It seems much easier to me and the end result might as well be better than with any other option.

          Conclusion:
          The fastest solution is what is in the article, the "vdpau state tracker for r600g".
          The best would be possibly the opencl one but there is no such decoder at the moment so it wouldn't solve the situation right now.


          One more thing:
          So the decoder is now written in c/assembly which runs on the cpu or gpu (through vdpau/xvba/vaapi).
          If it was written in opencl then it could run on the cpu, the gpu or both. In this case vdpau/xvba/vaapi won't get touched but the drivers' opencl capabilities will be used.

          Please correct me if I am wrong somewhere!



          • #20
            Originally posted by V!NCENT View Post
            Hello world. CPU's can playback video. CPU's can't do 3D graphics. Why do you think people buy GPU's?

            Is it just me or is the world nuts? Okay...
            CPUs actually can do 3D graphics, but they suck at it. We already had 3D games before everybody started installing Voodoo cards.

            As said, videos with a high bitrate are hard to decode using just the CPU. Also, the GPU can often do it using far less power; don't be surprised if GPU decoding uses 30 watts less than having the CPU do the same thing. Apart from the slightly lower electricity bill, this mostly reduces fan noise and, in the case of laptops, improves battery life.

            So yeah, call me nuts..



            • #21
              Originally posted by HokTar View Post
              Yes, I wanted to mention that you never said you won't, contrary to what Michael wrote in the article. You wrote in the other forum that you would look at it and that in 6 months you would be able to tell whether it will ever happen or not.
              Right... although 6 months is a rough guess, not a hard plan.

              Originally posted by HokTar View Post
              So, for example, the vdpau state tracker would eat the vdpau calls from mplayer and turn them into tgsi instructions*, then the r600g driver would eventually display it.
              Actually a combination of TGSI instructions for the shader programs and Gallium3D API calls for everything else (including "turn this TGSI into something the GPU can use"). The shader programs will mostly do filtering (for MC and deblocking) and other API calls will invoke those filters on a bunch of squares/rectangles as part of the higher level decoding process.

              Originally posted by HokTar View Post
              *: Well, I think tgsi will be turned back to glsl ir by the driver first, then it would get optimised and fed back to the driver to actually display something. Right?
              GLSL IR happens in the upper level mesa; a Gallium3D state tracker comes in below mesa (ie mesa is one of many state trackers). The normal GL flow (greatly simplified) is :

              Classic GL driver
              =================

              GLSL
              compiler generates GLSL IR
              utility routine converts to Mesa IR
              --------
              Mesa IR
              driver converts to hardware

              (dashed line is the interface between common mesa and HW driver)

              Gallium3D GL driver
              ===================

              GLSL
              compiler generates GLSL IR
              utility routine converts to Mesa IR
              utility routine converts to TGSI
              ----
              TGSI
              driver converts to hardware

              (Mesa IR is converted to TGSI in the common mesa code then passed to the Gallium3D HW driver)

              Video state tracker implementing full H.264 decode
              ==================================================

              H.264 slice
              CPU does bitstream parse, entropy decode, probably IDCT
              tracker generates TGSI for MC, deblock
              ----
              TGSI
              driver converts to hardware

              Video state tracker implementing MC/deblock only
              ================================================

              partially decoded surfaces
              tracker generates TGSI for MC, deblock
              ----
              TGSI
              driver converts to hardware
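To make the two video-tracker flows above concrete, here is a toy sketch in Python. Every function and class name is invented for illustration; none of this is real Gallium3D, Mesa, or VDPAU code.

```python
# Toy model of "CPU does bitstream parse / tracker generates TGSI for
# MC and deblock / driver converts to hardware". All names invented.

def cpu_parse_and_entropy_decode(slice_data):
    # Stand-in for the CPU-side work: bitstream parse, entropy decode,
    # probably IDCT.
    return {"coeffs": slice_data, "mvs": [0] * len(slice_data)}

def build_shader(stage):
    # Stand-in for the tracker generating a TGSI program for one
    # filtering stage (MC or deblock).
    return "TGSI:" + stage

class FakeDriver:
    # Stand-in for the Gallium3D HW driver: "converts TGSI to hardware"
    # and runs it via ordinary API calls.
    def run_shader(self, shader, data):
        return {"stage": shader, "data": data}

def decode_frame(slice_data, driver):
    parsed = cpu_parse_and_entropy_decode(slice_data)
    mc_out = driver.run_shader(build_shader("mc"), parsed)
    return driver.run_shader(build_shader("deblock"), mc_out)

frame = decode_frame([1, 2, 3], FakeDriver())
print(frame["stage"])  # TGSI:deblock
```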

              The potentially confusing thing about nearly all of the video interfaces is that their definitions include multiple entry points -- bitstream/slice, XvMC-like, and (IIRC) Xv-like. Drivers typically implement only one of those entry points and the player needs to do the rest, which means that a player that "supports VA-API" might only support one of the entry points while a "VA-API driver" might only support a different entry point.

              Most of the driver implementations so far have been for use with hardware decoders and have implemented the top level entry point, but things will get more interesting as other entry points start to get used more. I expect the most common stack for shader-assisted decode will be whichever of the lower level entry points for an existing API (VA-API or VDPAU) best aligns with the MC/deblock functions, and if none of them fit then an enhanced XvMC would probably be used instead.
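As a toy model of that entry-point mismatch (the stage names below are illustrative, not the terms any of these APIs actually use):

```python
# The decode pipeline as a list of stages; a driver implements the
# pipeline from one entry point down, and the player has to do
# everything above that point on the CPU. Stage names are made up.
STAGES = ["bitstream_parse", "entropy_decode", "idct", "mc", "deblock"]

def work_split(driver_entry_point):
    i = STAGES.index(driver_entry_point)
    return {"player_cpu": STAGES[:i], "driver_gpu": STAGES[i:]}

# A driver exposing only an MC-level entry point (the shader-assisted
# case) leaves all the bitstream work to the player:
print(work_split("mc"))
# {'player_cpu': ['bitstream_parse', 'entropy_decode', 'idct'],
#  'driver_gpu': ['mc', 'deblock']}
```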

              Originally posted by HokTar View Post
              Let's assume we have a working vdpau state tracker with r600g. Now how could we add vp8 decoding to this?

              1) The shader based version which was started eons ago needs to be done for all formats, right?
              2) An opencl based solution would require a working opencl state tracker and the ffmpeg guys to write decoders for each codec.
              3) The vdpau state tracker needs to be written only once, and then we've got to wait for the nvidia devs to implement support for new formats. Maybe some modifications are needed afterwards.
              IIRC the VDPAU API also supports multiple entry points, but I don't remember if it has something corresponding to MC/deblock. I *think* VA-API is a better fit but not 100% sure.

              What I see happening is a state tracker that exposes a few standard functions (MC, deblock) which can be used by a higher level decode routine (eg ffmpeg). The existing ffmpeg codecs would be modified to call into the generic MC/deblock state tracker and perform those functions on the GPU rather than the CPU, so the additional per-format work should be relatively small.
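A hypothetical sketch of that shape; the real state tracker interface may look nothing like this, and every name below is made up:

```python
# Each codec keeps its own CPU front end, but MC and deblocking funnel
# through one shared backend, so wiring up a new format (say VP8) only
# touches the front end. All names are illustrative.
class SharedMCBackend:
    # Stand-in for the generic MC/deblock state tracker.
    def mc(self, blocks):
        return [("mc", b) for b in blocks]
    def deblock(self, frame):
        return ("deblocked", frame)

def decode(frontend, backend, bitstream):
    blocks = frontend(bitstream)                 # per-format CPU work
    return backend.deblock(backend.mc(blocks))   # shared GPU-side work

h264_frontend = lambda bs: list(bs)  # existing codec, greatly simplified
vp8_frontend = lambda bs: list(bs)   # new format: only this part is new

out = decode(vp8_frontend, SharedMCBackend(), "abc")
print(out)  # ('deblocked', [('mc', 'a'), ('mc', 'b'), ('mc', 'c')])
```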

              Originally posted by HokTar View Post
              So I am actually confused; why haven't the xorg devs written a proper opencl state tracker with good driver support and said that:
              "Write opencl code and don't ever mention again uvd/xvba/vdpau/vaapi!"?
              It seems much easier to me and the end result might as well be better than with any other option.
              The same reason they haven't cured cancer or done something about the common cold. It's a big honkin' task and since the plan is to build OpenCL over Gallium3D it makes sense to get the Gallium3D drivers fully implemented first.

              Originally posted by HokTar View Post
              Conclusion:
              The fastest solution is what is in the article, the "vdpau state tracker for r600g".
              The best would be possibly the opencl one but there is no such decoder at the moment so it wouldn't solve the situation right now.
              I'm not convinced that OpenCL is actually the best option. IMO the Gallium3D API (via a state tracker to isolate decoders from changes in the Gallium3D API) is a better fit, since Gallium3D makes it easier to use the GPU's texture filtering resources along with the ALUs.

              Originally posted by HokTar View Post
              One more thing:
              So the decoder is now written in c/assembly which runs on the cpu or gpu (through vdpau/xvba/vaapi).
              If it was written in opencl then it could run on the cpu, the gpu or both. In this case vdpau/xvba/vaapi won't get touched but the drivers' opencl capabilities will be used.

              Please correct me if I am wrong somewhere!
              I guess the short answer is that I think the shader-assisted decoders will be more like graphics work than compute work, and so something running directly over Gallium3D is likely to be both the shortest and most efficient path.



              • #22
                BTW, in case you are wondering why all that conversion stuff is happening, the Intel devs (Ian et al) are looking at one more step:

                Classic GL driver (Intel proposal as I understand it)
                =================

                GLSL
                compiler generates GLSL IR
                --------
                GLSL IR
                driver converts to hardware

                In other words, GLSL IR would replace Mesa IR as the standard interface between common Mesa code and hardware-specific drivers. It seems like a reasonable idea, although I don't know if anyone has had time to really look at the impact on HW drivers and on TGSI.

                Since GLSL IR => Mesa IR => HW and GLSL IR => Mesa IR => TGSI => HW seem to be working OK so far it seems like eliminating the Mesa IR step should certainly work... in the worst case the utility routine for converting from GLSL IR to Mesa IR would have to become part of each HW driver rather than part of the common Mesa code.

                There would be a change required for Gallium3D as well:

                GLSL
                compiler generates GLSL IR
                utility routine converts to TGSI (hopefully not going through Mesa IR on the way)
                --------
                TGSI
                driver converts to hardware
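Writing the conversion chains in this thread out as plain lists shows that the proposal only removes the Mesa IR hop; the endpoints stay the same:

```python
# The IR pipelines discussed above, as simple stage lists.
classic = ["GLSL", "GLSL IR", "Mesa IR", "hardware"]
gallium = ["GLSL", "GLSL IR", "Mesa IR", "TGSI", "hardware"]

def drop_mesa_ir(chain):
    # The Intel proposal, modeled as deleting one conversion step.
    return [stage for stage in chain if stage != "Mesa IR"]

print(drop_mesa_ir(classic))  # ['GLSL', 'GLSL IR', 'hardware']
print(drop_mesa_ir(gallium))  # ['GLSL', 'GLSL IR', 'TGSI', 'hardware']
```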



                • #23
                  Thanks for the insight! It really shed some light on these matters!

                  Is it not the plan right now to get rid of mesa ir and tgsi and create an llvm ir instead? I checked the thread on mesa-dev but there is no conclusion so far.



                  • #24
                    I think everyone is trying to finish off the last big architectural upheaval before thinking too hard about the next one.

                    AFAIK there are three things we don't know yet :

                    1. Whether a "flat" and "generic" shader program representation like TGSI is still felt to be useful now that developers have worked with Gallium3D and TGSI for a year or so... if TGSI is still felt to be the best interface to HW drivers, then the LunarG proposal would affect Mesa internals but would not really affect the drivers. At first glance it seems that having the ability to tailor the lower LLVM IR to match the target hardware would imply at least some extensions to TGSI, but I don't know for sure.

                    2. Whether the proposed LLVM optimizations and conversions are felt to be significantly useful for the Intel drivers. I think I remember Ian saying that he felt a lot of optimization had to be aware of the target hardware, which means that the LLVM-based optimizations could be useful.

                    3. Whether the key premise of the LunarGlass proposal is valid, ie whether "the optimizations available in the LLVM ecosystem" can more or less "just work" with the graphics-oriented LLVM IR extensions or whether a lot of the optimization work will need to be written from scratch anyways, raising the question of whether hw-specific LLVM IR is much better than GLSL IR or than some other HW-specific IR.

                    There are probably other things we don't know but I don't even know what they are.



                    • #25
                      OK, so last-gen CPUs and below suck at ripped video playback. Fair enough. I jumped from a 32-bit Athlon XP 2800+ to a Phenom X4 9950, so I didn't know the latter was only one of the first CPUs that could handle full HD playback.

                      About that LLVM IR... please, not before TGSI is successful. But on the other hand, if there is no LLVM IR right now, then we will run into useless TGSI state trackers in the long run, right? Then again, while the FLOSS drivers may work for me and others, they are still not anywhere near fully featured blob replacements, so you might as well invest further into the future...



                      • #26
                        Staying with TGSI for a bit too long isn't likely to be a problem since it should be pretty easy to convert almost any future IR into TGSI.

                        The big open question is which approach will allow development to proceed most quickly. I don't think anyone knows the answer to that one yet.



                        • #27
                          Originally posted by bridgman View Post
                          IIRC the VDPAU API also supports multiple entry points, but I don't remember if it has something corresponding to MC/deblock. I *think* VA-API is a better fit but not 100% sure.
                          No, VDPAU is VLD-only. VDPAU might evolve into something more detailed like VA-API (slice-level bitstream info), for better error recovery and checks, but nothing is set in stone yet. And nothing other than VLD anyway. Actually, VDPAU was designed to be VLD-only so that user applications don't need to be modified much. Which, in practice, they don't, even with MC/IDCT entry points, assuming they use a common decoding library, e.g. FFmpeg.



                          • #28
                            Originally posted by Kano View Post
                            That's maybe 99.9% sure...
                            You forget that the HD 6000 series chips have UVD3, so you have no guarantee that the exact same workarounds are needed. Actually, the workarounds aren't the same on my "something else that is not HD 6000 but has UVD3". I just can't assume anything without someone trying for real there.



                            • #29
                              @ bridgman & gbeauche
                              So what is the conclusion? Would va-api be a better choice for our "shiny new state tracker"? Or are the differences minor, so it could be the individual developer's call?

                              (It will be anyways, but you know, in theory.)



                              • #30
                                Does anyone know where to contact König? I think it would be sad to create a lot of duplicate work. We are currently three developers looking into a shader-based decoder via gallium3d.

