A few questions about video decode acceleration


  • #81
    Alex, there are no "secret agreements not to expose certain HW functionality". There are "non-secret" agreements that if we offer API support for certified media players we will ensure a certain level of robustness for the associated protection mechanisms. There is also the "non-secret" reality that if we don't offer API support for certified players then we can't sell our chips to major OEMs, which would be spectacularly bad for business.

    If we can find ways to expose HW acceleration for open source driver development without putting the implementations on other OSes at risk then that is fine. Right now I am reasonably sure we will be able to do this for the IDCT/MC hardware but not so sure about UVD yet so am saying "no unless you hear otherwise".

    Until we have 6xx/7xx 3d engine support up and running this is all academic since the first requirement is getting the back end (render) acceleration in place and working well.


    • #82
      Bridgman, how much has your acceleration block changed over the generations? Given your current DRM concerns, would it be safe to tell us how to use the R200/R100 video decoding blocks?



      • #83
        Not much, really; synchronization between the IDCT and MC functions changed in R300, and the rest of the changes were pretty minor. I expect the info we release will enable support right back to RV100, aka the Radeon 7000.

        The biggest demand for IDCT/MC is still coming from 7000 owners and embedded HW designers who used 7000, which makes sense I guess.

        It's really just the IDCT block that still needs docs; MC just uses special modes in the 3d engine and that info is already out for 5xx. AFAIK the XvMC API supports MC-only acceleration so someone could start on that now if they had time. MC is still the most computationally expensive stage in the pipe, or at least it is for MPEG2.
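
        For anyone who wants to poke at that MC-only path, here is a rough sketch of the XvMC flow. It assumes an Xv port and surface type have already been chosen via the usual query calls (omitted), the helper name is just for illustration, and the exact signatures and flags should be checked against XvMC.h/XvMClib.h. The CPU still parses the bitstream and runs IDCT, then hands macroblock descriptors and difference blocks to the hardware for motion compensation.

        Code:
        /* Rough sketch only; error handling omitted, signatures per XvMClib.h. */
        #include <X11/Xlib.h>
        #include <X11/extensions/XvMClib.h>

        int mc_render_frame(Display *dpy, XvPortID port, int surface_type,
                            int width, int height, unsigned int mb_count)
        {
            XvMCContext ctx;
            XvMCSurface surf;
            XvMCMacroBlockArray mbs;
            XvMCBlockArray blocks;

            XvMCCreateContext(dpy, port, surface_type, width, height,
                              XVMC_DIRECT, &ctx);
            XvMCCreateSurface(dpy, &ctx, &surf);
            XvMCCreateMacroBlocks(dpy, &ctx, mb_count, &mbs);
            XvMCCreateBlocks(dpy, &ctx, mb_count * 6, &blocks); /* 6 blocks/MB for 4:2:0 */

            /* CPU side: parse the bitstream, run IDCT, then fill
             * mbs.macro_blocks[] (macroblock types, motion vectors) and
             * blocks.blocks[] with the difference data. */

            XvMCRenderSurface(dpy, &ctx, XVMC_FRAME_PICTURE, &surf,
                              NULL /* past ref */, NULL /* future ref */, 0,
                              mb_count, 0, &mbs, &blocks);
            XvMCFlushSurface(dpy, &surf);
            XvMCSyncSurface(dpy, &surf);
            /* XvMCPutSurface() would then display the finished surface. */

            XvMCDestroyBlocks(dpy, &blocks);
            XvMCDestroyMacroBlocks(dpy, &mbs);
            XvMCDestroySurface(dpy, &surf);
            XvMCDestroyContext(dpy, &ctx);
            return 0;
        }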
        Last edited by bridgman; 16 July 2008, 12:46 PM.


        • #84
          Originally posted by glisse View Post
          This is why there is discussion (I think I heard things) about adding special infrastructure to the pipe to be able to take advantage of such hw instead of trying to do it in shaders. I am convinced that for decode the GPU is not the best solution; dedicated hw is. Shaders will be the default safe path, I think...
          Sounds like a CPU <- GPU-shader <- GPU-dedicated fallback chain... all very DXVA-ish.

          Shader-based will probably be fine for most hardware. We'll see how the GSoC project goes.
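
          Purely to illustrate that fallback order (the backend names and probe functions below are made up for the sketch, not taken from any real driver), the selection logic would look something like this:

          Code:
          #include <stdio.h>

          typedef int (*probe_fn)(void);

          /* Hypothetical probes; a real driver would query the hardware. */
          static int have_dedicated_hw(void) { return 0; } /* e.g. a UVD-class block */
          static int have_shader_path(void)  { return 1; } /* e.g. shader-based decode */
          static int have_cpu_path(void)     { return 1; } /* always available */

          struct decode_backend {
              const char *name;
              probe_fn    available;
          };

          int main(void)
          {
              struct decode_backend chain[] = {
                  { "dedicated hardware", have_dedicated_hw },
                  { "GPU shaders",        have_shader_path  },
                  { "CPU (software)",     have_cpu_path     },
              };

              /* Walk the chain and pick the first path that is usable. */
              for (unsigned i = 0; i < sizeof chain / sizeof chain[0]; i++) {
                  if (chain[i].available()) {
                      printf("using %s decode path\n", chain[i].name);
                      break;
                  }
              }
              return 0;
          }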

          Regards,

          Matthew
          Last edited by mtippett; 16 July 2008, 03:36 PM.



          • #85
            Originally posted by bridgman View Post
            chaos386, that will be no problem and all the info required is available today, either using CAL, or Gallium for the open source world, or by having the drivers feed shader commands into the 3d core like they do for Xv and EXA Render.

            What CAL brings is the ability to make use of the shader processors *without* having to have the specialized knowledge of a driver developer. I believe we are demo-ing video encode/transcode using CAL already. Video encoding and transcoding are particularly attractive for execution on the shaders (via CAL or...) because encoding requires relatively more floating point work than decoding (for things like motion estimation).
            So if I'm doing some H.264 encoding with mencoder, would this CAL thing help me speed up the process? I'm using a Turion laptop with a Radeon X1200. Is there any way to try it right now or is it a future thing (and will it work with mencoder)?

            Mencoder (and x264) has a threads mechanism... For example, I'm using threads=2 to utilize both cores of my Turion. Would it be possible to speed up the encoding even further by using both the GPU and CPU (with threads=4, and treating the GPU as another core)? I have some doubts that the Radeon X1200 would be a lot faster than the CPU since it's not a high-end GPU.
            Last edited by val-gaav; 16 July 2008, 02:53 PM.



            • #86
              Originally posted by bridgman View Post
              Alex, there are no "secret agreements not to expose certain HW functionality". There are "non-secret" agreements that if we offer API support for certified media players we will ensure a certain level of robustness for the associated protection mechanisms. There is also the "non-secret" reality that if we don't offer API support for certified players then we can't sell our chips to major OEMs, which would be spectacularly bad for business.

              If we can find ways to expose HW acceleration for open source driver development without putting the implementations on other OSes at risk then that is fine. Right now I am reasonably sure we will be able to do this for the IDCT/MC hardware but not so sure about UVD yet so am saying "no unless you hear otherwise".

              Until we have 6xx/7xx 3d engine support up and running this is all academic since the first requirement is getting the back end (render) acceleration in place and working well.
              As an end user I'd rather have a generic GPU API than HW-specific implementations.

              1. Current PureVideo/AVIVO implementations are quite picky about format, bitrate, resolution and which codecs are supported.

              I did quite a bit of testing in Windows land, and 20% of my encodes did not play well.

              AFAIK they are implemented very specifically for HD DVD/Blu-ray playback, and are not the generic video codec API that we need.

              2. DRM

              We need to focus on using open and generic functionality so that we keep full control. That lets us add filters and other APIs for post-processing (upscale, sharpen, deblock, etc.).

              Even if we lose some neat HW-based quality improvements, we should try to reimplement them from a generic GPU angle.

              So I'm not so hungry for core UVD support; I'd rather see new ideas and implementations that are GPU- and codec-independent.

              Just my thoughts as an HTPC junkie.



              • #87
                Originally posted by val-gaav View Post
                So if I'm doing some H.264 encoding with mencoder, would this CAL thing help me speed up the process? I'm using a Turion laptop with a Radeon X1200. Is there any way to try it right now or is it a future thing (and will it work with mencoder)?

                Mencoder (and x264) has a threads mechanism... For example, I'm using threads=2 to utilize both cores of my Turion. Would it be possible to speed up the encoding even further by using both the GPU and CPU (with threads=4, and treating the GPU as another core)? I have some doubts that the Radeon X1200 would be a lot faster than the CPU since it's not a high-end GPU.
                I found this when I researched the subject myself:

                The multithreading support for H.264 will only work if the H.264 stream was encoded with slices enabled. The multithreading code works by sending each slice off to a different thread to be decoded, rather than using a threaded, pipelined approach. Recent builds of the x264 encoder no longer use multiple slices by default, so it's quite possible that your file only has one slice and will only be decoded by one thread.
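
                For what it's worth, here is a minimal sketch of how slice threading is enabled through the libavcodec API directly (the helper name is just for illustration, and this is the current FFmpeg API rather than mplayer's internals). As described above, it only buys anything if the stream actually contains more than one slice per frame:

                Code:
                #include <libavcodec/avcodec.h>

                /* Open an H.264 decoder with slice threading enabled. */
                AVCodecContext *open_h264_decoder(int nthreads)
                {
                    const AVCodec *dec = avcodec_find_decoder(AV_CODEC_ID_H264);
                    if (!dec)
                        return NULL;

                    AVCodecContext *ctx = avcodec_alloc_context3(dec);
                    if (!ctx)
                        return NULL;

                    ctx->thread_count = nthreads;        /* e.g. 2 for a dual-core Turion */
                    ctx->thread_type  = FF_THREAD_SLICE; /* one slice per thread; a single-slice
                                                            stream still decodes on one thread */

                    if (avcodec_open2(ctx, dec, NULL) < 0) {
                        avcodec_free_context(&ctx);
                        return NULL;
                    }
                    return ctx;
                }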



                • #88
                  No, the GPU won't accelerate the way multithreading does. The GPU doesn't split slices with the CPU, but instead accelerates part of the pipeline. It usually works like this:

                  stream decode -> IDCT -> MC -> post process

                  With basic hardware acceleration, the CPU does stream decode and IDCT, then sends the resulting stream to the GPU, which does MC (usually the most computationally intensive step) and post-processing (like deinterlacing). That way, not only does the CPU only have to do the lighter lifting, but the video stream passed to the GPU is still partially compressed, meaning it uses less of the (valuable) bus bandwidth. More advanced hardware acceleration, like what UVD offers, basically accelerates the whole pipeline, meaning the CPU doesn't really have to do anything at all.

                  If you want to accelerate encoding with the GPU, that's trickier, but possible, given the right hardware, software, and setup. To do this effectively, you'd probably need hardware (like UVD) that accelerates the whole pipeline, or else the video bus would be clogged by large amounts of data travelling back and forth between CPU and GPU.
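
                  To make the split concrete, here is a tiny, purely illustrative sketch (no real API, just the stage mapping described above) of which side handles each stage under the two schemes:

                  Code:
                  #include <stdio.h>

                  /* Illustrative only: basic acceleration = MC + post-process on the GPU;
                   * full-pipeline (UVD-style) acceleration = everything on the GPU. */
                  int main(void)
                  {
                      const char *stage[] = { "stream decode (VLD)", "IDCT",
                                              "motion compensation", "post-process" };
                      const int basic_on_gpu[] = { 0, 0, 1, 1 };
                      const int full_on_gpu[]  = { 1, 1, 1, 1 };

                      for (int i = 0; i < 4; i++)
                          printf("%-22s basic: %-3s  full: %s\n", stage[i],
                                 basic_on_gpu[i] ? "GPU" : "CPU",
                                 full_on_gpu[i]  ? "GPU" : "CPU");
                      return 0;
                  }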



                  • #89
                    Right. The GPU instruction set is completely different from the CPU instruction set, so you can't just gently slide work from one to the other. GPUs are massively parallel (an HD48xx can do 800 multiply-add ALU operations per clock in the main shader core while a quad-core CPU can do maybe 4 ALU ops per clock normally or 8-16 per clock using SSE instructions) but the instructions are different, clocks are lower, and the effective IPC rate is a bit lower.

                    On the other hand, GPUs include hardware to spread work across multiple processors and collect the results, which makes programming easier for a specific class of problems (the "stream programming" paradigm).

                    The interesting thing about doing the entire encode/decode task on the shader core is that it is relatively portable across most modern GPUs, although there are a number of APIs at the same level to consider -- CAL, CUDA, Gallium, OpenCL and DX11 Compute Shaders come to mind immediately.
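
                    Putting rough numbers on that, using the per-clock figures above plus assumed clock speeds (assumptions for illustration, not specs; real throughput is lower on both sides):

                    Code:
                    #include <stdio.h>

                    /* Back-of-the-envelope peak ALU rates. */
                    int main(void)
                    {
                        double gpu = 800.0 * 750e6; /* 800 MAD ops/clk at an assumed ~750 MHz */
                        double cpu =   4.0 * 3e9;   /* 4 ALU ops/clk at an assumed ~3 GHz     */
                        double sse =  16.0 * 3e9;   /* up to 16 ops/clk with SSE              */

                        printf("GPU shader core : ~%.0f Gops/s\n", gpu / 1e9); /* ~600 */
                        printf("quad-core CPU   : ~%.0f Gops/s\n", cpu / 1e9); /* ~12  */
                        printf("CPU with SSE    : ~%.0f Gops/s\n", sse / 1e9); /* ~48  */
                        return 0;
                    }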


                    • #90
                      Originally posted by TechMage89 View Post
                      No, the GPU won't accelerate the way multithreading does. The GPU doesn't split slices with the CPU, but instead accelerates part of the pipeline. It usually works like this:

                      stream decode -> IDCT -> MC -> post process

                      With basic hardware acceleration, the CPU does stream decode and IDCT, then sends the resulting stream to the GPU, which does MC (usually the most computationally intensive step) and post-processing (like deinterlacing). That way, not only does the CPU only have to do the lighter lifting, but the video stream passed to the GPU is still partially compressed, meaning it uses less of the (valuable) bus bandwidth. More advanced hardware acceleration, like what UVD offers, basically accelerates the whole pipeline, meaning the CPU doesn't really have to do anything at all.

                      If you want to accelerate encoding with the GPU, that's trickier, but possible, given the right hardware, software, and setup. To do this effectively, you'd probably need hardware (like UVD) that accelerates the whole pipeline, or else the video bus would be clogged by large amounts of data travelling back and forth between CPU and GPU.
                      What are the limitations of using OpenGL?

                      (My point is to make sure the new video acceleration API is not bound to a vendor or to some fancy three-letter acronym that's locked down by the content mafia anyway.)

