A few questions about video decode acceleration

  • TechMage89
    started a topic A few questions about video decode acceleration


    I have a few questions about video decode acceleration:

    1.) I understand that the Avivo video processor's specs can be released without legal issues, but UVD's specs cannot because of output-protection (DRM) requirements. Does this mean that R600+ chips will not be able to have any open-source video decode support, or that their support will be limited to the Avivo functions? In either case, I'd like to raise again the suggestion of a binary blob for video decoding that the open-source driver could tie into.

    2.) Bridgman, have the people at AMD begun to rummage for video decode spec documents? Do they exist? When do you expect you'll have time to start cleaning them up for release?

    3.) This question is more toward the driver developers. What API do you think will be useful for video decode? Is VA-API the future, or will extensions to XvMC work?

  • _txf_
    replied
    Those wondering what is possibly available for hardware acceleration should see this link on the XBMC wiki.

    It is really complete; nothing too detailed about implementations, but otherwise a good collection of the relevant techniques.



  • bridgman
    replied
    OpenGL can only handle the "render" part of the video pipeline (colour space conversion, scaling, etc.), not the "decode" part (IDCT, MC, etc.). Strictly speaking, I guess you could do crude MC in OpenGL (some GPU settings would be wrong), but I don't know if anyone has ever tried it.
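
    For reference, the "render" part is just per-pixel math that maps naturally onto a fragment shader. A minimal C sketch of the BT.601 YUV-to-RGB conversion (standard fixed-point coefficients, studio-swing input assumed; not tied to any particular driver):

        #include <stdint.h>

        static uint8_t clamp8(int v) { return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v; }

        /* BT.601 YUV -> RGB: the per-pixel math a fragment shader runs
         * for the "render" stage (colour space conversion). */
        void yuv_to_rgb(uint8_t y, uint8_t u, uint8_t v,
                        uint8_t *r, uint8_t *g, uint8_t *b)
        {
            int c = y - 16, d = u - 128, e = v - 128;
            *r = clamp8((298 * c           + 409 * e + 128) >> 8);
            *g = clamp8((298 * c - 100 * d - 208 * e + 128) >> 8);
            *b = clamp8((298 * c + 516 * d           + 128) >> 8);
        }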



  • Uber
    replied
    Originally posted by TechMage89 View Post
    No, the GPU won't accelerate the way multithreading does. The GPU doesn't split slices with the CPU; instead it accelerates part of the pipeline. It usually works like this:

    stream decode -> IDCT -> MC -> post-process

    With basic hardware acceleration, the CPU does stream decode and IDCT, then sends the resulting stream to the GPU, which does MC (usually the most computationally intensive step) and post-processing (like deinterlacing). That way, not only is the CPU left with the lighter work, but the video stream passed to the GPU is still partially compressed, so it uses less of the (valuable) bus bandwidth. More advanced hardware acceleration, like what UVD offers, basically accelerates the whole pipeline, meaning the CPU doesn't really have to do anything at all.

    If you want to accelerate encoding with the GPU, that's trickier, but possible, given the right hardware, software, and setup. To do this effectively, you'd probably need hardware (like UVD) that accelerates the whole pipeline, or else the video bus would be clogged by large amounts of data travelling back and forth between CPU and GPU.
    What are the limitations of using OpenGL?

    (My point is to make sure any new video acceleration API is not bound to a vendor or to fancy three-letter tech that's hidden by the content mafia anyway.)



  • bridgman
    replied
    Right. The GPU instruction set is completely different from the CPU instruction set, so you can't just gently slide work from one to the other. GPUs are massively parallel (an HD48xx can do 800 multiply-add ALU operations per clock in the main shader core while a quad-core CPU can do maybe 4 ALU ops per clock normally or 8-16 per clock using SSE instructions) but the instructions are different, clocks are lower, and the effective IPC rate is a bit lower.

    On the other hand, GPUs include hardware to spread work across multiple processors and collect the results, which makes programming easier for a specific class of problems (the "stream programming" paradigm).

    The interesting thing about doing the entire encode/decode task on the shader core is that it is relatively portable across most modern GPUs, although there are a number of APIs at the same level to consider -- CAL, CUDA, Gallium, OpenCL and DX11 Compute Shaders come to mind immediately.
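
    As a rough illustration of the stream paradigm (a plain-C sketch, not any specific API): every element below is independent, so a runtime like CAL, CUDA, or OpenCL can run the loop body as one thread per element across the shader ALUs, while a CPU just loops.

        #include <stddef.h>

        /* Stream-style multiply-add: each iteration is independent, so a
         * GPU can execute one body per ALU lane; no inter-element state. */
        void saxpy(size_t n, float a, const float *x, float *y)
        {
            for (size_t i = 0; i < n; i++)
                y[i] = a * x[i] + y[i];   /* one multiply-add per element */
        }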



  • TechMage89
    replied
    No, the GPU won't accelerate the way multithreading does. The GPU doesn't split slices with the CPU; instead it accelerates part of the pipeline. It usually works like this:

    stream decode -> IDCT -> MC -> post-process

    With basic hardware acceleration, the CPU does stream decode and IDCT, then sends the resulting stream to the GPU, which does MC (usually the most computationally intensive step) and post-processing (like deinterlacing). That way, not only is the CPU left with the lighter work, but the video stream passed to the GPU is still partially compressed, so it uses less of the (valuable) bus bandwidth. More advanced hardware acceleration, like what UVD offers, basically accelerates the whole pipeline, meaning the CPU doesn't really have to do anything at all.

    If you want to accelerate encoding with the GPU, that's trickier, but possible, given the right hardware, software, and setup. To do this effectively, you'd probably need hardware (like UVD) that accelerates the whole pipeline, or else the video bus would be clogged by large amounts of data travelling back and forth between CPU and GPU.
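
    To make the split concrete, here is a hypothetical sketch of where basic acceleration hands off; every name below is invented for illustration, not a real driver API:

        /* Hypothetical basic-acceleration split; all names invented.
         * The coeffs handed to the GPU are still partially compressed,
         * which is what keeps bus traffic low. */
        typedef struct frame  frame_t;
        typedef struct coeffs coeffs_t;                     /* transform residuals */

        coeffs_t *cpu_stream_decode(const void *bitstream); /* CPU: entropy decode */
        void      cpu_idct(coeffs_t *c);                    /* CPU: inverse DCT    */
        frame_t  *gpu_motion_comp(const coeffs_t *c,        /* GPU: MC, the heavy  */
                                  const frame_t *refs);     /*      step           */
        frame_t  *gpu_postprocess(frame_t *f);              /* GPU: deinterlacing  */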



  • Uber
    replied
    Originally posted by val-gaav View Post
    So if I'm doing some H.264 encoding with mencoder, would this CAL thing help me speed up the process? I'm using a Turion laptop with a Radeon X1200. Is there any way to try it right now, or is it a future thing (and will it work with mencoder)?

    MEncoder (and x264) have a threads mechanism... For example, I'm using threads=2 to utilize both cores of my Turion. Would it be possible to speed up the encoding even further by using both the GPU and CPU (with threads=4, treating the GPU as another core)? I have some doubts that a Radeon X1200 would be a lot faster than the CPU, since it's not a high-end GPU.
    I found this when I researched the subject myself.

    The multithreading support for H.264 will only work if the H.264 stream was encoded with slices enabled. The multithreading code works by sending each slice off to a different thread to be decoded, rather than using a threaded, pipelined approach. Recent builds of the x264 encoder no longer use multiple slices by default, so it's quite possible that your file has only one slice and will only be decoded by one thread.
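
    A sketch of that slice-per-thread scheme with pthreads (simplified, names invented for illustration; a real decoder also has to manage per-slice entropy state):

        #include <pthread.h>
        #include <stddef.h>
        #include <stdint.h>

        typedef struct { const uint8_t *data; size_t len; } slice_t;

        static void *decode_slice(void *arg)
        {
            slice_t *s = (slice_t *)arg;
            (void)s;  /* ... entropy-decode and reconstruct this slice only ... */
            return NULL;
        }

        void decode_frame(slice_t *slices, int nslices)
        {
            pthread_t tid[nslices];
            for (int i = 0; i < nslices; i++)   /* one worker per slice, so */
                pthread_create(&tid[i], NULL, decode_slice, &slices[i]);
            for (int i = 0; i < nslices; i++)   /* a single-slice stream    */
                pthread_join(tid[i], NULL);     /* gets only one thread     */
        }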



  • Uber
    replied
    Originally posted by bridgman View Post
    Alex, there are no "secret agreements not to expose certain HW functionality". There are "non-secret" agreements that if we offer API support for certified media players we will ensure a certain level of robustness for the associated protection mechanisms. There is also the "non-secret" reality that if we don't offer API support for certified players then we can't sell our chips to major OEMs, which would be spectacularly bad for business.

    If we can find ways to expose HW acceleration for open source driver development without putting the implementations on other OSes at risk then that is fine. Right now I am reasonably sure we will be able to do this for the IDCT/MC hardware but not so sure about UVD yet so am saying "no unless you hear otherwise".

    Until we have 6xx/7xx 3d engine support up and running this is all academic since the first requirement is getting the back end (render) acceleration in place and working well.
    As an end user I would rather have a generic GPU API than HW-specific implementations.

    1. Current PureVideo/AVIVO implementations are quite picky about format, bitrate, resolution, and which codecs are supported.

    I did quite some testing in Windows land, and 20% of my encodes did not play well.

    AFAIK they are implemented very specifically for HD DVD/Blu-ray playback, not as the generic video codec API that we need.

    2. DRM

    We need to focus on using open and generic functionality to ensure we have full control. This enables us to add filters and other APIs to the post-processing (upscale, sharpen, deblock, etc.); see the sketch below.
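
    For example, a sharpen pass is just a small convolution any shader core can run; a minimal C sketch over an 8-bit luma plane (illustrative only):

        #include <stdint.h>

        /* 3x3 sharpen kernel: centre weight 5, cross weights -1. */
        static const int K[3][3] = { { 0, -1,  0 },
                                     {-1,  5, -1 },
                                     { 0, -1,  0 } };

        void sharpen(const uint8_t *src, uint8_t *dst, int w, int h)
        {
            for (int y = 1; y < h - 1; y++)
                for (int x = 1; x < w - 1; x++) {
                    int acc = 0;
                    for (int j = -1; j <= 1; j++)
                        for (int i = -1; i <= 1; i++)
                            acc += K[j + 1][i + 1] * src[(y + j) * w + (x + i)];
                    dst[y * w + x] = acc < 0 ? 0 : acc > 255 ? 255 : (uint8_t)acc;
                }
        }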

    Even if we lose some neat HW-based quality improvements, we should try to reimplement them from a generic GPU angle.

    So I'm not so hungry for the core UVD; I'd rather see new ideas and implementations that are GPU- and codec-independent.

    Just my thoughts as an HTPC junkie.



  • val-gaav
    replied
    Originally posted by bridgman View Post
    chaos386, that will be no problem and all the info required is available today, either using CAL, or Gallium for the open source world, or by having the drivers feed shader commands into the 3d core like they do for Xv and EXA Render.

    What CAL brings is the ability to make use of the shader processors *without* having to have the specialized knowledge of a driver developer. I believe we are demo-ing video encode/transcode using CAL already. Video encoding and transcoding are particularly attractive for execution on the shaders (via CAL or...) because encoding requires relatively more floating point work than decoding (for things like motion estimation).
    So if I'm doing some H.264 encoding with mencoder, would this CAL thing help me speed up the process? I'm using a Turion laptop with a Radeon X1200. Is there any way to try it right now, or is it a future thing (and will it work with mencoder)?

    MEncoder (and x264) have a threads mechanism... For example, I'm using threads=2 to utilize both cores of my Turion. Would it be possible to speed up the encoding even further by using both the GPU and CPU (with threads=4, treating the GPU as another core)? I have some doubts that a Radeon X1200 would be a lot faster than the CPU, since it's not a high-end GPU.
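
    (For context on why encoding suits the shaders, per bridgman's quote above: motion estimation is mostly brute-force block comparison, as in this sketch of a sum-of-absolute-differences cost, the inner loop an encoder evaluates for huge numbers of candidate positions:)

        #include <stdint.h>
        #include <stdlib.h>

        /* SAD for one 16x16 block: the basic motion-estimation cost.
         * stride is the frame width in bytes. */
        unsigned sad_16x16(const uint8_t *cur, const uint8_t *ref, int stride)
        {
            unsigned sum = 0;
            for (int y = 0; y < 16; y++)
                for (int x = 0; x < 16; x++)
                    sum += abs(cur[y * stride + x] - ref[y * stride + x]);
            return sum;
        }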
    Last edited by val-gaav; 07-16-2008, 02:53 PM.



  • mtippett
    replied
    Originally posted by glisse View Post
    This is why there are discussions (I think I heard things) about adding special infrastructure to the pipe to be able to take advantage of such HW, instead of trying to do it in shaders. I am convinced that for decode the GPU is not the best solution; dedicated HW is. Shaders will be the default safe path, I think...
    Sounds like a CPU <- GPU shader <- dedicated GPU fallback chain... All very DXVA-ish.

    Shader-based will probably be fine for most hardware. We'll see how the GSoC project goes.
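
    A sketch of that fallback chain (hypothetical names, just to show the selection order):

        /* Hypothetical decode-path selection; names invented for illustration. */
        typedef enum { DECODE_FIXED_FUNC, DECODE_SHADER, DECODE_CPU } decode_path_t;

        decode_path_t pick_decode_path(int has_dedicated_hw, int has_shader_decode)
        {
            if (has_dedicated_hw)  return DECODE_FIXED_FUNC; /* e.g. UVD           */
            if (has_shader_decode) return DECODE_SHADER;     /* shader-core kernel */
            return DECODE_CPU;                               /* safe software path */
        }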

    Regards,

    Matthew
    Last edited by mtippett; 07-16-2008, 03:36 PM.

