
Adobe's Linux Video API Rant Extended


  • KotH
    replied
    Originally posted by rohcQaH View Post
    isn't that just a simple per-pixel matrix multiplication, i.e. a trivial task for any shader-based GPU? If the YUV image is already in video memory, why can't it be converted there before moving it to GPU space for manual blending?
Yes. That's what video players do. Actually, they do a little bit more: both the 4:2:0/4:2:2 to 4:4:4 upsampling and the YUV->RGB conversion happen in hardware on the card (virtually all video cards produced in the last 10 years support this).

But for Flash that does not work, as it needs the RGB data in system memory for further processing. Reading back from the graphics card would work, but it is slower than converting in software on the CPU (of course, only if you are not already doing the whole decoding process in hardware).
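For reference, the per-pixel matrix multiply rohcQaH mentions can be sketched in a few lines of NumPy, as a software version of what a shader does per fragment (BT.601 full-range coefficients are an assumption here; real video is often limited-range, which changes the constants slightly, and `yuv_to_rgb` is just an illustrative name):

```python
import numpy as np

# BT.601 full-range conversion matrix (an assumption; limited-range
# BT.601/709 video uses slightly different constants).
M = np.array([
    [1.0,  0.0,    1.402],   # R row
    [1.0, -0.344, -0.714],   # G row
    [1.0,  1.772,  0.0],     # B row
], dtype=np.float32)

def yuv_to_rgb(yuv):
    """Convert an (H, W, 3) uint8 YUV image (already upsampled to
    4:4:4) to RGB with one 3x3 matrix multiply per pixel."""
    f = yuv.astype(np.float32)
    f[..., 1:] -= 128.0              # center the chroma channels
    rgb = f @ M.T                    # the per-pixel matrix multiply
    return np.clip(rgb, 0.0, 255.0).astype(np.uint8)
```

A GPU shader performs exactly this multiply per fragment, which is why the conversion is essentially free once the planes are in video memory.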



  • unix_epoch
    replied
[This was originally an edit to my previous post, but apparently taking longer than 60 seconds to type an edit is disallowed by the forum software]

    I haven't done much in graphics lately, so how much bandwidth is actually available these days, either using DMA or a straight memcpy (last time I was heavily involved in graphics, my desktop PC was barely able to push 1024x768x16bpp@30fps from the CPU to GPU, not using DMA)?
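For reference, that old workload works out to just under 50 MB/s (a quick back-of-the-envelope calculation):

```python
# Quick check of the old figure: 1024x768 at 16 bpp (2 bytes/pixel),
# 30 frames per second, pushed from CPU to GPU.
bytes_per_frame = 1024 * 768 * 2
mb_per_sec = bytes_per_frame * 30 / 1e6
print(mb_per_sec)  # 47.18592, i.e. roughly 47 MB/s
```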



  • unix_epoch
    replied
    Originally posted by gbeauche View Post
Is Flash acceleration on Windows also really reading the pixels back? In my experience this is slow on Linux on both AMD and NVIDIA, and even more painful on the GMA500. How could it be much faster on Windows if the underlying driver code is shared? The latest Flash and GMA500 drivers seem to work well together, and since reading pixels back on that platform is impractical, I strongly suspect that even the Windows Flash is using overlays rather than pixel readback. Or are they using a Direct3D renderer?
I wonder about this same question. 1080p30 video would require close to 100MB/s in YV12, or 200MB/s in RGB, in both directions. To my mind that's a hefty chunk of the available CPU-GPU bandwidth, considering everything else going on in a typical system.
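Those figures check out; quick arithmetic, assuming 12 bits/pixel for YV12 and 24-bit RGB:

```python
# Back-of-the-envelope bandwidth for uncompressed 1080p30 transfers.
width, height, fps = 1920, 1080, 30

# YV12 is planar 4:2:0, i.e. 12 bits/pixel: a full-resolution luma
# plane plus quarter-resolution U and V planes.
yv12_mb_s = width * height * 1.5 * fps / 1e6
rgb_mb_s = width * height * 3 * fps / 1e6   # 24-bit RGB

print(f"YV12: {yv12_mb_s:.1f} MB/s, RGB: {rgb_mb_s:.1f} MB/s")
# YV12: 93.3 MB/s, RGB: 186.6 MB/s
```

And that is per direction; a download-process-upload round trip doubles it.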



  • gbeauche
    replied
    Originally posted by bridgman View Post
    AFAICS Mike is saying that they need APIs which will allow player code to sit between the decode and render operations. As Gwenole pointed out, both the ATI and NVidia implementations have a mechanism to do this (not sure about VA-API)
Yes, VA-API has vaGetImage() for this, and that is in fact what VLC uses. The GMA500 "psb" driver implements this function, but the "iegd" driver does not. There is a feature request open for it, though.

However, I can summarize the current readback capabilities:
- psb: read back as NV12
- nvidia: read back as YV12 or NV12, and possibly RGBA with another indirection through a VdpOutputSurface
- fglrx: read back as YV12 only. RGBA could be possible, but a bug currently prevents it from working. There could be a workaround through a GL download, but I don't really want to maintain workarounds.

Is Flash acceleration on Windows also really reading the pixels back? In my experience this is slow on Linux on both AMD and NVIDIA, and even more painful on the GMA500. How could it be much faster on Windows if the underlying driver code is shared? The latest Flash and GMA500 drivers seem to work well together, and since reading pixels back on that platform is impractical, I strongly suspect that even the Windows Flash is using overlays rather than pixel readback. Or are they using a Direct3D renderer?



  • unix_epoch
    replied
    Originally posted by mirza View Post
Colorspace Conversion: Isn't it simple to reserve 50MB of RAM for a YUV -> RGB conversion lookup table, for users who have no HW acceleration but do have plenty (well, 50MB) of RAM? This table could be shared in RAM by all running video encoding/decoding instances. In such a table you simply store the RGB value for each YUV combination and have the image converted in no time.
Caching would be an issue: every time you hit the table you would incur a cache miss. If you can combine a LUT with a smaller calculation, so that the table and code fit in cache, then you might have something, but at that point you might as well just use the optimized SSE code mplayer uses.
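The "small LUT plus a little arithmetic" idea can be sketched like this (a hypothetical illustration, not mplayer's actual code; BT.601 full-range coefficients and the function names are assumptions):

```python
# Four 256-entry tables (about 4 KB total, easily cache-resident)
# replace the per-pixel multiplies, leaving only table lookups,
# adds, and a clamp per pixel. BT.601 full-range coefficients
# are assumed.
RV = [round(1.402 * (v - 128)) for v in range(256)]
GU = [round(-0.344 * (u - 128)) for u in range(256)]
GV = [round(-0.714 * (v - 128)) for v in range(256)]
BU = [round(1.772 * (u - 128)) for u in range(256)]

def clamp(x):
    return 0 if x < 0 else 255 if x > 255 else x

def pixel_yuv_to_rgb(y, u, v):
    return (clamp(y + RV[v]),
            clamp(y + GU[u] + GV[v]),
            clamp(y + BU[u]))
```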

There's also no reason for Flash to bring the image back to the CPU for display unless the Flash animation uses alpha blending with the browser contents, which only those annoying pop-over ads do, not YouTube or Hulu.



  • Licaon
    replied
    Originally posted by DanL View Post
    When "pesky facts" arise, be sure to paint your opponents as freetards or even better, hippies.
    or communists



  • Licaon
    replied
My comment on the Adobe penguin blog has been awaiting moderation for the last 12 hours or so.

I just asked why Adobe did not request the needed API changes in the nVidia Linux drivers while their teams were working on the Windows Flash 10.1 and 19x.xx integration.

I find it hard to believe that, while working on that, nobody said "hey guys, maybe we/you can do this on Linux & Mac too, since our/your driver core shares 90% (or whatever) of the code".



  • mirza
    replied
Colorspace Conversion: Isn't it simple to reserve 50MB of RAM for a YUV -> RGB conversion lookup table, for users who have no HW acceleration but do have plenty (well, 50MB) of RAM? This table could be shared in RAM by all running video encoding/decoding instances. In such a table you simply store the RGB value for each YUV combination and have the image converted in no time.

Scaling: How is scaling a problem? Didn't the Second Reality demo scale a bald guy's face to full screen on a 486, back in the Ronald Reagan era? Yes, it also used a look-up table; you don't have to calculate everything all the time. For video scaled to a 1920x1080 screen (the size of the source image is not important), you need only an 8MB lookup table. Again, the conversion is done in no time.

Can somebody enlighten me as to why things that were simple in the late eighties are complicated and _slow_ again on CPUs more than 100x faster?
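The table sizes quoted above do check out (quick arithmetic; the 4-byte source index per destination pixel in the scaling table is an assumption):

```python
# Checking the quoted lookup-table sizes.

# Full YUV -> RGB lookup table: one entry per 24-bit YUV triple,
# 3 output bytes (R, G, B) each.
entries = 256 ** 3                    # 16,777,216 combinations
full_lut_mib = entries * 3 / 2 ** 20
print(full_lut_mib)  # 48.0 -> the "50MB" figure

# Scaling table for a 1920x1080 target: one precomputed source-pixel
# index per destination pixel (a 4-byte index is an assumption).
scale_lut_mib = 1920 * 1080 * 4 / 2 ** 20
print(round(scale_lut_mib, 1))  # 7.9 -> the "8MB" figure
```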



  • intgr
    replied
    Originally posted by jdeslip View Post
    this is the most unprofessional article I have ever read on Phoronix.
    You must be new here...



  • bridgman
    replied
    It's cheap on a GPU if the YCbCr image is in video memory and you want the result in video memory as well, fairly expensive on a CPU, and moderately expensive on a GPU if you want the result back in system memory so the CPU can chew on it some more.

I don't know what the specific requirements are for Flash, i.e. whether the decoded image needs to be further processed on the CPU or only via GPU operations using GL. I suspect the requirement is CPU processing, since AFAIK post-decode processing through GL is already being done by Adobe and other players.

