Adobe's Linux Video API Rant Extended


  • #31
    Originally posted by mirza View Post
    Colorspace Conversion: Isn't it simple to have 50MB of RAM reserved for a YUV -> RGB conversion lookup table, for users that have no HW acceleration but have plenty (well, 50MB) of RAM? This table can be shared in RAM by all running video encoding/decoding instances. In such a table you can simply define the RGB value for each YUV combination and have the image converted in no time.
    Caching would be an issue. Every time you hit the table you would incur a cache miss. If you can combine a LUT with a smaller calculation, so that the table and code fit in cache, then you might have something; but at that point you might as well just use the optimized SSE code mplayer uses.

    There's also no reason for Flash to bring the image back to the CPU for display unless the Flash animation uses alpha blending with the browser contents, which only those annoying pop-over ads do, not YouTube or Hulu.
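
    The "smaller calculation" alternative to a giant LUT can be sketched roughly like this: a fixed-point BT.601 conversion that uses a handful of multiplies and shifts instead of a 50MB table (a minimal illustration, not mplayer's actual SSE code; the function names are made up for the example):

    ```c
    #include <stdint.h>

    static uint8_t clamp8(int v) {
        return v < 0 ? 0 : (v > 255 ? 255 : (uint8_t)v);
    }

    /* Fixed-point BT.601 YCbCr (video range) -> RGB without a lookup table:
     * the constants stay in registers, so nothing competes with the image
     * data itself for cache lines. */
    static void yuv_to_rgb(uint8_t y, uint8_t u, uint8_t v,
                           uint8_t *r, uint8_t *g, uint8_t *b) {
        int c = y - 16, d = u - 128, e = v - 128;
        *r = clamp8((298 * c + 409 * e + 128) >> 8);
        *g = clamp8((298 * c - 100 * d - 208 * e + 128) >> 8);
        *b = clamp8((298 * c + 516 * d + 128) >> 8);
    }
    ```

    Per pixel that is five multiplies and a few adds, which is exactly the kind of arithmetic SIMD units chew through without ever touching main memory.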



    • #32
      Originally posted by bridgman View Post
      AFAICS Mike is saying that they need APIs which will allow player code to sit between the decode and render operations. As Gwenole pointed out, both the ATI and NVidia implementations have a mechanism to do this (not sure about VA-API)
      Yes, VA-API has vaGetImage() for this, and that is actually what VLC uses. The GMA500 "psb" driver implements this function, but the "iegd" driver does not; there is a feature request open for it, though.

      However, I can summarize the read-back capabilities as follows:
      - psb: read back as NV12
      - nvidia: read back as YV12 or NV12, possibly RGBA with another indirection through a VdpOutputSurface
      - fglrx: read back as YV12 only. RGBA could be possible, but a bug currently prevents it from working. There could be a workaround through a GL download, but I don't really want to maintain workarounds.

      Is Flash acceleration on Windows really reading the pixels back too? In my experience this is slow on Linux on both AMD and NVIDIA, and even more painful on GMA500. How could it be much faster on Windows if the underlying driver code is common? The latest Flash and GMA500 drivers seem to work well together, and since reading pixels back on that platform is impractical, I strongly suspect that even the Windows Flash uses overlays rather than pixel read-back. Or are they using a Direct3D renderer?
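
      For anyone wondering what a player does with an NV12 read-back (the common denominator in the list above): NV12 is a full-resolution Y plane followed by one interleaved UVUV... plane, so converting to the planar I420/YV12 layout most software paths expect is a single linear pass. A rough sketch, with an illustrative function name (this is not part of any driver API):

      ```c
      #include <stddef.h>
      #include <stdint.h>

      /* Deinterleave NV12's combined UV plane into I420's separate U and V
       * planes; the Y plane copies through unchanged. Chroma is 4:2:0, so
       * there is one U/V pair per 2x2 block of luma pixels. */
      static void nv12_to_i420(const uint8_t *y_src, const uint8_t *uv_src,
                               uint8_t *y_dst, uint8_t *u_dst, uint8_t *v_dst,
                               size_t width, size_t height) {
          size_t i, n = width * height;
          for (i = 0; i < n; i++)
              y_dst[i] = y_src[i];
          n = (width / 2) * (height / 2);   /* 4:2:0 chroma sample count */
          for (i = 0; i < n; i++) {
              u_dst[i] = uv_src[2 * i];
              v_dst[i] = uv_src[2 * i + 1];
          }
      }
      ```

      The pass is memory-bandwidth bound, which is one more reason the read-back cost dominates everything else in this path.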



      • #33
        Originally posted by gbeauche View Post
        Is Flash acceleration on Windows really reading the pixels back too? In my experience this is slow on Linux on both AMD and NVIDIA, and even more painful on GMA500. How could it be much faster on Windows if the underlying driver code is common? The latest Flash and GMA500 drivers seem to work well together, and since reading pixels back on that platform is impractical, I strongly suspect that even the Windows Flash uses overlays rather than pixel read-back. Or are they using a Direct3D renderer?
        I wonder about the same thing. 1080p30 video requires close to 100 MB/s in YV12, or about 200 MB/s in RGB, in each direction. To my mind that's a hefty chunk of the available CPU-GPU bandwidth, considering everything else going on in a typical system.
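
        The back-of-envelope numbers are easy to check: at 1920x1080 and 30 fps, 12 bits per pixel (YV12) comes to about 93 MB/s and 24 bits per pixel (packed RGB) to about 187 MB/s. A tiny helper to reproduce the arithmetic (the function name is just for illustration):

        ```c
        #include <stdint.h>

        /* Raw transfer rate in bytes per second for a given frame size,
         * pixel depth, and frame rate. YV12 (4:2:0 planar) is 12 bits per
         * pixel, packed RGB is 24. */
        static unsigned long long raw_rate(unsigned w, unsigned h,
                                           unsigned bits_per_pixel,
                                           unsigned fps) {
            return (unsigned long long)w * h * bits_per_pixel / 8 * fps;
        }
        ```

        And that is payload only; it doesn't count the second direction of the round trip or any synchronization overhead.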



        • #34
          [This was originally an edit to my previous post, but apparently taking longer than 60 seconds to type an edit is disallowed by the forum software]

          I haven't done much graphics work lately, so how much bandwidth is actually available these days, either via DMA or a straight memcpy? (Last time I was heavily involved in graphics, my desktop PC could barely push 1024x768x16bpp at 30fps from CPU to GPU without DMA.)



          • #35
            Originally posted by rohcQaH View Post
            isn't that just a simple per-pixel matrix multiplication, i.e. a trivial task for any shader-based GPU? If the YUV image is already in video memory, why can't it be converted there before moving it to GPU space for manual blending?
            Yes, that's what video players do. Actually, they do a little more: they do the 4:2:0/4:2:2 to 4:4:4 upsampling and the YUV->RGB conversion in hardware on the card (virtually all video cards produced in the last 10 years support this).

            But for Flash it does not work, since Flash needs the RGB data in system memory for further processing. Reading back from the graphics card would work, but it is slower than converting it in software on the CPU (of course, only if you aren't doing the whole decoding process in hardware already).
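
            For reference, the chroma upsampling step the GPU handles for free looks roughly like this on the CPU: each 4:2:0 chroma sample covers a 2x2 block of luma pixels, so the simplest (nearest-neighbor) upsampling just replicates it. A minimal sketch, with an illustrative function name (GPUs typically do this with bilinear filtering in the texture units instead):

            ```c
            #include <stddef.h>
            #include <stdint.h>

            /* Nearest-neighbor 4:2:0 -> 4:4:4 upsampling of one chroma
             * plane; width/height are the luma dimensions. Each chroma
             * sample is replicated over its 2x2 block of pixels. */
            static void upsample_420(const uint8_t *c_src, uint8_t *c_dst,
                                     size_t width, size_t height) {
                size_t x, y, cw = width / 2;
                for (y = 0; y < height; y++)
                    for (x = 0; x < width; x++)
                        c_dst[y * width + x] = c_src[(y / 2) * cw + (x / 2)];
            }
            ```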



            • #36
              Originally posted by mirza View Post
               Colorspace Conversion: Isn't it simple to have 50MB of RAM reserved for a YUV -> RGB conversion lookup table, for users that have no HW acceleration but have plenty (well, 50MB) of RAM? This table can be shared in RAM by all running video encoding/decoding instances. In such a table you can simply define the RGB value for each YUV combination and have the image converted in no time.
              No, you want to do this conversion with math. A lookup table will kill your cache, i.e. every memory access would have to go out to SDRAM, which is very, very slow.

              Originally posted by mirza View Post
              Scaling: How is scaling a problem? Didn't the Second Reality demo scale a bald guy's face full-screen on a 486 back in the Ronald Reagan era? Yes, it also used a lookup table; you don't have to calculate everything all the time. For video scaled to a 1920x1080 screen (the size of the source image is not important), you need only an 8MB lookup table. Again, the conversion is done in no time.
              Scaling increases the bandwidth demand. Although we have a lot more bandwidth with PCI-E these days, it's still a potential bottleneck. 1920x1080 is 6MB of data per frame (in 4:4:4 RGB); that's 177 MByte/s for a 30fps movie, raw data alone, not counting control data.
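
              For what it's worth, the lookup-table scaler being described amounts to a nearest-neighbor scaler with precomputed source indices: the per-pixel index math is done once, and every row after that is a pure gather. A minimal sketch under that assumption (function names are made up for the example):

              ```c
              #include <stddef.h>
              #include <stdint.h>

              /* Precompute, for each destination pixel, the index of the
               * nearest source pixel. Done once per scaling ratio. */
              static void build_scale_lut(size_t src_w, size_t dst_w,
                                          size_t *lut) {
                  size_t x;
                  for (x = 0; x < dst_w; x++)
                      lut[x] = x * src_w / dst_w;
              }

              /* Scale one row using the precomputed index table. */
              static void scale_row(const uint8_t *src, uint8_t *dst,
                                    size_t dst_w, const size_t *lut) {
                  size_t x;
                  for (x = 0; x < dst_w; x++)
                      dst[x] = src[lut[x]];
              }
              ```

              The table itself is cheap and stays cache-friendly; the cost that doesn't go away is moving the upscaled frames around, which is the bandwidth point above.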

              Originally posted by mirza View Post
              Can somebody enlighten me why things that were simple in the late eighties are complicated and _slow_ again on CPUs more than 100x faster?
              Things weren't simple in the '80s, but people were more easily impressed, even when it was obvious that you were cheating.

              But actually, there was a time when video coding was "easy"... That was the time before HD, when 640x480 (DVDs) was the highest resolution you'd have to deal with and CPU clock frequencies still increased every month. You could use ever more complex encoding strategies and didn't have to worry about bandwidth or CPU use, since the next generation of computers was sure to play it all!



              • #37
                Originally posted by KotH View Post
                That was the time before HD, when 640x480 (DVDs) was the highest resolution you'd have to deal with

                DVD resolutions are 720/704×480 for NTSC and 720/704×576 for PAL.



                • #39
                  Nice to see Aaron get into the cause there.



                  • #40
                    Originally posted by deanjo View Post
                    Nice to see Aaron get into the cause there.
                    And Stephan, and Gwenole and Jay. Might see some action on this after all.
