Intel Releases New Linux Media Driver For VA-API

  • Intel Releases New Linux Media Driver For VA-API

    Phoronix: Intel Releases New Linux Media Driver For VA-API

    While Intel has supported VA-API as its primary video acceleration API for years, basically since X-Video/XvMC became irrelevant, they are now rolling out a new media driver...

    http://www.phoronix.com/scan.php?pag...-VA-API-Driver

  • #2
    This driver might be faster and use OpenCL. Interesting.

    • #3
      Hmm, I see cmrt mentioned in the README, yet current cmrt [1] has a binary-only blob that is required for anything using cmrt and intel-hybrid-driver. I wonder whether the shipped cmrt needs the jitter library as well, or whether it can function without it?

      [1] https://github.com/01org/cmrt/tree/master/jitter
      Last edited by Krejzi; 12-01-2017, 09:21 AM.

      • #4
        Unless there's something I'm missing, I'm surprised OpenCL hasn't been involved much in video decoding. Seems like it'd make development much simpler, since it could eliminate the need for ASICs (even if integrated) and reduce hardware-specific drivers. Note: I am aware that OCL video encoders have been a thing for a while.

        • #5
          Originally posted by schmidtbag View Post
          Unless there's something I'm missing, I'm surprised OpenCL hasn't been involved much in video decoding. Seems like it'd make development much simpler, since it could eliminate the need for ASICs (even if integrated) and reduce hardware-specific drivers. Note: I am aware that OCL video encoders have been a thing for a while.
          Depending on what part of the decoding pipeline you're looking at, there's parts of video decoding that really don't parallelize well. The stream decompression, macroblock filtering, and iDCT steps are very, very branchy and have a lot of dependencies on the previous step that executed. When you get to the end-of-frame deblocking filtering, that's the part that parallelizes decently (I think I had it to the point that a 1080p VP8 video could have its loop filter split across ~192 threads). At least for VP8, that eliminated about 40-50% of the per-frame execution time, but the copies back and forth between the CPU and GPU never let the performance get back to the level of just running the decoding on the CPU alone (and nowhere near the speed of a dedicated ASIC).

          Maybe if someone could find a way to tease out the dependencies in decoding earlier in the frame and keep all of the decompression/filtering/idct work on the GPU it would work, but you basically have to offload it all, or communication latency/bandwidth doesn't make it worth it.
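          The trade-off described above can be sketched as a toy cost model (all numbers and names here are illustrative, not Veerappan's actual measurements): the serial stages stay on the CPU, the loop filter is split across GPU threads, but every frame pays a fixed CPU↔GPU copy cost.

          ```python
          # Hypothetical back-of-envelope model of per-frame decode time.
          # serial_ms: branchy stages (entropy decode, iDCT, prediction) that
          # stay on the CPU; filter_ms: the parallelizable deblocking filter;
          # copy_ms: the CPU<->GPU transfer overhead paid per frame.
          def frame_time(serial_ms, filter_ms, workers, copy_ms):
              return serial_ms + filter_ms / workers + copy_ms

          cpu_only = frame_time(10.0, 8.0, 1, 0.0)       # everything on the CPU
          gpu_offload = frame_time(10.0, 8.0, 192, 9.0)  # filter on 192 GPU threads

          # Even with the filter time nearly eliminated, the copy overhead
          # can erase the win, matching the experience described above.
          print(f"CPU only:    {cpu_only:.2f} ms/frame")
          print(f"GPU offload: {gpu_offload:.2f} ms/frame")
          ```

          With these illustrative numbers the offload case ends up slower despite a ~192x speedup on the filter itself, which is the Amdahl's-law shape of the problem: the serial stages plus the transfer cost dominate.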

          • #6
            Originally posted by Veerappan View Post
            Depending on what part of the decoding pipeline you're looking at, there's parts of video decoding that really don't parallelize well. The stream decompression, macroblock filtering, and iDCT steps are very, very branchy and have a lot of dependencies on the previous step that executed. When you get to the end-of-frame deblocking filtering, that's the part that parallelizes decently (I think I had it to the point that a 1080p VP8 video could have its loop filter split across ~192 threads).
            I understand that, but OpenCL requires CPU involvement no matter what. So I don't see why the CPU couldn't do things like decompression and macroblock filtering while the GPU does end-of-frame deblocking, rendering, and so on. This ought to be efficient enough that even low-end processors won't struggle.
            At least for VP8, that eliminated about 40-50% of the per-frame execution time, but the copies back and forth between the CPU and GPU never let the performance get back to the level of just running the decoding on the CPU alone (and nowhere near the speed of a dedicated ASIC).
            Well, for one thing, I'm not suggesting all codecs run with OpenCL; some were clearly designed with CPU decoding in mind, and would therefore run more efficiently there. That being said, unless OpenCL negatively impacts the framerate, why does it matter if there is more back-and-forth communication? Unlike games, pre-encoded videos have a fixed framerate and don't require much user input (so latency is mostly irrelevant). So even if the maximum framerate decreases while latency increases, the only thing that matters is whether low-end hardware that struggled to play the video back can now do it smoothly.
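            The CPU/GPU split being proposed can be sketched as a simple two-stage pipeline (stage names and the frame format are illustrative): the CPU worker handles the branchy stages, a second worker stands in for the GPU, and a small buffer between them hides per-frame latency, which is acceptable for fixed-framerate playback where only sustained throughput matters.

            ```python
            import queue
            import threading

            def cpu_stage(n_frames, out_q):
                # Branchy, serial work: entropy decode, iDCT, prediction.
                for i in range(n_frames):
                    out_q.put(f"frame{i}-decoded")
                out_q.put(None)  # end-of-stream marker

            def gpu_stage(in_q, results):
                # Parallel-friendly work: deblocking, rendering.
                while (frame := in_q.get()) is not None:
                    results.append(frame + "-filtered")

            buf = queue.Queue(maxsize=4)  # a few frames of buffering hides latency
            out = []
            t1 = threading.Thread(target=cpu_stage, args=(8, buf))
            t2 = threading.Thread(target=gpu_stage, args=(buf, out))
            t1.start(); t2.start(); t1.join(); t2.join()
            print(len(out), "frames processed")
            ```

            The bounded queue is the point of the sketch: as long as both stages each keep up with the frame interval, the added pipeline latency never shows up as dropped frames.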

            • #7
              Little typo:

              Originally posted by phoronix View Post
              Intel Graphics Memory Management LIbrary.

              • #8
                Now that they've created this new driver, does it mean they are abandoning the current one? Do you also think they will include support for older GPUs like Haswell later?

                • #9
                  Originally posted by Veerappan View Post

                  Depending on what part of the decoding pipeline you're looking at, there's parts of video decoding that really don't parallelize well. The stream decompression, macroblock filtering, and iDCT steps are very, very branchy and have a lot of dependencies on the previous step that executed. When you get to the end-of-frame deblocking filtering, that's the part that parallelizes decently (I think I had it to the point that a 1080p VP8 video could have its loop filter split across ~192 threads). At least for VP8, that eliminated about 40-50% of the per-frame execution time, but the copies back and forth between the CPU and GPU never let the performance get back to the level of just running the decoding on the CPU alone (and nowhere near the speed of a dedicated ASIC).

                  Maybe if someone could find a way to tease out the dependencies in decoding earlier in the frame and keep all of the decompression/filtering/idct work on the GPU it would work, but you basically have to offload it all, or communication latency/bandwidth doesn't make it worth it.
                  Well, if we had a scenario where steps 1-3 run on the CPU and steps 4-6 run on the GPU, and the final output frame only needs to be on the GPU (since it's going to be displayed), wouldn't that avoid copying back and forth between the CPU and the GPU? What about compositing, playing the video in a window, or, say, an HTML5 video element in a browser? Would that necessarily require moving the decoded video frame to CPU memory, or could it stay in GPU memory and have the GPU do all of the compositing work?
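                  The zero-copy path being asked about can be sketched with a toy residency model (the classes and transfer counting are illustrative, not a real driver API): after one upload following the CPU stages, the frame stays resident in GPU memory and the compositor samples it in place, so no readback to system RAM ever happens.

                  ```python
                  class Frame:
                      def __init__(self):
                          self.location = "cpu"
                          self.transfers = 0

                      def upload(self):
                          # Single CPU->GPU copy after the CPU-side stages (1-3).
                          self.location = "gpu"
                          self.transfers += 1

                      def readback(self):
                          # The GPU->CPU copy the zero-copy path avoids entirely.
                          self.location = "cpu"
                          self.transfers += 1

                  def composite(frame):
                      # A GPU compositor (window manager, browser video element)
                      # can sample the frame where it already lives.
                      assert frame.location == "gpu"
                      return "composited on GPU"

                  f = Frame()
                  f.upload()
                  print(composite(f), "with", f.transfers, "transfer(s)")
                  ```

                  In this sketch the answer to the question is yes: as long as the presentation path is also on the GPU, the decoded frame never needs to cross back, and the only per-frame transfer is the one upload after the CPU stages.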

                  • #10
                    Probably we'll finally see some Windows features in Linux.
