Intel Adds GPU-Accelerated Memory Copy Support To FFmpeg


  • #11
    Originally posted by willmore
    Do I misunderstand or is the common belief that best performance comes when you don't make any unnecessary copies in the first place?
    "Yes, we're inefficient, but we optimized the inefficient parts so they're not as wasteful."
    In this case, the copy is needed because ffmpeg wants to do something with it on the CPU, in the ffmpeg process's address space, which typically can't operate directly on the GPU memory.


    • #12
      Originally posted by coder
      Given that system & video memory are the same physical RAM (in the iGPU case - the only one, currently), this only makes sense to me if ffmpeg doesn't know how to manage or use Intel's buffers.

      The only argument I can see why it might be strictly necessary to do the copy is that pre-Broadwell iGPUs didn't support shared memory between CPU & GPU. Of course, that's assuming that your app needs access to the output frame before it's displayed on screen. And, what blows a hole in that explanation is that a non-GPU version of the copy exists as a starting point.

      Anyway, if you're just going to display it after decoding, then just teach ffmpeg how to manage Intel's buffers and leave the data in "video" memory.
      I won't pretend to know the inner workings of the Intel silicon package; I'm basically extrapolating from the commit message ("GPU copy enables or disables GPU accelerated copying between video and system memory.") and my knowledge of how ffmpeg exposes hwaccel decoding + filtering in its filtergraph.


      • #13
        Originally posted by AluminumGriffin
        Some of Intel's iGPUs have their own memory, for instance the Intel Iris Plus 640 (kabylake (nuc7i5 for instance)) has 64MB of eDRAM
        That's actually just a cache.


        • #14
          Originally posted by AluminumGriffin
          Some of Intel's iGPUs have their own memory, for instance the Intel Iris Plus 640 (kabylake (nuc7i5 for instance)) has 64MB of eDRAM
          I'm guessing, to the extent that's used for video decompression, that it's private to the driver.
          Last edited by coder; 09 October 2019, 02:01 PM.


          • #15
            Originally posted by syrjala
            That's actually just a cache.
            Was, but not in Skylake.

            https://www.anandtech.com/show/10281...sors-65w-edram


            • #16
              Originally posted by microcode
              In this case, the copy is needed because ffmpeg wants to do something with it on the CPU, in the ffmpeg process's address space, which typically can't operate directly on the GPU memory.
              Context is everything. The new memcpy replaces an existing, userspace, CPU-based one. That should tell you that this has nothing to do with the buffer memory being outside the process's address space.


              • #17
                Originally posted by coder
                It's still a cache even though it sits in a slightly different position in the topology. One interesting upside of the new arrangement is that the display engine can now "see" the eDRAM so your scanout buffers can remain eLLC cacheable. Currently i915 doesn't allow that though. I have a pending patch to enable it but I'm not quite 100% sure it's a good idea.


                • #18
                  Shame HSA never really caught on :/


                  • #19
                    Originally posted by coder
                    Given that system & video memory are the same physical RAM (in the iGPU case - the only one, currently), this only makes sense to me if ffmpeg doesn't know how to manage or use Intel's buffers.

                    The only argument I can see why it might be strictly necessary to do the copy is that pre-Broadwell iGPUs didn't support shared memory between CPU & GPU. Of course, that's assuming that your app needs access to the output frame before it's displayed on screen. And, what blows a hole in that explanation is that a non-GPU version of the copy exists as a starting point.

                    Anyway, if you're just going to display it after decoding, then just teach ffmpeg how to manage Intel's buffers and leave the data in "video" memory.
                    True, I believe VAAPI already supports something like this. You can get a handle to the surface or buffer (and I think it's somehow tied to the kernel DRM and maybe DRM PRIME) and use it in other places like Wayland, OpenGL, Vulkan etc.

                    So in the case of an iGPU, or if the decoded data is just going to be displayed, there's no reason to perform copies and move it around.
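
                    To illustrate the handle-versus-copy distinction, here is a rough sketch in plain Python. It is only an analogy: this is not the VAAPI or DRM PRIME API, the bytearray merely stands in for a decoded surface, and the names frame, copied and shared are made up for the example.

```python
# Analogy only: a bytearray stands in for a decoded surface in "video" memory.
frame = bytearray(b"\x10" * 16)          # hypothetical 16-byte decoded frame

copied = bytes(frame)                     # explicit copy into a second buffer
shared = memoryview(frame)                # handle to the same storage, no copy

frame[0] = 0xFF                           # the "decoder" updates the surface

copy_sees_update = copied[0] == 0xFF      # False: the copy is a stale snapshot
handle_sees_update = shared[0] == 0xFF    # True: the handle aliases the frame
print(copy_sees_update, handle_sees_update)
```

                    The consumer holding the handle reads the producer's buffer directly, which is the property you want when the frame is only going to be displayed.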


                    • #20
                      Originally posted by fuzz
                      Shame HSA never really caught on :/
                      Maybe similar functionality will make a comeback on the AMD side soon; just think of their chiplet approach with HBM on the same package. That implies HSA-like functionality not only on APUs but on their next-gen high-performance cores as well. I guess Intel's oneAPI approach with SYCL surrounding LLVM would help the software ecosystem as a whole to bring dGPU + APU use for GPGPU tasks forward, as this could also be targeted by AMD and other vendors.

                      Also the industry is now zeroing in on CXL as a cache coherent protocol standard for connecting several devices together. That is also an important ingredient in the overall picture.
                      Last edited by ms178; 10 October 2019, 07:28 AM. Reason: Additional aspect: CXL
