Intel Adds GPU-Accelerated Memory Copy Support To FFmpeg

  • ms178
    replied
    Originally posted by coder
    If you're talking about HSA, specifically, then I'm assuming the main reason AMD has seemingly walked away from it is that software vendors never embraced it. Perhaps if MS or Google showed renewed interest, that might be enough to bring it back. Otherwise, RIP HSA.
    I meant HSA-like functionality (such as shared virtual memory), not HSA specifically. It has been very quiet on the HSA front. They had set up a Chinese chapter that was supposed to move the standard forward, but I haven't heard of any progress there. SYCL shares some of HSA's concepts, such as a single-source programming model, and provides some of the same functionality. Maybe that vehicle will bring heterogeneous computing on the desktop forward. I have been waiting since 2012 for this to happen...


  • coder
    replied
    Originally posted by ms178
    Maybe there is a comeback of similar functionality on the AMD side soon,
    If you're talking about HSA, specifically, then I'm assuming the main reason AMD has seemingly walked away from it is that software vendors never embraced it. Perhaps if MS or Google showed renewed interest, that might be enough to bring it back. Otherwise, RIP HSA.


  • starshipeleven
    replied
    Originally posted by syrjala

    That's actually just a cache.
    Which is shared with the CPU if I'm not mistaken?


  • microcode
    replied
    Originally posted by coder
    Context is everything. The new memcpy replaces an existing, userspace, CPU-based one. That should tell you that this has nothing to do with the buffer memory being outside the process's address space.
    Indeed, I said something stupid. lol.


  • Jabberwocky
    replied
    Originally posted by ms178

    Maybe there is a comeback of similar functionality on the AMD side soon, just think of their chiplet approach with HBM on the same package. That implies HSA-like functionality not only on APUs but on their next-gen high performance cores as well.
    You're not the first one to mention this; I hope it happens soon. I would be over the moon if I had the opportunity to play around with a single-die Zen+HBM+GPU!


  • ms178
    replied
    Originally posted by fuzz
    Shame HSA never really caught on :/
    Maybe there is a comeback of similar functionality on the AMD side soon, just think of their chiplet approach with HBM on the same package. That implies HSA-like functionality not only on APUs but on their next-gen high performance cores as well. I guess Intel's oneAPI approach, with SYCL built around LLVM, would help the software ecosystem as a whole bring dGPU + APU use for GPGPU tasks forward, as it could also be targeted by AMD and other vendors.

    Also, the industry is now zeroing in on CXL as a cache-coherent protocol standard for connecting several devices together. That is also an important ingredient in the overall picture.
    Last edited by ms178; 10 October 2019, 07:28 AM. Reason: Additional aspect: CXL


  • sandy8925
    replied
    Originally posted by coder
    Given that system & video memory are the same physical RAM (in the iGPU case - the only one, currently), this only makes sense to me if ffmpeg doesn't know how to manage or use Intel's buffers.

    The only argument I can see why it might be strictly necessary to do the copy is that pre-Broadwell iGPUs didn't support shared memory between CPU & GPU. Of course, that's assuming that your app needs access to the output frame before it's displayed on screen. And, what blows a hole in that explanation is that a non-GPU version of the copy exists as a starting point.

    Anyway, if you're just going to display it after decoding, then just teach ffmpeg how to manage Intel's buffers and leave the data in "video" memory.
    True, I believe VAAPI already supports something like this. You can get a handle to the surface or buffer (and I think it's somehow tied to the kernel DRM and maybe DRM PRIME) and use it in other places like Wayland, OpenGL, Vulkan etc.

    So in the case of an iGPU, or if the decoded data is just going to be displayed, there's no reason to perform copies and move it around.


  • fuzz
    replied
    Shame HSA never really caught on :/


  • syrjala
    replied
    Originally posted by coder
    It's still a cache even though it sits in a slightly different position in the topology. One interesting upside of the new arrangement is that the display engine can now "see" the eDRAM so your scanout buffers can remain eLLC cacheable. Currently i915 doesn't allow that though. I have a pending patch to enable it but I'm not quite 100% sure it's a good idea.


  • coder
    replied
    Originally posted by microcode
    In this case, the copy is needed because ffmpeg wants to do something with it on the CPU, in the ffmpeg process's address space, which typically can't operate directly on the GPU memory.
    Context is everything. The new memcpy replaces an existing, userspace, CPU-based one. That should tell you that this has nothing to do with the buffer memory being outside the process's address space.
