Intel Adds GPU-Accelerated Memory Copy Support To FFmpeg

ms178 replied

12 October 2019, 12:27 PM
Originally posted by coder

If you're talking about HSA, specifically, then I'm assuming the main reason AMD has seemingly walked away from it is that software vendors never embraced it. Perhaps if MS or Google showed renewed interest, that might be enough to bring it back. Otherwise, RIP HSA.

I meant HSA-like functionality (such as shared virtual memory), not HSA specifically. It has been very quiet on the HSA front. They set up a Chinese chapter that was supposed to move the standard forward, but I haven't heard of any progress there. SYCL does share some of HSA's concepts, like a single-source programming model, and provides some of the same functionality. Maybe that vehicle will move heterogeneous computing on the desktop forward. I have been waiting for this since 2012...
coder replied

12 October 2019, 09:59 AM
Originally posted by ms178

Maybe there is a comeback of similar functionality on the AMD side soon,

If you're talking about HSA, specifically, then I'm assuming the main reason AMD has seemingly walked away from it is that software vendors never embraced it. Perhaps if MS or Google showed renewed interest, that might be enough to bring it back. Otherwise, RIP HSA.
starshipeleven replied

11 October 2019, 10:05 AM
Originally posted by syrjala

That's actually just a cache.

Which is shared with the CPU if I'm not mistaken?
microcode replied

10 October 2019, 08:00 PM
Originally posted by coder

Context is everything. The new memcpy replaces an existing, userspace, CPU-based one. That should tell you that this has nothing to do with the buffer memory being outside the process's address space.

Indeed, I said something stupid. lol.
Jabberwocky replied

10 October 2019, 10:32 AM
Originally posted by ms178

Maybe there is a comeback of similar functionality on the AMD side soon, just think of their chiplet approach with HBM on the same package. That implies HSA-like functionality not only on APUs but on their next-gen high performance cores as well.

You're not the first one to mention this, hoping it will happen soon. I would be over the moon if I had the opportunity to play around with single die Zen+HBM+GPU!
ms178 replied

10 October 2019, 07:23 AM
Originally posted by fuzz

Shame HSA never really caught on :/

Maybe there is a comeback of similar functionality on the AMD side soon; just think of their chiplet approach with HBM on the same package. That implies HSA-like functionality not only on APUs but on their next-gen high-performance cores as well. I guess Intel's oneAPI approach, with SYCL built around LLVM, would help the software ecosystem as a whole move dGPU + APU use for GPGPU tasks forward, as this could also be targeted by AMD and other vendors.

Also, the industry is now zeroing in on CXL as a cache-coherent protocol standard for connecting several devices together. That is also an important ingredient in the overall picture.

Last edited by ms178; 10 October 2019, 07:28 AM. Reason: Additional aspect: CXL
Guest replied

09 October 2019, 04:05 PM
Originally posted by coder

Given that system & video memory are the same physical RAM (in the iGPU case - the only one, currently), this only makes sense to me if ffmpeg doesn't know how to manage or use Intel's buffers.

The only argument I can see why it might be strictly necessary to do the copy is that pre-Broadwell iGPUs didn't support shared memory between CPU & GPU. Of course, that's assuming that your app needs access to the output frame before it's displayed on screen. And, what blows a hole in that explanation is that a non-GPU version of the copy exists as a starting point.

Anyway, if you're just going to display it after decoding, then just teach ffmpeg how to manage Intel's buffers and leave the data in "video" memory.

True, I believe VAAPI already supports something like this. You can get a handle to the surface or buffer (and I think it's somehow tied to the kernel DRM and maybe DRM PRIME) and use it in other places like Wayland, OpenGL, Vulkan etc.

So in the case of an iGPU, or if the decoded data is just going to be displayed, there's no reason to perform copies and move it around.
fuzz replied

09 October 2019, 03:25 PM
Shame HSA never really caught on :/
syrjala replied

09 October 2019, 02:40 PM
Originally posted by coder

Was, but not in Skylake.
https://www.anandtech.com/show/10281...sors-65w-edram

It's still a cache even though it sits in a slightly different position in the topology. One interesting upside of the new arrangement is that the display engine can now "see" the eDRAM so your scanout buffers can remain eLLC cacheable. Currently i915 doesn't allow that though. I have a pending patch to enable it but I'm not quite 100% sure it's a good idea.
coder replied

09 October 2019, 02:02 PM
Originally posted by microcode

In this case, the copy is needed because ffmpeg wants to do something with it on the CPU, in the ffmpeg process's address space, which typically can't operate directly on the GPU memory.

Context is everything. The new memcpy replaces an existing, userspace, CPU-based one. That should tell you that this has nothing to do with the buffer memory being outside the process's address space.