Intel Adds GPU-Accelerated Memory Copy Support To FFmpeg
-
Originally posted by AluminumGriffin:
    Some of Intel's iGPUs have their own memory; for instance, the Intel Iris Plus 640 (Kaby Lake, e.g. in the NUC7i5) has 64 MB of eDRAM.
Last edited by coder; 09 October 2019, 02:01 PM.
-
Originally posted by coder:
    Given that system & video memory are the same physical RAM (in the iGPU case - the only one, currently), this only makes sense to me if ffmpeg doesn't know how to manage or use Intel's buffers.
The only argument I can see for why the copy might be strictly necessary is that pre-Broadwell iGPUs didn't support shared memory between the CPU & GPU. Of course, that assumes your app needs access to the output frame before it's displayed on screen. And what blows a hole in that explanation is that a non-GPU version of the copy already exists as a starting point.
Anyway, if you're just going to display the frame after decoding, then teach ffmpeg how to manage Intel's buffers and leave the data in "video" memory.
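To make the trade-off concrete, here is a back-of-the-envelope sketch in Python of how much data each path moves per frame. The frame size, pixel format, and the exact copy steps are illustrative assumptions, not FFmpeg's actual code paths:

```python
# Toy model: bytes moved per frame for two decode pipelines.
# Assumes 1080p NV12 output (1.5 bytes per pixel); purely illustrative.
WIDTH, HEIGHT = 1920, 1080
FRAME_BYTES = int(WIDTH * HEIGHT * 1.5)  # NV12: full-res Y plane + half-res interleaved UV

def copy_back_pipeline():
    """Decode into video memory, copy to system RAM, then upload again for display."""
    moved = 0
    moved += FRAME_BYTES  # GPU decoder writes the frame into video memory
    moved += FRAME_BYTES  # copy video memory -> system RAM so the CPU can touch it
    moved += FRAME_BYTES  # upload system RAM -> video memory again for display
    return moved

def zero_copy_pipeline():
    """Decode into video memory and hand the same surface straight to the display."""
    return FRAME_BYTES  # the decoder's one write; no further copies

print(copy_back_pipeline() / zero_copy_pipeline())  # 3.0: copy-back moves 3x the data
```

Under these assumptions the copy-back path triples the per-frame memory traffic, which is the whole point of leaving frames in "video" memory when nothing on the CPU needs them.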
-
Originally posted by willmore:
    Do I misunderstand, or is the common belief that the best performance comes from not making any unnecessary copies in the first place?
"Yes, we're inefficient, but we optimized the inefficient parts so they're not as wasteful."
-
Originally posted by sturmen:
    My understanding of this change is a lot more benign: ffmpeg needs to copy the compressed H.264 stream into the iGPU's onboard memory pool, and then read the iGPU's uncompressed output from that embedded memory pool back into RAM so that ffmpeg (which is primarily CPU-based) can operate on it. This commit merely adds support for a QSV API call that lets the iGPU manage that copy (`ff_qsv_get_continuous_buffer`) rather than the generic `ff_get_buffer` function written by FFmpeg. The QSV call is likely to be faster.
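As a rough picture of what a "continuous buffer" buys, here is a small Python model of the idea only; the real `ff_qsv_get_continuous_buffer` is C code inside FFmpeg's QSV layer, and the helper names below are made up for illustration. Instead of allocating each plane of a frame separately, one contiguous block is allocated and the planes are views into it:

```python
# Sketch of per-plane vs. contiguous frame allocation for an NV12 frame.
# Illustrative only; the real QSV code manages hardware surfaces in C.
WIDTH, HEIGHT = 1920, 1080

def alloc_per_plane():
    """Generic path: each plane gets its own, possibly scattered, buffer."""
    y_plane = bytearray(WIDTH * HEIGHT)        # luma plane
    uv_plane = bytearray(WIDTH * HEIGHT // 2)  # interleaved chroma at half height
    return [y_plane, uv_plane]                 # two independent allocations

def alloc_continuous():
    """Continuous path: one allocation; planes are zero-copy views at fixed offsets."""
    frame = bytearray(WIDTH * HEIGHT * 3 // 2)  # whole NV12 frame in one block
    y_view = memoryview(frame)[: WIDTH * HEIGHT]
    uv_view = memoryview(frame)[WIDTH * HEIGHT :]
    return frame, y_view, uv_view

frame, y, uv = alloc_continuous()
assert len(y) + len(uv) == len(frame)  # the views tile the single allocation exactly
```

With one contiguous block, the whole frame can be handed around (or DMA'd) as a single region instead of plane by plane, which is presumably why letting the QSV side manage it can beat the generic allocator.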
-
Originally posted by treba:
    Oh yes, they surely missed that. Now that you've said it, they will discover that those copies could be avoided just as easily.
    Edit: sorry, didn't mean to be so sarcastic. Just wanted to say: I'm sure they thought about that, but here they are optimizing copies that are hard to avoid.
For Intel, they have much more compute resources to draw on, but the argument is the same: every trip to memory costs power. The same decoder is in mobile parts as well as desktop parts, so it seems important to be power efficient.
The conclusion is that *if the decoder->display pipeline requires a lot of copies*, then it was designed wrong to start with. I get that we're past that stage and into "making the best of what we have", and that's commendable. I also appreciate that the people who design the chips aren't the ones who end up coding the drivers. So I give Intel credit for this optimization, but at the same time Intel deserves scorn for designing hardware that *needs* copies in the first place.
Possibly some of this is the fault of the API: it may not support the buffer types the hardware can handle, forcing copy or transform/copy steps that wouldn't have been necessary if the software didn't impose its own way of doing things on the hardware. If that's the case, the same argument stands: thanks for making things better, but the real solution is to fix the API. And that's a real option - look at all the work being done by the people doing the Allwinner decoder support. They had to come up with a whole new API because their hardware is fundamentally different from the normal PC/GPU type of video decoding.
Summary: thank you to the people at Intel who coded up this optimization; I hope it benefits lots of people. But also, could you smack the hardware people around a bit and get them either to design things to work better, or to find out how the hardware is supposed to be used and code for that instead?
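To put a number on the "every trip to memory costs power" point, here is a quick estimate of the avoidable traffic at 4K60. The frame format, copy count, and read-plus-write accounting are all illustrative assumptions, not measurements of Intel's hardware:

```python
# Rough memory-traffic estimate for the extra copies at 4K60, NV12.
# All figures are illustrative assumptions, not measurements.
WIDTH, HEIGHT, FPS = 3840, 2160, 60
FRAME_BYTES = WIDTH * HEIGHT * 3 // 2  # NV12 is 1.5 bytes per pixel
COPIES_PER_FRAME = 2                   # e.g. video mem -> RAM, then RAM -> video mem

# Each copy both reads and writes the frame, so it crosses the memory bus twice.
traffic_per_sec = FRAME_BYTES * COPIES_PER_FRAME * 2 * FPS
print(f"{traffic_per_sec / 1e9:.1f} GB/s of avoidable memory traffic")  # 3.0 GB/s
```

Roughly 3 GB/s of extra DRAM traffic just to shuffle frames around: small next to total memory bandwidth, but a steady power cost on a battery-powered part, which is why copies you can't eliminate are still worth accelerating.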