All The Big Names Are Joining A New Alliance For Open Media


  • #51
    Originally posted by Gusar View Post
    Only I-frames (intra/key frames) are independent, meaning you can take just the I-frame and decode it in full. But then you have P-frames (predicted/inter frames) and B-frames (bidirectionally predicted frames); these aren't independent, they reference other frames. So you can't decode a particular P- or B-frame without having first decoded all of its references.

    Then, the whole bitstream is entropy encoded. So of course you need to decode the bitstream to get to the video. Modern video formats use binary arithmetic coding here, which is by design a very serial process.
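    To see why arithmetic decoding resists parallelism, here's a minimal sketch of a binary arithmetic coder (a toy with a fixed symbol probability, not the real CABAC, which also adapts its probabilities per context): every decoded bit narrows an interval, and the next decision depends on the state left behind by the previous one.

    ```python
    def encode_bits(bits, p_zero=0.6):
        # Toy binary arithmetic encoder with a fixed 0-bit probability.
        low, high = 0.0, 1.0
        for b in bits:
            split = low + (high - low) * p_zero
            if b == 0:
                high = split
            else:
                low = split
        # Any value inside [low, high) identifies the whole sequence.
        return (low + high) / 2

    def decode_bits(code, n_bits, p_zero=0.6):
        low, high = 0.0, 1.0
        bits = []
        for _ in range(n_bits):
            split = low + (high - low) * p_zero
            # This comparison depends on (low, high) from the previous
            # iteration -- bit i cannot be decoded before bit i-1.
            if code < split:
                bits.append(0)
                high = split
            else:
                bits.append(1)
                low = split
        return bits

    msg = [0, 1, 1, 0, 0, 1, 0, 0]
    assert decode_bits(encode_bits(msg), len(msg)) == msg
    ```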

    That said, NVENC and VDPAU are both really fast compared to their software-based brothers. NVENC is noticeably faster when using things like Open Broadcaster Software.



    • #52
      Originally posted by Daktyl198 View Post
      It seems like the limitations being imposed are because of the way the codecs are done. Could somebody not create a codec that is designed for parallel decoding on a GPU? I feel like somebody, somewhere, would have tried something like that even as a hobby project.
      Of course, but the point of doing it this way is to reduce the bitrate. Generally, people don't worry too much about encode or decode speed; they try to minimize the bitrate required to encode high-quality video, so that it can be transmitted over the internet at decent speeds, stored on limited storage, and so on, and they assume that any encode/decode issues can be worked around.

      The reason behind all the frame dependencies is pretty obvious when you think about it. Generally speaking, when you go from one frame to the next, 90% of the scene is going to stay exactly the same. Certain objects may have shifted position, but things will generally not be too different. So instead of having to encode the entire frame again, you just encode the 10% of pixels that are actually different - automatically giving you a 90% compression boost on that frame, which is massive. When the codec detects that too much of the scene has changed for this to be useful, it inserts another key frame and starts over again.
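      The keyframe-plus-delta idea fits in a few lines. A rough sketch (hedged: it stores raw changed pixels, whereas real codecs predict motion-compensated blocks and transform-code the residual, and the 50% threshold is an arbitrary stand-in for a real scene-cut heuristic):

      ```python
      import numpy as np

      KEYFRAME_THRESHOLD = 0.5  # arbitrary cutoff for this sketch

      def encode_frame(frame, prev_frame):
          # Emit a full keyframe or a sparse diff against the previous frame.
          if prev_frame is None:
              return ("key", frame.copy())
          changed = frame != prev_frame
          if changed.mean() > KEYFRAME_THRESHOLD:
              # Too much of the scene changed: start over with a keyframe.
              return ("key", frame.copy())
          idx = np.flatnonzero(changed)
          return ("delta", (idx, frame.flat[idx]))

      def decode_frame(packet, prev_frame):
          kind, payload = packet
          if kind == "key":
              return payload.copy()
          # A delta frame is undecodable without its reference frame.
          out = prev_frame.copy()
          idx, values = payload
          out.flat[idx] = values
          return out
      ```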

      As for CABAC, the first and most expensive stage of video decoding in most h264 profiles and all of h265: it could certainly be replaced by something much more GPU-friendly. But it was chosen specifically because a lot of research had gone into reducing the bitrate, and that process was simply the best one they found. Ultimately, I think they just decided that all devices could build in special hardware to do that decoding, or rely on CPUs getting faster, and that's pretty much the way it's played out. It's one of the key reasons h264 is better than h263 in terms of bitrate - so if you want something more GPU-friendly, h263 is actually not a bad starting spot. You just have to live with the knowledge that, at the same bitrate, it won't look as good as what the more modern techniques can achieve.
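      For what it's worth, the newer codecs attack the problem from a different angle: rather than replacing the arithmetic coder, H.264 slices and H.265 tiles/wavefronts split a frame into regions that are entropy-coded independently, so several serial decoders can run side by side. A hedged sketch of the idea (decode_slice is a placeholder, not a real decoder):

      ```python
      from concurrent.futures import ThreadPoolExecutor

      def decode_slice(slice_bytes):
          # Stand-in for an entropy decoder; each slice carries its own
          # independent arithmetic-coder state, so no slice waits on another.
          return bytes(b ^ 0xFF for b in slice_bytes)

      def decode_frame_parallel(slices):
          # The serial bottleneck still exists *inside* each slice, but it
          # no longer serializes the whole frame.
          with ThreadPoolExecutor() as pool:
              return list(pool.map(decode_slice, slices))

      decoded = decode_frame_parallel([b"slice-0", b"slice-1", b"slice-2"])
      ```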
      Last edited by smitty3268; 07 September 2015, 03:09 AM.



      • #53
        Originally posted by computerquip View Post
        That said, NVENC and VDPAU are both really fast compared to their software-based brothers. NVENC is noticeably faster when using things like Open Broadcaster Software.
        NVENC doesn't use the GPU. Maybe for a few things, but it's mostly dedicated encoder circuitry. All GPU encoders sucked, no exceptions, and they weren't even fast. So, as with decoding, vendors have started putting dedicated encoding circuitry into their devices. This works much better, but it still won't reach the quality of a software solution; it's just not as flexible. I wouldn't use it for archives (backing up your DVDs or Blu-rays), but it's probably good enough for certain other scenarios.
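        With a reasonably recent ffmpeg build you can compare the two paths yourself; something along these lines (filenames are placeholders, and the encoder name h264_nvenc depends on the ffmpeg version):

        ```python
        import subprocess

        SOURCE = "input.mkv"  # placeholder

        # Hardware path: h264_nvenc feeds NVENC's dedicated circuitry;
        # the GPU's shader cores stay mostly idle.
        subprocess.run(["ffmpeg", "-i", SOURCE,
                        "-c:v", "h264_nvenc", "-b:v", "8M", "nvenc.mp4"],
                       check=True)

        # Software path: libx264 on the CPU -- far slower, but with the
        # flexibility that makes it the better choice for archival encodes.
        subprocess.run(["ffmpeg", "-i", SOURCE,
                        "-c:v", "libx264", "-preset", "slower", "-crf", "18",
                        "x264.mp4"],
                       check=True)
        ```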



        • #54
          Originally posted by Gusar View Post
          NVENC doesn't use the GPU. Maybe for a few things, but it's mostly dedicated encoder circuitry. All GPU encoders sucked, no exceptions, and they weren't even fast. So, as with decoding, vendors have started putting dedicated encoding circuitry into their devices. This works much better, but it still won't reach the quality of a software solution; it's just not as flexible. I wouldn't use it for archives (backing up your DVDs or Blu-rays), but it's probably good enough for certain other scenarios.
          Depends on what part of decoding you're talking about. Post-processing and YUV conversion+scaling work very well on GPUs, and are almost always done as the last step on GPUs now.
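          Colorspace conversion is a good example of why: it's an independent 3x3 matrix multiply per pixel, which maps directly onto a fragment shader. A rough full-range BT.601 version (real players also deal with BT.709, limited range, and chroma subsampling, which this sketch ignores):

          ```python
          import numpy as np

          def yuv_to_rgb(yuv):
              # yuv: HxWx3 uint8 array, full-range BT.601.
              yuv = yuv.astype(np.float32)
              y = yuv[..., 0]
              u = yuv[..., 1] - 128.0
              v = yuv[..., 2] - 128.0
              # Each output pixel depends only on its input pixel --
              # trivially parallel, hence a perfect fit for GPUs.
              r = y + 1.402 * v
              g = y - 0.344136 * u - 0.714136 * v
              b = y + 1.772 * u
              rgb = np.stack([r, g, b], axis=-1)
              return np.clip(rgb, 0, 255).astype(np.uint8)
          ```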



          • #55
            Originally posted by carewolf View Post
            Depends on what part of decoding you're talking about. Post-processing and YUV conversion+scaling work very well on GPUs, and are almost always done as the last step on GPUs now.
            The post you're replying to talks about *en*coding.

            But ok, decoding... Post-processing isn't part of decoding; that's why it's called post-processing - it's done after decoding. When it's part of decoding, it's in-loop filtering (like the in-loop deblocker in modern codecs). However, post-processing is something GPUs are good at, that's true.
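            To make the distinction concrete, here's a naive deblocking pass (nothing like the adaptive, normative H.264/H.265 in-loop filters; just a smoothing of 8x8 block edges on a grayscale frame). Run in-loop, a decoder must apply it bit-exactly before the frame becomes a reference; run as post-processing, it's purely cosmetic and can be skipped:

            ```python
            import numpy as np

            BLOCK = 8  # typical transform block size

            def deblock(frame):
                # frame: 2-D grayscale array.
                out = frame.astype(np.float32)
                for x in range(BLOCK, frame.shape[1], BLOCK):  # vertical edges
                    avg = (out[:, x - 1] + out[:, x]) / 2
                    out[:, x - 1] = out[:, x] = avg
                for y in range(BLOCK, frame.shape[0], BLOCK):  # horizontal edges
                    avg = (out[y - 1, :] + out[y, :]) / 2
                    out[y - 1, :] = out[y, :] = avg
                return out.astype(frame.dtype)
            ```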

            Colorspace conversion and scaling aren't part of decoding either; they also happen after it. But yeah, GPUs are fairly good at them. You're wrong that it's almost always done on the GPU, though - at least with Intel VAAPI, csc and scaling are done on the ASIC. That's why mpv is more efficient with --vo=vaapi than with --vo=opengl: the former uses the ASIC, the latter the GPU. In actual numbers on my Haswell machine: http://pastebin.ca/2969780. ARM SoCs very likely have dedicated csc and scaling circuitry too.
            Last edited by Gusar; 09 September 2015, 06:34 PM.

