FFmpeg Has Seen Some AVX2 Optimizations For VP9 Decoding

  • abi_dalzim
    replied
    AMD's AVX2 implementation is slower than Intel's on everything from Bulldozer derivatives to Zen derivatives (although Zen 2, or whatever it's called these days, might get 256-bit-wide AVX2 execution units if the move to 7nm buys them enough transistors to play with). They are also a GPU company, so I assume their goal is to move video decoding to the GPU, which will be both faster and more power efficient.

    Intel on the other hand is a CPU company first so.....


  • juno
    replied
    Originally posted by aufkrawall View Post
    I think that would be possible if AMD wanted to.
    It is definitely possible, but I've already written my opinion about that. Btw, I thought we were talking about real-time decoding during playback, not conversion. If there is no hardware support and we're talking about offline conversion, I'd rather have an OpenCL de-/encoder than have each hardware vendor come up with its own solution and implement it in various APIs (or not, which is actually where we're at).


  • sdack
    replied
    Originally posted by juno View Post
    Only that I don't. There is no fixed-function VP9 hardware on any GCN GPU. If it runs on the CUs, sure I've paid for that, but I still can't use it as it's not supported in VA-API (nor likely in DXVA, for that matter). I could maybe look for an OpenCL encoder though... That's why I said this list is useless.

    That's another story. Complain to Nvidia if you have a VP9 encoder but can't use it because they don't support VA.
    Luckily for me, ffmpeg fully supports my Nvidia card, allowing me to decode and encode in hardware at amazing speeds while the CPU sits idle at <5%. I'm happy and glad I don't have to rely on some tweaked code that still needs to run on the CPU.


  • aufkrawall
    replied
    Originally posted by juno View Post
    If it runs on the CUs, sure I've paid for that, but I still can't use it as it's not supported in VA-API (nor likely in dxva, for that matter).
    I think that would be possible if AMD wanted to. The HEVC 8-bit hybrid decoder on Maxwell GPUs older than the GTX 960 was/is usable via DXVA; to applications it likely looks like a normal native decoder.


  • juno
    replied
    Originally posted by sdack View Post
    Sure, the list may be useless to you, but when you know you have the hardware for it then why would you ever want to settle for less? You know it's just a wasted piece of hardware otherwise, paid for by your good money.
    Only that I don't. There is no fixed-function VP9 hardware on any GCN GPU. If it runs on the CUs, sure I've paid for that, but I still can't use it as it's not supported in VA-API (nor likely in DXVA, for that matter). I could maybe look for an OpenCL encoder though... That's why I said this list is useless.

    Originally posted by sdack View Post
    Whether it then decodes on the CPU using MMX, SSE, AVX or now AVX2 is pretty much irrelevant. It's nice but that's about it. It's still bad news for all those people who do have the hardware for VP9 decoding, but still have to let the CPU do it.
    That's another story. Complain to Nvidia if you have a VP9 encoder but can't use it because they don't support VA.


  • tildearrow
    replied
    Typo:

    Originally posted by phoronix View Post
    Advanced Vector Extensions 2 instrunctions have been supported since Intel Haswell


  • andreano
    replied
    Kudos for making the world's fastest VP9 decoder even faster!

    (A faster decoder translates to being able to play higher resolutions and framerates, which is probably more interesting to most people than actually speeding up the video.)

    Let's hope this is transferable to AV1 (if/when ffmpeg decides to make an AV1 decoder) – the reason ffvp9 was so fast (according to that link↑) was that it shared so much optimized code with other ffmpeg codecs.
    Last edited by andreano; 27 August 2017, 01:03 PM.


  • sdack
    replied
    Originally posted by juno View Post
    Also, the table counts "GPU or DSP based implementations – software implementations on non-CPU hardware", which is pretty useless. E.g. even AMD's most recent UVD seen in Vega doesn't support VP9. The power-hungry hybrid decoder only works on Windows.
    Sure, the list may be useless to you, but when you know you have the hardware for it then why would you ever want to settle for less? You know it's just a wasted piece of hardware otherwise, paid for by your good money. Whether it then decodes on the CPU using MMX, SSE, AVX or now AVX2 is pretty much irrelevant. It's nice but that's about it. It's still bad news for all those people who do have the hardware for VP9 decoding, but still have to let the CPU do it.


  • caligula
    replied
    Originally posted by sdack View Post
    Speeds beyond 1x do actually matter, because transcoding is also done in hardware these days and at speeds beyond 1x. You also don't want to have your CPU running at 100% usage while playing a video.
    Sure, but transcoding is a totally different use case. I'm a power user and my family does things like photo editing, but we only recently started with video editing, thanks to a new 4K-capable, easy-to-use DSLR. There are tons of users, machines, and use cases that don't involve transcoding or video encoding in any way. Most video use is decoding for playback at real-time speed.

    For video decoding, dedicated decoder hardware (a DSP) is always the best choice. Using AVX might speed decoding up by 100% and cut battery drain by 50%, but dedicated decoder chips reduce power consumption by 99 to 99.9%.
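
As a rough worked example of the comparison above, the sketch below turns the percentages quoted in the post into energy relative to a plain software decoder. The figures are the poster's estimates, not measurements, and the function name `relative_energy` is made up for illustration.

```python
def relative_energy(reduction_percent: float) -> float:
    """Fraction of the baseline energy that remains after a percentage reduction."""
    return 1.0 - reduction_percent / 100.0


# Figures taken from the post above (estimates, not benchmarks).
avx_energy = relative_energy(50.0)       # AVX path: "save battery by 50%"
hw_energy_low = relative_energy(99.0)    # dedicated decoder, lower bound of the claim
hw_energy_high = relative_energy(99.9)   # dedicated decoder, upper bound of the claim

# How many times less energy the dedicated decoder would use than the AVX path.
advantage_low = avx_energy / hw_energy_low
advantage_high = avx_energy / hw_energy_high

print(f"AVX path uses {avx_energy:.0%} of baseline energy")
print(f"Dedicated decoder advantage over AVX: "
      f"about {advantage_low:.0f}x to {advantage_high:.0f}x")
```

If those estimates held, a dedicated decoder would still use roughly 50 to 500 times less energy than the AVX-optimized software path.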


  • SavageX
    replied
    While AMD's Excavator shouldn't benefit from AVX2 optimizations as much as CPUs from Intel as the SIMD units are not as wide, it should still be more efficient to do things with fewer instructions by using AVX over SSE. This should lower pressure in the rest of the execution pipeline, e.g. the decoders.
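
The instruction-count argument above can be sketched with simple arithmetic. This is a simplified model, assuming one vector instruction per register-width chunk of data; it counts instructions issued, not execution time, and says nothing about how a given core executes them internally. The helper name `vector_ops` is hypothetical.

```python
import math


def vector_ops(num_bytes: int, vector_bits: int) -> int:
    """Number of vector instructions needed to touch num_bytes once,
    assuming each instruction handles one full register of vector_bits."""
    bytes_per_vector = vector_bits // 8
    return math.ceil(num_bytes / bytes_per_vector)


# One row of a 64x64 VP9 superblock with 8-bit samples is 64 bytes.
row_bytes = 64

sse_ops = vector_ops(row_bytes, 128)   # 128-bit SSE registers
avx2_ops = vector_ops(row_bytes, 256)  # 256-bit AVX2 registers

print(f"SSE:  {sse_ops} instructions per row")
print(f"AVX2: {avx2_ops} instructions per row")
```

Even if the SIMD units are only 128 bits wide and each 256-bit operation takes longer to execute, the front end still has half as many instructions to fetch and decode in this model.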
