Intel's Open-Source VP9 Video Encoder Just Scored A Massive ~3x Performance Boost
-
Originally posted by tildearrow View Post
These are just side-by-side comparisons. What I meant is subtracting both images, and checking for any non-black pixels.
Michael Larabel
https://www.michaellarabel.com/
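The subtraction check described in the quote above could be sketched roughly as follows. This is only an illustration: it assumes both frames have already been decoded into equally sized grids of (R, G, B) tuples, and `diff_frames` is a hypothetical helper name, not part of any tool mentioned in this thread.

```python
def diff_frames(frame_a, frame_b):
    """Subtract two same-sized RGB frames.

    Frames are lists of rows, each row a list of (r, g, b) tuples.
    Returns (diff_image, changed_pixel_count).
    """
    if len(frame_a) != len(frame_b) or any(
        len(row_a) != len(row_b) for row_a, row_b in zip(frame_a, frame_b)
    ):
        raise ValueError("frames must have identical dimensions")

    diff = []
    changed = 0
    for row_a, row_b in zip(frame_a, frame_b):
        out_row = []
        for px_a, px_b in zip(row_a, row_b):
            # Per-channel absolute difference; (0, 0, 0) means "black", i.e. identical.
            d = tuple(abs(a - b) for a, b in zip(px_a, px_b))
            if d != (0, 0, 0):
                changed += 1
            out_row.append(d)
        diff.append(out_row)
    return diff, changed


# Tiny made-up 2x2 frames, purely for illustration.
original = [[(10, 20, 30), (200, 200, 200)],
            [(0, 0, 0), (255, 128, 64)]]
encoded = [[(10, 20, 30), (198, 201, 200)],
           [(0, 0, 0), (255, 128, 64)]]

diff, changed = diff_frames(original, encoded)
print(changed, "non-black pixel(s) in the difference image")
```

With lossy codecs nearly every pixel will differ slightly, so in practice the magnitude of the difference is likely more telling than the bare count.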
-
Since I received a lot of "likes" on my post (and this is apparently how I validate my self-worth), I went ahead and ran a slightly more elaborate test to see how libvpx and SVT stack up against each other.
My original goal was to tweak the hell out of the settings to get the best visual quality possible at 30 fps encoding speed, but try as I might, I could not get libvpx anywhere close to that speed. Then I tried to match them on perceived quality and compare FPS, but that was a no-go as well: they are quite different encoders and gave wildly inconsistent results (libvpx seems to handle still images better, SVT-VP9 seems to prefer motion). So I settled on encoding at reasonable settings with the same target bitrate. That is still not flawless, as libvpx's rate control is wildly inaccurate, but it's the best I could come up with for now. I also held back the encoding speed on SVT to roughly match libvpx's low speed, which makes this somewhat of a "quality per clock cycle" test.
Tests were conducted on a Ryzen 7 2700X with 32 GB of RAM, the latest Arch Linux (btw...), and ffmpeg 4.2 using the latest SVT-VP9 git revision. That is possibly slightly unfair to libvpx, but I doubt *that* much has changed.
Here are my command lines and their output.
SVT: $ sudo ffmpeg -thread_queue_size 1024 -i ~/rp1.mkv -pix_fmt yuv420p -vcodec libsvt_vp9 -rc 2 -g 32 -tune 0 -qp 45 -preset 4 -vf scale=1920x816 -b:v 1200K -acodec copy /tmp/rp1-svt-vp9.ivf
frame= 2285 fps= 27 q=-0.0 Lsize= 13642kB time=00:01:35.01 bitrate=1176.2kbits/s dup=3 drop=0 speed= 1.1x
video:13589kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.393914%
libvpx: $ ffmpeg -thread_queue_size 1024 -i ~/rp1.mkv -pix_fmt yuv420p -vcodec libvpx-vp9 -row-mt 1 -cpu-used 4 -frame-parallel 1 -deadline realtime -vf scale=1920x816 -b:v 1200K -acodec copy /tmp/rp1-vpx-vp9.ivf
frame= 2285 fps= 11 q=0.0 Lsize= 18216kB time=00:01:35.30 bitrate=1565.8kbits/s dup=3 drop=0 speed=0.474x
video:18189kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.147390%
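To put the two summary lines above into perspective, here is a small sketch that computes how far each encoder landed from the 1200 kbit/s target and how the frame rates compare. The numbers are copied from the ffmpeg output above; the variable and function names are just illustrative.

```python
TARGET_KBPS = 1200.0  # from "-b:v 1200K" in both commands

# fps and bitrate as reported in the two ffmpeg summary lines above.
runs = {
    "svt-vp9": {"fps": 27.0, "kbps": 1176.2},
    "libvpx": {"fps": 11.0, "kbps": 1565.8},
}


def bitrate_error_pct(actual_kbps, target_kbps=TARGET_KBPS):
    """Signed deviation from the target bitrate, in percent."""
    return (actual_kbps - target_kbps) / target_kbps * 100.0


for name, run in runs.items():
    print(f"{name}: {bitrate_error_pct(run['kbps']):+.1f}% vs. target")

speed_ratio = runs["svt-vp9"]["fps"] / runs["libvpx"]["fps"]
print(f"svt-vp9 encoded at {speed_ratio:.2f}x the frame rate of libvpx")
```

By these figures SVT-VP9 came in about 2% under the target while libvpx overshot by roughly 30%, so the libvpx file had noticeably more bits to work with. Keep that in mind when comparing the screenshots.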
It is worth noting that I was seeing 60+ fps on SVT-VP9 with "preset 5" with only a slight, though noticeable, drop in quality. I stayed at preset 4 to be somewhat fair on the "quality per clock cycle" front (even though SVT-VP9 still absolutely trounced libvpx in speed, even when hindered). Also worth noting: SVT-VP9 consistently used all 16 threads and maxed out the CPU, whereas libvpx, no matter what I did, never exceeded about 3/4 of the processor and usually sat a little over half. It's still a very poor threader.
If there are any glaring issues with the encoding parameters that would affect the outcome, let me know and I shall rerun the tests.
However, the real cream of the crop: I'll include some screenshots of various scenes, and I will even upload the video files for you to compare. Screenshots can only do so much when comparing video codecs, and while SVT-VP9 is quite decent, there are still notable encoding errors that can only be seen in motion. Nothing significant for most people, but if you're a video nerd, you'll certainly notice them.
Here are the screenshots, linked rather than embedded because they're huge.
[original source]
[svt encoded]
[libvpx encoded]
And here are the videos. I'm not including the original, as it's almost 300 MB and I only have ~7 Mbps upload speed from here (on an LTE connection). Remuxed to MKV after the initial encode for ease of playback. Sound is not included, because this is a video encoder test.
[svt]
[libvpx]
I hope this post has been informative as to the state of the svt-vp9 encoder.
-
Originally posted by pal666 View Post
but serial process can't take significant part of encoding time if vectorization results in 3x performance boost
Just look at how different sorting algorithms are. Quicksort takes the fewest cumulative cycles but can hardly be parallelized, as you can't partition the work in advance. On a GPU or specialized hardware you would use something like bitonic mergesort, which requires more work but can be distributed across multiple "workers". Sorting is an entirely serial problem (the position of one element depends on all the others), and modern video codecs have a lot of serial dependencies (far more complex than a simple sort). It's a fundamental problem: you can either decide smartly what to do (quicksort) or waste work. High latency does not help in keeping the workers fed with updated data either.
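To illustrate the trade-off described above, here is a minimal sketch of a bitonic sorting network in plain Python. It runs sequentially, so it only simulates what parallel hardware would do, and it assumes the input length is a power of two. The point is that the compare-exchange pattern is fixed in advance and every pair within a pass is independent, at the cost of more total comparisons than a typical comparison sort.

```python
def bitonic_sort(values):
    """Sort a list whose length is a power of two using a bitonic network.

    Returns (sorted_list, compare_exchange_count).
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")

    a = list(values)
    comparisons = 0
    k = 2
    while k <= n:          # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:      # distance between the two compared elements
            # All pairs in this pass touch disjoint elements, so parallel
            # hardware could process the whole pass in one step.
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    comparisons += 1
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a, comparisons


data = [7, 3, 6, 1, 8, 2, 5, 4]
sorted_data, count = bitonic_sort(data)
print(sorted_data, count)
```

The number of compare-exchange steps depends only on the input length, never on the data: for 8 elements the network always performs 6 passes of 4 comparisons each. That predictability is what makes it easy to spread over many workers, and also why it does more work overall.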
-
Originally posted by discordian View Post
BTW I am not aware of any Hardware encoder (that could be specialized to circumvent some GPU issues) that compares favorably quality-wise to even the fast modes of software encoders.