Intel's Open-Source VP9 Video Encoder Just Scored A Massive ~3x Performance Boost


  • #11
    Originally posted by Michael

    It's somewhat similar to what PTS is capable of. Here is a 10-year-old example - https://www.phoronix.com/scan.php?pa...nvik_iqc&num=1 - I did have some better examples that outline the changed pixels, but I can't seem to find that article at the moment. Those capabilities are in place; it's a matter of reliably being able to query the same frame/output, etc.
    These are just side-by-side comparisons. What I meant is subtracting one image from the other and checking for any non-black pixels.
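A minimal sketch of that subtract-and-check idea, assuming both frames are already decoded into same-sized grids of RGB tuples. The function name `diff_pixels`, the `threshold` parameter, and the tiny sample frames are hypothetical and not part of PTS or any tool mentioned here.

```python
def diff_pixels(frame_a, frame_b, threshold=0):
    """Return (x, y) coordinates where two same-sized RGB frames differ.

    Frames are lists of rows; each row is a list of (r, g, b) tuples.
    A pixel counts as changed when any channel's absolute difference
    exceeds `threshold` (0 means any non-black pixel in the difference).
    """
    if len(frame_a) != len(frame_b):
        raise ValueError("frames must have the same height")
    changed = []
    for y, (row_a, row_b) in enumerate(zip(frame_a, frame_b)):
        if len(row_a) != len(row_b):
            raise ValueError("frames must have the same width")
        for x, (pixel_a, pixel_b) in enumerate(zip(row_a, row_b)):
            # Per-channel absolute difference: the "subtracted" pixel.
            delta = tuple(abs(ca - cb) for ca, cb in zip(pixel_a, pixel_b))
            if max(delta) > threshold:
                changed.append((x, y))
    return changed


# Hypothetical 2x2 frames that differ in one channel of one pixel.
reference = [[(0, 0, 0), (255, 255, 255)], [(10, 20, 30), (40, 50, 60)]]
candidate = [[(0, 0, 0), (255, 255, 255)], [(10, 20, 30), (40, 50, 64)]]

print(diff_pixels(reference, candidate))               # [(1, 1)]
print(diff_pixels(reference, candidate, threshold=4))  # []
```

An empty result with `threshold=0` means the two outputs are pixel-identical for that frame. A non-zero threshold is only useful if small deviations are acceptable.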


    • #12
      Originally posted by tildearrow

      These are just side-by-side comparisons. What I meant is subtracting one image from the other and checking for any non-black pixels.
      Right, that capability is there; I just can't find the article/sample I was looking for.
      Michael Larabel
      http://www.michaellarabel.com/


      • #13
        Since I received a lot of "likes" on my post (and this is apparently how I validate my self-worth), I decided to do a slightly more elaborate test to see how libvpx and SVT-VP9 stack up against each other.

        Sadly, my original goal was to tweak the hell out of the settings to get the best possible visual quality at 30 fps encoding speed, but try as I might, I could not get libvpx anywhere close to that speed. Then I tried to match perceived quality and compare FPS, but that was a no-go as well: they are quite different encoders and gave wildly inconsistent results (libvpx seems to handle still images better, SVT-VP9 seems to prefer motion). So I decided to encode at reasonable settings with the same target bitrate. That's still not flawless, as libvpx rate control is wildly inaccurate, but it's the best I could come up with for now. I also hindered the encoding speed of SVT-VP9 to roughly match libvpx's low speed, which makes this somewhat of a "quality per clock cycle" test.

        Tests were conducted on a Ryzen 7 2700X with 32 GB of RAM, running the latest Arch Linux (btw...) and ffmpeg 4.2 with the latest SVT-VP9 git revision. This is possibly slightly unfair to libvpx, but I doubt *that* much has changed.

        Here are my command lines and the resulting output.
        SVT: $ sudo ffmpeg -thread_queue_size 1024 -i ~/rp1.mkv -pix_fmt yuv420p -vcodec libsvt_vp9 -rc 2 -g 32 -tune 0 -qp 45 -preset 4 -vf scale=1920x816 -b:v 1200K -acodec copy /tmp/rp1-svt-vp9.ivf
        frame= 2285 fps= 27 q=-0.0 Lsize= 13642kB time=00:01:35.01 bitrate=1176.2kbits/s dup=3 drop=0 speed= 1.1x
        video:13589kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.393914%


        libvpx: $ ffmpeg -thread_queue_size 1024 -i ~/rp1.mkv -pix_fmt yuv420p -vcodec libvpx-vp9 -row-mt 1 -cpu-used 4 -frame-parallel 1 -deadline realtime -vf scale=1920x816 -b:v 1200K -acodec copy /tmp/rp1-vpx-vp9.ivf
        frame= 2285 fps= 11 q=0.0 Lsize= 18216kB time=00:01:35.30 bitrate=1565.8kbits/s dup=3 drop=0 speed=0.474x
        video:18189kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.147390%
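The two status lines above can be sanity-checked against the 1200K target with a few lines of arithmetic. This is only a sketch: it assumes ffmpeg's "kB" means 1024 bytes and "kbits" means 1000 bits, which reproduces the reported bitrates from the reported sizes and durations. The helper names are hypothetical.

```python
def bitrate_kbps(size_kib, seconds):
    """Average bitrate in kbit/s from a size in units of 1024 bytes and a duration."""
    return size_kib * 1024 * 8 / seconds / 1000


def overshoot_percent(actual_kbps, target_kbps):
    """Signed deviation from the target bitrate, in percent."""
    return (actual_kbps - target_kbps) / target_kbps * 100


TARGET_KBPS = 1200  # from "-b:v 1200K" in both command lines

# Sizes, durations and fps copied from the two ffmpeg status lines.
runs = {
    "svt-vp9": {"size_kib": 13642, "seconds": 95.01, "fps": 27},
    "libvpx": {"size_kib": 18216, "seconds": 95.30, "fps": 11},
}

for name, run in runs.items():
    actual = bitrate_kbps(run["size_kib"], run["seconds"])
    deviation = overshoot_percent(actual, TARGET_KBPS)
    print(f"{name}: {actual:.1f} kbit/s ({deviation:+.1f}% vs target)")

ratio = runs["svt-vp9"]["fps"] / runs["libvpx"]["fps"]
print(f"speed ratio: {ratio:.2f}x")
```

Under those assumptions SVT-VP9 lands about 2% under the target while libvpx overshoots by roughly 30%, so the libvpx file had noticeably more bits to work with in the quality comparison.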

        It is worth noting that I was seeing 60+ fps on SVT-VP9 with "preset 5" with only a slight, though noticeable, drop in quality; I stuck with preset 4 to be somewhat fair on the "quality per clock cycle" front (even hindered, SVT-VP9 still absolutely trounced libvpx in speed). Also worth noting: SVT-VP9 consistently used all 16 threads and maxed out the CPU, whereas libvpx, no matter what I did, never exceeded about 3/4 of the processor and usually sat a little over half. It's still very poor at threading.

        If there are any glaring issues with the encoding parameters that would affect the outcome, let me know and I shall rerun the tests.

        However, the real cream of the crop: I'll include some screenshots of various scenes, and I will even upload the video files for you to compare. Screenshots can only do so much when comparing a video codec, and while SVT-VP9 is quite decent, there are still notable encoding errors that can only be seen in motion. Nothing significant for many, but if you're a video nerd, you'll certainly notice them.

        Here are the screenshots. I'm linking to them instead of embedding them because they're huge.
        [original source]
        [svt encoded]
        [libvpx encoded]

        And here are the videos. I'm not including the original, as it's almost 300 MB and I only have ~7 Mbps of upload speed from here (on an LTE connection). The files were remuxed to MKV after the initial encode for ease of playback. Sound is not included, because this is a video encoder test.
        [svt]
        [libvpx]

        I hope this post has been informative as to the state of the svt-vp9 encoder.


        • #14
          Originally posted by discordian
          Making the best decisions on how to pack things the best way is an almost entirely serial process.
          But a serial process can't take up a significant part of the encoding time if vectorization results in a 3x performance boost.


          • #15
            Originally posted by pal666
            But a serial process can't take up a significant part of the encoding time if vectorization results in a 3x performance boost.
            Nope, but the decisions affect your search, so while a GPU can do more at the same time, it will also then throw out most of it. Software encoders don't do brute-force searches, and one frame/block can be composed from multiple other frames and can itself be used for later frames. The more threads you throw at it, the more frames/blocks can't depend on each other.

            Just look at how different sorting algorithms are: quicksort takes the fewest cumulative cycles but can hardly be parallelized, as you can't partition the work in advance. On a GPU or specialized hardware you would use something like bitonic mergesort, which requires more work but can be distributed across multiple "workers". Sorting is an inherently serial problem (the position of one element depends on all the others), and modern video codecs have a lot of serial dependencies (way more complex than a simple sort). It's a fundamental problem where you can either decide smartly what to do (quicksort) or waste work. Having a lot of latency does not help in keeping the workers busy with updated data either.
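To make the "more work, but distributable" trade-off concrete, here is a small sketch of a bitonic sorting network. It runs serially in plain Python and only counts how many compare-exchange operations and how many data-independent steps the network needs; the function name and return shape are hypothetical, and nothing here reflects how any encoder or GPU library actually implements it.

```python
def bitonic_sort(values):
    """Sort a list whose length is a power of two with a bitonic network.

    Returns (sorted_list, compare_count, parallel_steps). Every comparison
    inside one step touches a disjoint pair of indices, so each step could
    in principle be split across independent workers.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    data = list(values)
    compares = 0
    steps = 0
    k = 2  # size of the bitonic sequences being merged
    while k <= n:
        j = k // 2  # distance between compared elements
        while j >= 1:
            steps += 1
            for i in range(n):
                partner = i ^ j
                if partner > i:  # visit each pair once
                    compares += 1
                    ascending = (i & k) == 0
                    if (data[i] > data[partner]) == ascending:
                        data[i], data[partner] = data[partner], data[i]
            j //= 2
        k *= 2
    return data, compares, steps


print(bitonic_sort([7, 3, 6, 1, 8, 2, 5, 4]))
# ([1, 2, 3, 4, 5, 6, 7, 8], 24, 6)
```

The comparison pattern is fixed in advance and never depends on the data, which is what makes it easy to distribute. The price is on the order of n·log²(n) comparisons in total, against roughly n·log(n) for a typical quicksort run, so the gap widens as the input grows.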


            • #16
              BTW, I am not aware of any hardware encoder (which could be specialized to circumvent some GPU issues) that compares favorably quality-wise to even the fast modes of software encoders.


              • #17
                Originally posted by discordian
                BTW, I am not aware of any hardware encoder (which could be specialized to circumvent some GPU issues) that compares favorably quality-wise to even the fast modes of software encoders.
                I've heard decent things about the new Turing encoders with H.264, but I have not actually seen any decent comparisons yet (I fully admit it's been a while since I last checked, though).
