Rav1e Achieves Another ~20% Speed-Up For Rust-Based AV1 Video Encoding

    Phoronix: Rav1e Achieves Another ~20% Speed-Up For Rust-Based AV1 Video Encoding

    Rav1e's weekly-ish pre-releases for this Rust-written AV1 encoder have been focusing a lot on better performance via hand-written x86 Assembly, making use of SIMD extensions, and other fine tuning of their encoder. With this newest pre-release, another ~20% speed-up was obtained...

    http://www.phoronix.com/scan.php?pag...Percent-Faster

  • #2
    It seems unlikely this will ever be faster than Dav1d, seeing as all their speed improvements come from Dav1d...

    • #3
      Originally posted by FireBurn
      It seems unlikely this will ever be faster than Dav1d, seeing as all their speed improvements come from Dav1d...

      Rav1e is an encoder; Dav1d is a decoder.

      • #4
        So when will it be practical to use, i.e. capable of encoding 1080p 10-bit at 20 FPS on, say, a 4-core CPU from 2016?

        • #5
          Originally posted by FireBurn
          It seems unlikely this will ever be faster than Dav1d, seeing as all their speed improvements come from Dav1d...

          Well, that's good; it'd be weird to have the encoder faster than the decoder, no?

          • #6
            Hmm. In C, depending on the code, it's sometimes _very_ hard to do better than a good compiler.
            Sure, if your code and data structures are rubbish, you're not making it easy for the compiler to do a good job, but that seems beside the point.
            The point being, hand-rolling asm is slowly going the way of the dodo.

            So what am I actually seeing here, since the speed increases seem to come from better hand-optimized asm?

            Is this a:
            "The Rust compiler is not mature enough" ?
            "The Rust compiler does not know any instruction set extensions. So it cannot do any real vectorization" ?
            Something else?

            • #7
              Originally posted by cl333r
              So when will it be practical to use, i.e. capable of encoding 1080p 10-bit at 20 FPS on, say, a 4-core CPU from 2016?

              With tiling, which you can set to 4 tiles at 1080p, it should already be practical.
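
              A minimal sketch of why tiling helps on a multi-core CPU, assuming a frame split into a grid of tiles that worker threads process independently (roughly speaking, AV1 tiles are predicted and entropy-coded without depending on neighbouring tiles in the same frame). `Tile`, `split_into_tiles`, `encode_tile` and `encode_frame_tiled` are hypothetical names, not rav1e's API; `encode_tile` only sums samples as a stand-in for real work, and a real encoder would also align tile boundaries to superblocks.

```rust
use std::thread;

/// A rectangular region of the frame, in pixels.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Tile {
    x: usize,
    y: usize,
    w: usize,
    h: usize,
}

/// Split a `width` x `height` frame into a `cols` x `rows` grid of tiles;
/// the last column and row absorb any remainder.
fn split_into_tiles(width: usize, height: usize, cols: usize, rows: usize) -> Vec<Tile> {
    let (tw, th) = (width / cols, height / rows);
    let mut tiles = Vec::with_capacity(cols * rows);
    for r in 0..rows {
        for c in 0..cols {
            let (x, y) = (c * tw, r * th);
            let w = if c == cols - 1 { width - x } else { tw };
            let h = if r == rows - 1 { height - y } else { th };
            tiles.push(Tile { x, y, w, h });
        }
    }
    tiles
}

/// Hypothetical stand-in for real tile encoding: it just sums the samples.
fn encode_tile(frame: &[u16], stride: usize, t: Tile) -> u64 {
    (t.y..t.y + t.h)
        .map(|row| {
            let start = row * stride + t.x;
            frame[start..start + t.w].iter().map(|&p| p as u64).sum::<u64>()
        })
        .sum()
}

/// Run every tile on its own scoped thread; tiles share no mutable state.
fn encode_frame_tiled(frame: &[u16], width: usize, height: usize, cols: usize, rows: usize) -> Vec<u64> {
    let tiles = split_into_tiles(width, height, cols, rows);
    thread::scope(|s| {
        let handles: Vec<_> = tiles
            .iter()
            .map(|&t| s.spawn(move || encode_tile(frame, width, t)))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}

fn main() {
    let (w, h) = (1920, 1080);
    let frame = vec![512u16; w * h]; // flat mid-grey 10-bit frame
    // 2 x 2 = 4 tiles, matching the "4 tiles at 1080p" suggestion.
    // prints [265420800, 265420800, 265420800, 265420800]
    println!("{:?}", encode_frame_tiled(&frame, w, h, 2, 2));
}
```

              With four tiles, four cores can each work on a quarter of the frame at the same time. The trade-off, not modelled here, is that splitting into more tiles typically costs a little compression efficiency.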

              • #8
                I'm currently waiting for two features which are not implemented yet: still image support and lossless mode.

                • #9
                  Originally posted by milkylainen
                  Hmm. In C, depending on the code, it's sometimes _very_ hard to do better than a good compiler.
                  Sure, if your code and data structures are rubbish, you're not making it easy for the compiler to do a good job, but that seems beside the point.
                  The point being, hand-rolling asm is slowly going the way of the dodo.

                  Not asm, but builtins/intrinsics are normally used, because compilers are still rubbish at vectorizing. In fact, I have some simple code annotated with
                  Code:
                  #pragma nounroll
                  #pragma clang loop vectorize(disable)
                  since clang blows up small loops that, in my case, only run a few times to align the data...
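
                  To make the intrinsics point concrete, here is a sketch of a sum of absolute differences (SAD) over 16 pixels, a typical inner-loop kernel in motion search, written once as plain scalar Rust and once with SSE2 intrinsics from `std::arch`. The function names are illustrative and not taken from rav1e or dav1d. SSE2 is assumed because it is part of the x86_64 baseline; other targets fall back to the scalar version.

```rust
/// Scalar reference: sum of |a[i] - b[i]| over 16 8-bit pixels.
fn sad16_scalar(a: &[u8; 16], b: &[u8; 16]) -> u32 {
    a.iter().zip(b.iter()).map(|(&x, &y)| x.abs_diff(y) as u32).sum()
}

/// Same computation with SSE2 intrinsics (PSADBW does 16 |a-b| at once).
#[cfg(target_arch = "x86_64")]
fn sad16_sse2(a: &[u8; 16], b: &[u8; 16]) -> u32 {
    use std::arch::x86_64::{
        __m128i, _mm_add_epi64, _mm_cvtsi128_si32, _mm_loadu_si128, _mm_sad_epu8,
        _mm_unpackhi_epi64,
    };
    // SAFETY: SSE2 is always present on x86_64, and each unaligned load
    // reads exactly the 16 bytes of a 16-byte array.
    unsafe {
        let va = _mm_loadu_si128(a.as_ptr() as *const __m128i);
        let vb = _mm_loadu_si128(b.as_ptr() as *const __m128i);
        // Two partial sums: bytes 0..8 in the low 64 bits, 8..16 in the high.
        let sums = _mm_sad_epu8(va, vb);
        // Move the high partial sum down and add it to the low one.
        let total = _mm_add_epi64(sums, _mm_unpackhi_epi64(sums, sums));
        _mm_cvtsi128_si32(total) as u32 // max 16 * 255 = 4080, fits easily
    }
}

/// Pick the vector path where it exists, otherwise the scalar one.
#[cfg(target_arch = "x86_64")]
fn sad16(a: &[u8; 16], b: &[u8; 16]) -> u32 {
    sad16_sse2(a, b)
}

#[cfg(not(target_arch = "x86_64"))]
fn sad16(a: &[u8; 16], b: &[u8; 16]) -> u32 {
    sad16_scalar(a, b)
}

fn main() {
    let a: [u8; 16] = std::array::from_fn(|i| (i * 16) as u8);
    let b = [100u8; 16];
    // prints "scalar = 1048, dispatched = 1048"
    println!("scalar = {}, dispatched = {}", sad16_scalar(&a, &b), sad16(&a, &b));
}
```

                  The scalar version is the kind of loop a compiler may or may not vectorize well; the intrinsic version states the vector instruction explicitly, which is why codecs keep such kernels hand-written.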

                  Originally posted by milkylainen
                  So what am I actually seeing here, since the speed increases seem to come from better hand-optimized asm?

                  ...and in codecs there is still handwritten asm in the inner loops.

                  Originally posted by milkylainen
                  Is this a:
                  "The Rust compiler is not mature enough" ?
                  "The Rust compiler does not know any instruction set extensions. So it cannot do any real vectorization" ?
                  Something else?

                  Neither. It's just a weird pet project that can't compete on performance, so it has a "safe" bullet point for some reason. Probably the command-line parsing is "safe", who knows. Decoders and encoders are rather simple in terms of lifetimes: you allocate everything up front and deallocate everything afterwards. Adding "unsafe" code to speed it up is just satire on top of that.
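
                  The "allocate everything up front" lifetime pattern described above can be sketched like this: a hypothetical `Scratch` struct whose buffer is sized once for the frame and then reused for every frame, so the per-frame path does no allocation and uses only safe, bounds-checked slices. This illustrates the pattern only; it is not how rav1e or dav1d actually manage their memory.

```rust
/// Hypothetical scratch space owned by an encoder, sized once for the frame.
struct Scratch {
    residual: Vec<i16>,
    width: usize,
    height: usize,
}

impl Scratch {
    fn new(width: usize, height: usize) -> Self {
        Scratch { residual: vec![0; width * height], width, height }
    }

    /// Write `source - prediction` into the reused buffer and return the sum
    /// of absolute residuals. Nothing is allocated on this per-frame path.
    fn residual_energy(&mut self, source: &[u8], prediction: &[u8]) -> u64 {
        let n = self.width * self.height;
        assert!(source.len() >= n && prediction.len() >= n, "input smaller than frame");
        for ((r, &s), &p) in self.residual.iter_mut().zip(&source[..n]).zip(&prediction[..n]) {
            *r = s as i16 - p as i16;
        }
        self.residual.iter().map(|&r| r.unsigned_abs() as u64).sum()
    }
}

fn main() {
    let (w, h) = (64, 64);
    let mut scratch = Scratch::new(w, h);
    let before = scratch.residual.as_ptr();
    for frame_no in 0..3u8 {
        // Test inputs; in a real encoder these would come from the caller.
        let source = vec![100 + frame_no; w * h];
        let prediction = vec![100u8; w * h];
        println!("frame {frame_no}: energy {}", scratch.residual_energy(&source, &prediction));
    }
    // The buffer allocated in `new` was reused, never reallocated.
    assert_eq!(before, scratch.residual.as_ptr());
}
```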

                  • #10
                    Originally posted by FireBurn
                    It seems unlikely this will ever be faster than Dav1d, seeing as all their speed improvements come from Dav1d...

                    Because encoding works differently from decoding.

                    As codecs get more complex, decoding time grows linearly, while encoding time grows exponentially.

                    This means it is slightly harder to decode H.265 than H.264, but it is much harder to encode it.
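
                    One way to see the asymmetry: the bitstream tells the decoder which block partitioning was chosen, so it just follows it, but an encoder searching exhaustively would have to weigh every possible partitioning. A toy count, assuming only a plain four-way split and no pruning (AV1 actually offers more partition shapes than that, and real encoders prune heavily); `quadtree_partitions` is a hypothetical helper:

```rust
/// Number of distinct ways to partition a square block when each block may
/// either stay whole or split into four quadrants, for `depth` levels.
/// P(0) = 1 (smallest size, cannot split); P(d) = 1 + P(d-1)^4.
/// Returns None once the count no longer fits in a u128.
fn quadtree_partitions(depth: u32) -> Option<u128> {
    let mut p: u128 = 1;
    for _ in 0..depth {
        p = p.checked_pow(4)?.checked_add(1)?;
    }
    Some(p)
}

fn main() {
    // Depth 4 corresponds to e.g. a 64x64 block split down to 4x4.
    for d in 0..=5 {
        match quadtree_partitions(d) {
            Some(n) => println!("depth {d}: {n} partitionings"),
            None => println!("depth {d}: overflows u128"),
        }
    }
}
```

                    The counts run 1, 2, 17, 83522, then past what a u64 can hold, because each extra level raises the previous count to roughly the fourth power. The decoder's work per block stays the same regardless, which is why encoders depend on heuristics to cut the search down.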
