RAV1E: The "Fastest & Safest" AV1 Encoder


  • #41
    Originally posted by sandy8925 View Post
    Oh ok. Looks like hand-written assembly code did make a pretty big difference, at least in your testing.
    To be clear, things like video encoding/decoding are pretty much the poster boy for assembly optimizations, given that many CPU extensions (again, SSE, AVX) are extremely good for this kind of work and also difficult for a compiler to handle effectively.

    Also, I'm not sure just how effective the C code in x264 is when you disable the optimized assembly. I'm guessing they haven't spent a ton of effort making it as fast as possible, given that it would not be as fast as the assembly code either way.

    Across the software landscape, there is very little use of hand-optimized assembly; where it is used is basically where it really makes a difference.

    • #42
      Interesting conversation here (about assembly handcoding vs compiler flags), but the post was about a faster encoder for AOM AV1.

      At this time the code in this project is several times faster, but it produces a much larger file than aomenc, the actual AV1 reference encoder, and it is still operating with code from before the bitstream freeze of the codec. The build.sh script that comes with the code runs an encode and decode test from a Y4M (raw YUV video) file and back, and it fails at decoding its own output (thus failing its test).

      The code has been updated several times since this article was published, but as of right now, it cannot actually produce a working output file.

      It'll be neat to compare it once it can.

      • #43
        OK, it can now decode its own IVF output back to Y4M, but the reference decoder (which is also what is in ffmpeg now) cannot use the file, and it is 82 KiB compared to 14 KiB for the reference. It is MUCH faster at encoding, though.

        • #44
          Originally posted by Weasel View Post
          You know there are also people who render 3D animation without an entire farm, and it goes much slower than this. So people who work on a video only need to encode it when they're done with edits (which should be non-destructive, I hope), which is a total non-issue.

          They work on it for months and then render it in one night. The horror of a few extra hours, such useless.
          I'm not doing that, I'm ripping DVDs I own so I don't have to remove and insert discs that, in some cases, don't even play right (stupid Babylon 5 Season 3 and 4 discs...). I'm not editing, I'm not rendering, I'm just transcoding, like people have been using x264 to accomplish for something like a decade now, and Xvid before that. If VP9, AOM1, or anything else fails at completing in a reasonable amount of time in that use case, it is garbage for that particular common use case.

          I don't know why I even come here anymore, it's full of incompetent trolls who can't even use correct grammar, let alone amuse anyone.
          Last edited by mulenmar; 19 July 2018, 08:23 PM.

          • #45
            Originally posted by mulenmar View Post
            I'm not doing that, I'm ripping DVDs I own so I don't have to remove and insert discs that, in some cases, don't even play right (stupid Babylon 5 Season 3 and 4 discs...). I'm not editing, I'm not rendering, I'm just transcoding, like people have been using x264 to accomplish for something like a decade now, and Xvid before that. If VP9, AOM1, or anything else fails at completing in a reasonable amount of time in that use case, it is garbage for that particular common use case.

            I don't know why I even come here anymore, it's full of incompetent trolls who can't even use correct grammar, let alone amuse anyone.
            Transcoding is literal garbage because you always get garbage out compared to the original.

            Being bad at garbage doesn't really make it garbage.

            • #46
              Originally posted by mulenmar View Post

              I'm not doing that, I'm ripping DVDs I own so I don't have to remove and insert discs that, in some cases, don't even play right (stupid Babylon 5 Season 3 and 4 discs...). I'm not editing, I'm not rendering, I'm just transcoding, like people have been using x264 to accomplish for something like a decade now, and Xvid before that. If VP9, AOM1, or anything else fails at completing in a reasonable amount of time in that use case, it is garbage for that particular common use case.

              I don't know why I even come here anymore, it's full of incompetent trolls who can't even use correct grammar, let alone amuse anyone.
              The nice thing about VP9 when doing TV series is that you can encode several episodes in parallel and thus speed things up considerably. I did the first season of Buffy (the 1080p HD remake) yesterday, which is 13 episodes @ about 45 minutes each.

              Using the following settings:
              ffmpeg -c:v libvpx-vp9 -speed 2 -crf 27 -b:v 0 -tile-columns 4 -frame-parallel 1 -threads 8 -c:a libopus -ac 2 -b:a 128k
              And running two episodes in parallel @ about 8.5fps per episode, I managed to finish in less than 12 hours on my Ryzen 1700X. CPU load was about 50% so I could have probably run four episodes in parallel if I wanted to speed things up further.
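A rough sketch of how the two-at-a-time encoding described above could be scripted. The encoder settings are the ones quoted in the post; everything else is an assumption made for illustration: the `episodes` and `encoded` folder names, the `.mkv` input extension, the `.webm` output container, and the helper names `build_command` and `encode_all`.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

# Encoder settings copied from the post above.
VP9_ARGS = [
    "-c:v", "libvpx-vp9", "-speed", "2", "-crf", "27", "-b:v", "0",
    "-tile-columns", "4", "-frame-parallel", "1", "-threads", "8",
    "-c:a", "libopus", "-ac", "2", "-b:a", "128k",
]


def build_command(src, out_dir):
    """Return the ffmpeg argument list for one episode.

    The output name (<input stem>.webm) is an assumption, not from the post.
    """
    dst = Path(out_dir) / (Path(src).stem + ".webm")
    return ["ffmpeg", "-i", str(src), *VP9_ARGS, str(dst)]


def encode_all(sources, out_dir, jobs=2):
    """Run at most `jobs` ffmpeg processes at once; return exit codes in input order."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    def run(src):
        # stdin is detached so parallel ffmpeg processes do not compete for the terminal.
        result = subprocess.run(build_command(src, out_dir), stdin=subprocess.DEVNULL)
        return result.returncode

    # Threads are enough here: each worker just waits on an external process.
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return list(pool.map(run, sources))


if __name__ == "__main__":
    episodes = sorted(Path("episodes").glob("*.mkv"))  # hypothetical input folder
    print(encode_all(episodes, "encoded", jobs=2))
```

Raising `jobs` to 4 would correspond to the "four episodes in parallel" idea; whether that actually helps depends on how much CPU headroom is left.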

              This took it down from ~45GiB to ~10GiB. Quality is still alright although I did notice a slight decline.

              It can be painfully slow when encoding feature-length movies, though.

              • #47
                Originally posted by Brisse View Post
                I did the first season of Buffy (the 1080p HD remake) yesterday, which is 13 episodes @ about 45 minutes each.
                What remake? I'm pretty sure there isn't one.

                • #48
                  Originally posted by LinAGKar View Post

                  What remake? I'm pretty sure there isn't one.
                  The terrible one: https://youtu.be/F28XcxHxH6k

                  Guess I meant to say remaster, not remake

                  • #49
                    Originally posted by Brisse View Post

                    The terrible one: https://youtu.be/F28XcxHxH6k

                    Guess I meant to say remaster, not remake
                    I see. Probably better off sticking to the original then. It will be faster to encode too, being at a lower resolution.

                    • #50
                      To add to the whole "hand-optimized assembler" vs. "optimizing compiler" discussion, check out these benchmarks on Serve The Home:

                      The Cavium ThunderX2 is a complete game changer in the server CPU market. Backed by a vastly improved Arm ecosystem, the ThunderX2 features 32 high speed Arm cores capable of a total of 128 threads and 56 PCIe lanes in a single socket, or 256 threads in a dual socket server


                      If you look at the results when a GCC -Ofast compiled version of the SPECrate benchmark is used, the ARM-based system beats both the Xeon Gold and the EPYC system, but when that same benchmark is compiled with Intel's ICC compiler for the Xeon and AMD's AOCC compiler for the EPYC, the AMD system wins handily.

                      Further, as they allude to in the article, there are ARM optimizing compilers available that this site does not have access to, and these most likely would have tilted the results in ARM's favor.

                      What this tells me is that writing hand-optimized assembler may be worth it IF your software will be built with a general-purpose compiler like GCC, whether you build and distribute it yourself or ship it only as source meant to run on a wide variety of hardware. If you were planning on using a vendor-supplied compiler, it's probably a waste of time.
