Rav1e Squeezes Out More Performance For This Rust-Written AV1 Encoder

  • #11
    Originally posted by bug77 View Post
    Safety, in this context, is not about "incriminating data". It's about thread safety, i.e. the encoder won't crash/segfault under conditions you didn't think about.
    Ah ok. Fair enough. Though honestly, I think anyone with the skills to successfully write in assembly is going to have the competence to avoid stuff like segfaults.


    • #12
      Originally posted by schmidtbag View Post
      Ah ok. Fair enough. Though honestly, I think anyone with the skills to successfully write in assembly is going to have the competence to avoid stuff like segfaults.
      That's a very wrong assumption. Threading is extremely difficult to get right once you leave trivial example land. Even single-threaded management of mutable state (e.g. a mutable variable) is difficult to get right in complex scenarios.

      I didn't check out the code, but I'd suspect that the super-optimized assembler pieces of code are "standalone" units of logical operations "thread-local" so to speak. In other words, at least hopefully, the super-optimized stuff is only there to do some small very contained logical piece of work in the most hw-specific optimized way possible, but is wired in with the rest of the logic and threading through safe code written in Rust.

      If that's the case then using Rust makes perfect sense here.
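A minimal sketch of that split, assuming a made-up kernel: `brighten_rows` is a hypothetical stand-in for a small, self-contained optimized routine, and `brighten_frame` is the safe Rust layer that does the threading. Neither name comes from rav1e, and this is not how rav1e itself is structured as far as this thread can confirm.

```rust
use std::thread;

/// Hypothetical stand-in for a hand-optimized kernel. It only touches the
/// slice it is handed, so it carries no shared mutable state.
pub fn brighten_rows(rows: &mut [u8], delta: u8) {
    for px in rows.iter_mut() {
        *px = px.saturating_add(delta);
    }
}

/// Safe threading layer: split the frame into disjoint chunks and run the
/// kernel on each chunk in its own scoped thread.
pub fn brighten_frame(frame: &mut [u8], chunk_len: usize, delta: u8) {
    assert!(chunk_len > 0, "chunk_len must be non-zero");
    thread::scope(|s| {
        // `chunks_mut` yields non-overlapping `&mut [u8]` slices, so the
        // borrow checker can prove that no two threads share a pixel.
        for chunk in frame.chunks_mut(chunk_len) {
            s.spawn(move || brighten_rows(chunk, delta));
        }
    });
    // All threads are joined when `scope` returns.
}
```

Handing the same `&mut` chunk to two threads here is a compile error rather than a data race at run time, which is the kind of safety being discussed.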


      • #13
        Originally posted by schmidtbag View Post
        Ah ok. Fair enough. Though honestly, I think anyone with the skills to successfully write in assembly is going to have the competence to avoid stuff like segfaults.
        Absolutely. But it would take a lot longer to write the code if you did that. Why write, for example, argument parsing and validation in ASM?


        • #14
          Originally posted by ermo View Post
          EDIT2: Your 4-speed manual gearbox example can be made into a good exposition of "engineering tradeoffs". It could be that the engine has a power-band so broad that the gearbox can be made with stronger gears and less complexity (fewer bushings, bearings, synchromeshes and moving parts) while still meeting the specific gearbox dimension, weight and manufacturing cost and overall performance design targets. One example of this is the 7.0L V8 engine and 4-speed gearbox Ford used in its 1966 Le Mans winning GT40 Mk II. For reliability reasons the engine was restricted to 6200 RPM even though it could run up to 7400 RPM in NASCAR trim (rotational kinetic energy depends on angular velocity squared and thus affects the acceleration and thus wear that the crankshaft and piston assembly experiences during each revolution) and its torque characteristics were such that sufficient thrust was produced above 3000 RPM in all gears so that it could still keep up with its opposition and reach an RPM-limited top speed of 205 mph at 6200 RPM down the Mulsanne straight with its 1.0:1 ratio 4th and final gear. Never mind that it was so heavy that Ford had to come up with a quick-change disc brake design (the brake disc size was limited by the 15" rims in use) that is still seeing wide use (with few variations) today. (source)
          Nice analogy.

          For a TLDR -- You can buy premade, standardized stuff from a parts dealer to build your car and you can still do custom fabrication work in certain places to maximize performance because the two aren't mutually exclusive.


          • #15
            Originally posted by Almindor View Post
            That's a very wrong assumption. Threading is extremely difficult to get right once you leave trivial example land. Even single-threaded management of mutable state (e.g. a mutable variable) is difficult to get right in complex scenarios.
            Writing optimized code that takes advantage of hardware instructions is also beyond trivial example land. I stand by my point: if you can write something like that in assembly, you are competent enough to successfully write something multi-threaded [edit] without the need of Rust's safety. To reiterate - I don't have a problem with them using Rust.
            I didn't check out the code, but I'd suspect that the super-optimized assembler pieces of code are "standalone" units of logical operations "thread-local" so to speak. In other words, at least hopefully, the super-optimized stuff is only there to do some small very contained logical piece of work in the most hw-specific optimized way possible, but is wired in with the rest of the logic and threading through safe code written in Rust.

            If that's the case then using Rust makes perfect sense here.
            I agree with all of that, and if any of the devs can confirm you're right about that then I'd be satisfied with that answer. For the record, I didn't say that writing in Rust didn't make sense. Like I said before, I don't have a problem with the way things are done.

            EDIT:
            Originally posted by ermo View Post
            EDIT2: Your 4-speed manual gearbox example can be made into a good exposition of "engineering tradeoffs". It could be that the engine has a power-band so broad that the gearbox can be made with stronger gears and less complexity (fewer bushings, bearings, synchromeshes and moving parts) while still meeting the specific gearbox dimension, weight and manufacturing cost and overall performance design targets.
            True, though that's assuming there are such tradeoffs. If you have tight constraints (whether they're related to physics, production, cost, resources, time, etc) then of course, you have to accommodate them. But in the context of rav1e, what are the constraints that led them to use Rust, rather than the many alternative paths? I am not asking that hypothetically; those devs aren't idiots, so if you are right that there are constraints (which you very well could be) then I'm legitimately curious what they are.
            Last edited by schmidtbag; 20 November 2019, 11:44 AM.


            • #16
              Even big names like Microsoft are considering Rust as an alternative to aging C or C++ for safer code.
              They are using Rust in Windows for things like a great dev experience, the learning curve, safety, and the unit testing built into Cargo.

              In summary

              Learning Rust has been a great experience for my Rust port, and I hope that through this blog post you can see why. The community resources make learning the language an enjoyable experience. Also, thanks to its strict compiler, correctness and better programming techniques can be better enforced, while the syntax of the language allows for clearer code.

              Alexander Clarke, Software Engineer Intern, MSRC



              • #17
                Originally posted by zamadatix View Post
                What's the point of writing it in Rust if you're just going to use raw assembly in the core loops anyways.
                I would imagine that memory allocations are handled by Rust and that mathematical operations are handled by assembly.


                • #18
                  Originally posted by schmidtbag View Post
                  I know you're being facetious but seriously, why not write everything in pure assembly? zamadatix makes a good point - if some of the most important and [presumably] difficult part of your code is already written in such a low-level language, why stop there? Or, why Rust, as opposed to any other high[er]-level language?

                  Think of it like this:
                  Let's say you're building your own car engine. You spend a lot of time and resources developing it, perfecting its efficiency and power. And then... you settle for an off-the-shelf 4-speed manual transmission. It's not that the transmission is bad - you're still getting direct drive of the motor, and at least you're not pairing it with some crappy 70s slushbox automatic. But it isn't the best thing to pair it with either and it seems like such an odd place to suddenly take things easy, so some would wonder "why didn't you make the transmission too?".

                  EDIT:
                  I get that Rust is meant to be safer, but, what kind of incriminating data are you going to collect out of encoding an AV1 video?
                  Fun fact: RollerCoaster Tycoon (1&2) are about 98% pure assembly. That guy was crazy! It paid off, though: It ran quite well on the very limited hardware of the day.


                  • #19
                    Originally posted by schmidtbag View Post
                    Then why is such critical code in this encoder written in assembly at all? The more complicated your function is, the more cognitively taxing and error-prone it will be. To my understanding, writing an encoder is already complex, so I can't imagine how much harder it'd be to do fine tuning and access instruction sets in an already cognitively taxing language. In other words, a difficult language is used to write what may be the hardest part to write for this encoder, and the easiest part is written in a relatively easy language. So, if using Rust was for the sake of reducing human error or to make the code more human-readable, well, that ship had already sailed.
                    Again - that's not necessarily a problem, just an odd approach, hence me not being the only one to notice it. I think it's great the rav1e devs care to go so deep with optimizations, because nobody does that these days and it does matter.

                    I understand that, which is what led to my other question: why use Rust, as opposed to anything else? Unlike a lot of people here, I don't have a problem with Rust, but of every high-level language, I'm not sure I understand why that was chosen. And no, I'm not implying there's a better choice either.
                    Originally posted by schmidtbag View Post
                    Writing optimized code that takes advantage of hardware instructions is also beyond trivial example land. I stand by my point: if you can write something like that in assembly, you are competent enough to successfully write something multi-threaded [edit] without the need of Rust's safety. To reiterate - I don't have a problem with them using Rust.

                    I agree with all of that, and if any of the devs can confirm you're right about that then I'd be satisfied with that answer. For the record, I didn't say that writing in Rust didn't make sense. Like I said before, I don't have a problem with the way things are done.

                    EDIT:

                    True, though that's assuming there are such tradeoffs. If you have tight constraints (whether they're related to physics, production, cost, resources, time, etc) then of course, you have to accommodate them. But in the context of rav1e, what are the constraints that led them to use Rust, rather than the many alternative paths? I am not asking that hypothetically; those devs aren't idiots, so if you are right that there are constraints (which you very well could be) then I'm legitimately curious what they are.
                    In the rav1e GitHub repository, the title is literally "The fastest and safest AV1 encoder."

                    From the title alone, one can infer that any engineering trade-off will likely relate to encoding speed vs. bug density.

                    My thoughts on "safest":

                    In the context of languages with close-to-metal access, Rust is widely regarded as the safest way to write close-to-metal code when doing programming-in-the-large. Do you have evidence to the contrary? Which other language would you suggest using instead of Rust if the overarching goal is to use a language with facilities to promote and support memory safety in close-to-metal code? Ada?

                    My thoughts on "fastest":

                    In the specific examples where certain CPUs offer specialised instructions (and Rust hasn't yet grown support for generating assembly code that is just as fast), it seems to me that it would make sense to try to use Rust's unsafe and macro facilities to embed these localised pieces of code into the larger Rust code base?

                    If you disagree with this approach, ok. It's not my job to convince you that the chosen approach is sound, but as you yourself state, the developers of rav1e likely aren't stupid, so it stands to reason that they will have considered the trade-offs. But perhaps you don't think they have and want someone to confirm that for you? If so, why don't you just contact them and get the relevant information from the horse's mouth, so to speak?

                    Out of nothing but my own curiosity, these are the SLOC metrics as calculated by scc 2.10.1 when analysing commit 4abaed9 on the rav1e master branch:

                    Code:
                    $ scc --version
                    scc version 2.10.1
                    
                    $ scc -w
                    ─────────────────────────────────────────────────────────────────────────────────────────────────────────────
                    Language                              Files     Lines   Blanks  Comments     Code Complexity Complexity/Lines
                    ─────────────────────────────────────────────────────────────────────────────────────────────────────────────
                    Rust                                     92     52718     4013      4834    43871       4044           693.48
                    Assembly                                 27     49668     2390      1501    45777        249            30.74
                    TOML                                      8       250       34         9      207          0             0.00
                    Shell                                     8       189       33        40      116         23           143.62
                    Markdown                                  7       505      151         0      354          0             0.00
                    License                                   4       100       16         0       84          0             0.00
                    gitignore                                 4        16        1         0       15          0             0.00
                    YAML                                      3       342       26         6      310          0             0.00
                    Python                                    3       194       32        14      148         26            62.34
                    Plain Text                                1         8        0         0        8          0             0.00
                    Jupyter                                   1       607        0         0      607          0             0.00
                    ─────────────────────────────────────────────────────────────────────────────────────────────────────────────
                    Total                                   158    104597     6696      6404    91497       4342           930.17
                    ─────────────────────────────────────────────────────────────────────────────────────────────────────────────
                    Estimated Cost to Develop $3,097,936
                    Estimated Schedule Effort 23.577532 months
                    Estimated People Required 15.564273
                    ─────────────────────────────────────────────────────────────────────────────────────────────────────────────
                    If scc's statistics are to be taken at face value, the complexity of the Rust code is more than an order of magnitude higher than the complexity of the assembly code (4044 vs. 249). This suggests that the division of Rust vs. assembly is an engineering tradeoff, where the chosen high-level language (Rust) is used for the more complex parts of the code while hand-optimised assembly is used where it can demonstrably increase the speed of the code in question. In support of this hypothesis, note that the assembly commits tend to have accompanying benchmark numbers (source).
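To make that comparison concrete, here is a small sketch that normalises the "Complexity" column by the "Code" column from the scc table above. The helper name `complexity_per_code_line` is made up for illustration, and scc's complexity figure is only a rough count of branch-like constructs, so treat the result as indicative at best.

```rust
/// Complexity points per line of code, using scc's "Complexity" and "Code"
/// columns. Hypothetical helper; not part of scc.
pub fn complexity_per_code_line(complexity: u32, code_lines: u32) -> f64 {
    f64::from(complexity) / f64::from(code_lines)
}

/// Figures copied from the scc output above: (complexity, code lines).
pub const RUST: (u32, u32) = (4044, 43871);
pub const ASSEMBLY: (u32, u32) = (249, 45777);

/// How many times denser in complexity the Rust code is than the assembly.
pub fn density_ratio() -> f64 {
    complexity_per_code_line(RUST.0, RUST.1)
        / complexity_per_code_line(ASSEMBLY.0, ASSEMBLY.1)
}
```

With these inputs the Rust code comes out at roughly 0.092 complexity points per line and the assembly at roughly 0.0054, a ratio of about 17. The raw counts alone (4044 vs. 249) give a ratio of about 16.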

                    In light of the above hypothesis, I don't suppose you would consider re-evaluating your question of why rav1e isn't just written entirely in assembly? What benefits (if any) would doing so have compared to the current approach when taking into account the 'safe' goal?
                    Last edited by ermo; 20 November 2019, 01:57 PM.


                    • #20
                      Reasons to code in a low-level programming language:

                      + Near-assembly performance

                      + Faster coding because you don't need as much code

                      Summary: If you want top performance with less time spent coding, then low-level languages are a good option. If you find a bottleneck, you can write assembly for just those bits.

                      Best performance: Assembly

                      Worst time to code: Assembly

                      Solution:

                      Write in a language that gives you near-assembly performance.

                      (Now it's portable and fast.)

                      Rewrite some important bits with architecture-specific assembly optimizations.

                      (Now it's portable and really fast!)
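The two steps above can be sketched roughly like this. The kernel is a sum of absolute differences (SAD), picked here only because it is a common hot spot in video encoders; the function names are made up for the example, and SSE2 intrinsics stand in for hand-written assembly. None of this is taken from the rav1e source.

```rust
/// Step 1: portable version. Sum of absolute differences over the common
/// length of two pixel blocks.
pub fn sad_scalar(a: &[u8], b: &[u8]) -> u32 {
    a.iter().zip(b).map(|(&x, &y)| u32::from(x.abs_diff(y))).sum()
}

/// Step 2: architecture-specific version (x86_64 with SSE2 only).
/// Intrinsics are used as a stand-in for a real assembly routine.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "sse2")]
unsafe fn sad_sse2(a: &[u8], b: &[u8]) -> u32 {
    use std::arch::x86_64::*;
    let n = a.len().min(b.len());
    let mut acc = _mm_setzero_si128();
    let mut i = 0;
    while i + 16 <= n {
        // Unaligned 16-byte loads; the loop condition keeps them in bounds.
        let va = _mm_loadu_si128(a.as_ptr().add(i) as *const __m128i);
        let vb = _mm_loadu_si128(b.as_ptr().add(i) as *const __m128i);
        // `_mm_sad_epu8` leaves one partial sum in each 64-bit lane.
        acc = _mm_add_epi64(acc, _mm_sad_epu8(va, vb));
        i += 16;
    }
    let lo = _mm_cvtsi128_si64(acc) as u64;
    let hi = _mm_cvtsi128_si64(_mm_unpackhi_epi64(acc, acc)) as u64;
    // Leftover bytes (fewer than 16) go through the portable path.
    (lo + hi) as u32 + sad_scalar(&a[i..n], &b[i..n])
}

/// Safe entry point: pick the fast path at run time, else fall back.
pub fn sad(a: &[u8], b: &[u8]) -> u32 {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("sse2") {
            // Sound because the required CPU feature was just checked.
            return unsafe { sad_sse2(a, b) };
        }
    }
    sad_scalar(a, b)
}
```

Callers only ever see `sad`, so the code still builds and runs on other architectures, and the `unsafe` part stays confined to one small function that can be benchmarked and tested against `sad_scalar`.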
