**milkylainen** · 15 December 2019, 01:45 PM

Originally posted by discordian

Code:

#pragma nounroll
#pragma clang loop vectorize(disable)

since clang explodes small sections that in my case just run a few times to align the data...

Good points.
Sure. Compilers can do a serious fubar with some vectorization and unrolling, just as you mention. And they can be pretty stupid sometimes.
But my recent experience with vectorization and unrolling seems to land me in "making it simpler for the compiler to understand".
If the unrolling and vectorization are non-obvious, chances are it's going to do a botched job trying to "fix" it.
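
As a rough illustration of where the quoted pragmas would sit, here is a minimal sketch of a short alignment prologue that clang is asked to leave alone. The function name `sum_prologue` and the 16-byte boundary are assumptions made for the example, not something taken from discordian's code; other compilers will typically just warn about the unknown pragmas.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (name and 16-byte boundary are assumptions):
// consumes bytes one at a time until p + i sits on a 16-byte boundary,
// adding them to acc, and returns how many bytes were consumed.
std::size_t sum_prologue(const std::uint8_t* p, std::size_t n, std::uint32_t& acc) {
    assert(p != nullptr);
    std::size_t i = 0;
    // This loop runs at most 15 times, so unrolling or vectorizing it
    // only adds code size. Both pragmas are clang loop hints.
#pragma nounroll
#pragma clang loop vectorize(disable)
    while (i < n && (reinterpret_cast<std::uintptr_t>(p + i) % 16) != 0) {
        acc += p[i];
        ++i;
    }
    return i;
}
```

The caller would then hand the remaining `n - i` bytes to the main loop, which is the part worth letting the compiler vectorize.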

**Mario Junior** · 15 December 2019, 02:23 PM

That's why I love assembly even without knowing how to write a "hello world" in Python.

**Mario Junior** · 15 December 2019, 02:26 PM

What Michael?

**Zan Lynx** · 15 December 2019, 03:17 PM

Originally posted by milkylainen

Hmm. In C, depending on the code, it's sometimes _very_ hard to do it better than a good compiler.
Sure, if your code and data structures are rubbish, you're not making it easy for the compiler to do a good job, but that seems beside the point.
The point of hand-rolling asm is slowly going the way of the dodo.

So what am I actually seeing here? Since the benefits of speed increase seem to come from better hand optimized asm.

Is this a:
"The Rust compiler is not mature enough" ?
"The Rust compiler does not know any instruction set extensions. So it cannot do any real vectorization" ?
Something else?

I've messed around with C++ vectorization some.

In a lot of cases the real speed improvements come from seeing what the vector instructions can do, and arranging the data in the correct alignment and formats. It's often much easier to use hand-coded ASM because fighting the compiler involves a lot of non-standard hacking to prove that the data is aligned, no, you don't want it copied, yes, you can trust that reading 32 bytes is safe here even if the array length is unknown... etc, etc. Plus things like register allocation and waiting enough cycles before reusing a register are very important for the best performance.

You could maybe get a compiler to the same performance level as hand-coded ASM, but it would require a compiler-driven benchmark combined with some machine learning. It would take ages to build the final binary.

Plus at least a couple of times I hit a memory bandwidth limit (laptop RAM), and your benchmark testing becomes useless unless you realize that, because suddenly you're making changes and nothing improves.
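
To give a feel for the kind of non-standard hinting involved in proving alignment and no aliasing to the compiler, here is a small sketch. It assumes GCC or Clang: `__builtin_assume_aligned` and `__restrict` are compiler extensions, not standard C++, and the function name `add_aligned` and the 32-byte alignment are made up for the example. Whether the loop actually gets vectorized still depends on the compiler version and flags.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical kernel: out[i] = a[i] + b[i].
// The caller promises that all three pointers are 32-byte aligned and
// that the arrays do not overlap. Breaking that promise is undefined
// behavior, so a debug build checks the alignment with assert.
void add_aligned(float* __restrict out,
                 const float* __restrict a,
                 const float* __restrict b,
                 std::size_t n) {
    assert(reinterpret_cast<std::uintptr_t>(out) % 32 == 0);
    assert(reinterpret_cast<std::uintptr_t>(a) % 32 == 0);
    assert(reinterpret_cast<std::uintptr_t>(b) % 32 == 0);

    // GCC/Clang builtin: returns the same pointer, but lets the optimizer
    // assume the stated alignment from here on.
    float* o = static_cast<float*>(__builtin_assume_aligned(out, 32));
    const float* x = static_cast<const float*>(__builtin_assume_aligned(a, 32));
    const float* y = static_cast<const float*>(__builtin_assume_aligned(b, 32));

    // A plain counted loop with no aliasing is the easy case for the
    // auto-vectorizer.
    for (std::size_t i = 0; i < n; ++i) {
        o[i] = x[i] + y[i];
    }
}
```

The arrays passed in would need to be declared with something like `alignas(32)` for the promise to hold. Note that none of this covers the "reading 32 bytes past an unknown length is safe" case, which has no comparable hint.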

**Michael** · 15 December 2019, 03:52 PM

Originally posted by Mario Junior

What Michael?

Just the crappy spam filter trying to filter out link spam and such sometimes having false positives... should appear now.

**Mario Junior** · 15 December 2019, 04:10 PM

Originally posted by Michael

Just the crappy spam filter trying to filter out link spam and such sometimes having false positives... should appear now.

Thanks!

**bug77** · 15 December 2019, 07:36 PM

Originally posted by milkylainen

Hmm. In C, depending on the code, it's sometimes _very_ hard to do it better than a good compiler.
Sure, if your code and data structures are rubbish, you're not making it easy for the compiler to do a good job, but that seems beside the point.
The point of hand-rolling asm is slowly going the way of the dodo.

So what am I actually seeing here? Since the benefits of speed increase seem to come from better hand optimized asm.

Is this a:
"The Rust compiler is not mature enough" ?
"The Rust compiler does not know any instruction set extensions. So it cannot do any real vectorization" ?
Something else?

You should look up what Rust (usually) uses for its compiler backend

**quikee** · 16 December 2019, 07:21 AM

Originally posted by bug77

I'm currently waiting for two features which are not implemented yet: still image support and lossless mode.

You can't have a working video encoder without having still image support. If you're thinking of AVIF, that has nothing to do with rav1e directly. For AVIF you take an AV1-encoded I-frame and put it in an ISOBMFF container with the appropriate metadata set. Where you get the AV1-encoded I-frame can be totally independent of the library that packs it together, and it certainly is done like that: libavif [1] can use aomenc or rav1e as the encoder, and aomenc or dav1d as the decoder.

[1] https://github.com/AOMediaCodec/libavif

**modpunk** · 16 December 2019, 07:47 AM

Originally posted by quikee

You can't have a working video encoder without having still image support. If you're thinking of AVIF, that has nothing to do with rav1e directly.

Setting still image mode in rav1e doesn't work, the created picture can't be opened, see https://github.com/xiph/rav1e/issues/1000
libaom supports it, but libavif isn't able to be built with libaom 1.0. Yes, I'm using libavif and created patches for it.

**quikee** · 16 December 2019, 09:28 AM

Originally posted by modpunk

Setting still image mode in rav1e doesn't work, the created picture can't be opened, see https://github.com/xiph/rav1e/issues/1000
libaom supports it, but libavif isn't able to be built with libaom 1.0. Yes, I'm using libavif and created patches for it.

I see. OK, thanks for the correction.

Rav1e Achieves Another ~20% Speed-Up For Rust-Based AV1 Video Encoding