Originally posted by dev_null
SiFive HiFive Premier P550 RISC-V Price Lowered, Ubuntu 24.04 Support Ready
No source code for the firmware, right? https://github.com/sifive/hifive-pre...ools/issues/11 Why is it better than any other board, then?
-
Originally posted by Quackdoc
He is spewing massively blatant lies about what I said, I think it's safe to ignore him as a lunatic.
I don't expect I can change anyone's mind in a single exchange. However, if I can tell them something they didn't know (or vice versa), then I generally consider it worthwhile.
I appreciate your contributions, as well as your concerns.
-
Originally posted by Gamer1227
Yeah, go tell the FFmpeg developers that hand-written assembly is stupid and that they should "just trust the compiler, bro!".
Image-rs is an exception; the exception reinforces the rule.
Look at all this "stupid" hand-written assembly.
https://github.com/FFmpeg/FFmpeg/tree/master/libavcodec
-
Originally posted by Gamer1227
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>
using namespace std;

void sum(vector<int64_t> a, vector<int64_t> b, vector<int64_t> sum) {
    assert(a.size() == b.size());
    assert(a.size() == sum.size());
    // uncomment the if condition
    // and see the great auto vectorization crumble
    // works on both GCC and Clang!
    for (size_t i = 0; i < a.size(); i++) {
        //if (a[i] < 5) {
        //    a[i] = 0;
        //}
        sum[i] = a[i] + b[i];
    }
}
I wrote a small C++ example, just a function summing the elements of two vectors into a sum vector. I also helped the compiler by asserting that the three vectors have the same size.
As you can see, without the simple if condition, it works well: if you hover the mouse over sum[i] = a[i] + b[i], the assembly is SSE instructions with xmm registers.
But if you uncomment that condition, the compiler falls back to scalar instructions, and it is only one very simple if condition. Imagine anything more complex.
Originally posted by Gamer1227
The only way to vectorize with the IF statement is to enable AVX2, which has conditional instructions. But then your code would not run on CPUs without AVX2.
Also, using clang-19.1 + -march=nehalem gets it vectorized. I think clang is now the leader in autovectorization. I'd focus on it, if you want to see the state of the art.
Originally posted by Gamer1227
In hand-written code, the solution is to create a scalar C path and an AVX2 path; the path would be chosen by looking at CPU features. But auto-vectorized code is not able to do it, it only generates one path.
Originally posted by Gamer1227
but again, it is a very simple IF statement,
You seem to have this odd idea that optimizing code is an all-or-nothing affair: that either the compiler should be all-knowing and do everything for you, without you having to lift a finger, or that you have to drop into assembly language and do everything yourself. In most other professions, tradespeople learn to actually use their tools. The tool's job is to make your life easier, but you should master it if you want to extract the greatest benefit from it.
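To make "helping the compiler" concrete, here is a minimal sketch of how the loop from the quoted example could be restated. It is not the original poster's code: the name sum_select is hypothetical, the inputs are taken by const reference instead of by value, and the conditional store is replaced with a select on a local value. Whether a given compiler vectorizes it still depends on the compiler version and target flags, so check the generated assembly rather than assuming.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical rewrite of the thread's example (assumption, not the poster's code).
// Elements of `a` below 5 are treated as 0 before being added to `b`.
void sum_select(const std::vector<int64_t>& a,
                const std::vector<int64_t>& b,
                std::vector<int64_t>& out) {
    assert(a.size() == b.size());
    assert(a.size() == out.size());

    // Hoist size and data pointers so the loop works on plain arrays.
    const std::size_t n = a.size();
    const int64_t* pa = a.data();
    const int64_t* pb = b.data();
    int64_t* po = out.data();

    for (std::size_t i = 0; i < n; ++i) {
        // A select on a local value instead of a conditional store into a[i].
        const int64_t ai = pa[i] < 5 ? 0 : pa[i];
        po[i] = ai + pb[i];
    }
}
```

The output matches what the original loop computes with the if enabled; the only behavioral difference is that the input vector is never written to.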
Like I said, I think the main reason a lot of people drop into assembly language is ego. They enjoy the dopamine hit they get when they believe they can do something better than the compiler. However, I've seen the generic C paths in some of these projects, and they did absolutely nothing to help out the compiler and use it more effectively. A cynical take on this is that they actually don't want the generic version to be very fast, so that they can keep justifying their hand-written assembly language.
Last edited by coder; 16 December 2024, 04:27 PM.
-
Originally posted by coder
Even that much is enough work that they had to train a bunch of people to do it. Then, when they want to optimize for a new architecture, they have to do it all over again. Also, what about when they want to support new decoder features, like 10-bit or 12-bit? You bet that's another code path!
Don't make light of work you didn't do.
We didn't say that. I won't speak for Quackdoc, but my point is that people should learn to use their tools properly.
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>
using namespace std;

void sum(vector<int64_t> a, vector<int64_t> b, vector<int64_t> sum) {
    assert(a.size() == b.size());
    assert(a.size() == sum.size());
    // uncomment the if condition
    // and see the great auto vectorization crumble
    // works on both GCC and Clang!
    for (size_t i = 0; i < a.size(); i++) {
        //if (a[i] < 5) {
        //    a[i] = 0;
        //}
        sum[i] = a[i] + b[i];
    }
}
I wrote a small C++ example, just a function summing the elements of two vectors into a sum vector. I also helped the compiler by asserting that the three vectors have the same size.
As you can see, without the simple if condition, it works well: if you hover the mouse over sum[i] = a[i] + b[i], the assembly is SSE instructions with xmm registers.
But if you uncomment that condition, the compiler falls back to scalar instructions, and it is only one very simple if condition. Imagine anything more complex.
The only way to vectorize with the IF statement is to enable AVX2, which has conditional instructions. But then your code would not run on CPUs without AVX2.
In hand-written code, the solution is to create a scalar C path and an AVX2 path; the path would be chosen by looking at CPU features. But auto-vectorized code is not able to do it, it only generates one path.
But again, it is a very simple IF statement; even with AVX turned on, the compiler would probably fail with more complex code.
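On the single-path point, a compiler-generated loop can be dispatched at run time much like hand-written code. The sketch below assumes GCC or Clang on x86 and uses the target function attribute and __builtin_cpu_supports, both documented GCC extensions. All function and macro names here (sum_body, sum_baseline, sum_avx2, sum_dispatch, FORCE_INLINE, HAVE_X86_DISPATCH) are made up for illustration. The same C++ loop body is compiled twice, once with baseline options and once with AVX2 enabled, and the caller picks one. On other compilers or architectures the guards reduce it to the baseline path only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumption: the GCC/Clang x86 extensions are only used behind these guards.
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#define HAVE_X86_DISPATCH 1
#else
#define HAVE_X86_DISPATCH 0
#endif

#if defined(__GNUC__)
#define FORCE_INLINE inline __attribute__((always_inline))
#else
#define FORCE_INLINE inline
#endif

// Shared loop body, forced inline so each wrapper compiles it
// with that wrapper's own target options.
static FORCE_INLINE void sum_body(const int64_t* a, const int64_t* b,
                                  int64_t* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        const int64_t ai = a[i] < 5 ? 0 : a[i];
        out[i] = ai + b[i];
    }
}

// Baseline path: built with whatever flags the translation unit uses.
static void sum_baseline(const int64_t* a, const int64_t* b,
                         int64_t* out, std::size_t n) {
    sum_body(a, b, out, n);
}

#if HAVE_X86_DISPATCH
// AVX2 path: same source, but the compiler may use AVX2 instructions here.
__attribute__((target("avx2")))
static void sum_avx2(const int64_t* a, const int64_t* b,
                     int64_t* out, std::size_t n) {
    sum_body(a, b, out, n);
}
#endif

// Picks a path at run time based on CPU features.
void sum_dispatch(const std::vector<int64_t>& a,
                  const std::vector<int64_t>& b,
                  std::vector<int64_t>& out) {
    assert(a.size() == b.size());
    assert(a.size() == out.size());
#if HAVE_X86_DISPATCH
    if (__builtin_cpu_supports("avx2")) {
        sum_avx2(a.data(), b.data(), out.data(), a.size());
        return;
    }
#endif
    sum_baseline(a.data(), b.data(), out.data(), a.size());
}
```

GCC also documents a target_clones attribute that automates this kind of multiversioning; the manual version above just makes the mechanism visible. Neither approach guarantees that the loop is actually vectorized on each path.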
-
Originally posted by Gamer1227
Not all code is vectorized. Look at the source code of the x264 decoder: they only vectorize some hot loops that can be vectorized, which is like 5% of the code at worst. It is not that much work.
Don't make light of work you didn't do.
Originally posted by Gamer1227
Yeah, go tell the FFmpeg developers that hand-written assembly is stupid and that they should "just trust the compiler, bro!".
-
Originally posted by Quackdoc
Don't engage with such overly false statements. image-rs is insanely popular and found autovectorization to perform better than their hand-written code: https://github.com/image-rs/image-png/pull/512
Autovectorization is actually pretty decent now.
Image-rs is an exception; the exception reinforces the rule.
Look at all this "stupid" hand-written assembly.
-
Originally posted by coder
If you rely on everything being vectorized by hand, the vast majority of software never will be. That's why it pays for programmers to understand the capabilities and limitations of compiler autovectorization. It's a lot less work to write easily autovectorizable C/C++ code than it is to go the whole way and do it all by hand, not to mention a lot more portable and easier to maintain.
In these debates, I'm always struck by how people advocate for hand-vectorizing as if it has no downside, with opportunity cost being one glaring oversight. Even if we accept that it's indeed the gold standard, it's essentially saying you want like 1% of code that's near optimal and are content for the rest to sputter along as entirely scalar. If people would actually invest time in learning how to write autovectorizable code, then maybe we could have 5% or 10% of code being vectorized and it'd run fast on ARM and RISC-V (for the cores with vector extensions), not just x86.
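As one illustration of a limitation worth knowing: with default floating-point semantics, compilers generally preserve the order of additions, so a float sum with a single accumulator tends to stay scalar unless reassociation is allowed (for example via -ffast-math). The sketch below spells out the reordering in the source by using four independent accumulators. The name sum4 and the choice of four lanes are assumptions for illustration, and the result can differ slightly from a strict left-to-right sum for general inputs.

```cpp
#include <cstddef>

// Hypothetical helper: sums n floats using four independent partial sums,
// which makes the reassociation explicit instead of relying on compiler flags.
float sum4(const float* x, std::size_t n) {
    float acc[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    std::size_t i = 0;

    // Main loop: each accumulator only ever sees every fourth element.
    for (; i + 4 <= n; i += 4) {
        for (std::size_t k = 0; k < 4; ++k) {
            acc[k] += x[i + k];
        }
    }

    // Combine the partial sums, then handle the remaining 0 to 3 elements.
    float total = (acc[0] + acc[1]) + (acc[2] + acc[3]);
    for (; i < n; ++i) {
        total += x[i];
    }
    return total;
}
```

Because the four partial sums do not depend on each other, the compiler is free to keep them in one vector register without changing the result the source asks for.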