SiFive HiFive Premier P550 RISC-V Price Lowered, Ubuntu 24.04 Support Ready


  • Quackdoc
    replied
    Originally posted by dev_null View Post
    No source code for the firmware, right? https://github.com/sifive/hifive-pre...ools/issues/11 Why is it better than any other board, then?
    for 99% of people it's not.


  • dev_null
    replied
    No source code for the firmware, right? https://github.com/sifive/hifive-pre...ools/issues/11 Why is it better than any other board, then?


  • coder
    replied
    Originally posted by Quackdoc View Post
    He is spewing massively blatant lies about what I said, I think it's safe to ignore him as a lunatic.
    The Godbolt example seemed like a good-faith effort to advance the discussion. Other reasonable points were raised, as well. As long as I feel the discussion continues to turn up interesting and useful information, I'm likely to continue.

    I don't expect I can change anyone's mind in a single exchange. However, if I can tell them something they didn't know (or vice versa), then I generally consider it worthwhile.

    I appreciate your contributions, as well as your concerns.


  • Quackdoc
    replied
    Originally posted by coder View Post
    We didn't say that. I won't speak for Quackdoc, but my point is that people should learn to use their tools properly.
    He is spewing massively blatant lies about what I said, I think it's safe to ignore him as a lunatic.


  • Quackdoc
    replied
    Originally posted by Gamer1227 View Post

    Yeah, go tell the FFMPEG developers that hand-written assembly is stupid and that they should "just trust the compiler Bro!".

    Image-rs is an exception; the exception reinforces the rule.

    Look at all this "stupid" hand-written assembly.
    https://github.com/FFmpeg/FFmpeg/tree/master/libavcodec
    bro, I said 1cm and you took it as 100km


  • coder
    replied
    Originally posted by Gamer1227 View Post
    using namespace std;

    void sum(vector<int64_t> a, vector<int64_t> b, vector<int64_t> sum) {
        assert(a.size() == b.size());
        assert(a.size() == sum.size());

        // remove the commmented if condition
        // and see the great auto vectorization crumble
        // works on both GCC and Clang!
        for (auto i = 0; i < a.size(); i++) {
            //if (a[i] < 5) {
            //    a[i] = 0;
            //}
            sum[i] = a[i] + b[i];
        }
    }


    I wrote a small C++ example: a function that sums the elements of two vectors into a third vector. I also helped the compiler by asserting that the three vectors have the same size.

    As you can see, without the simple if condition it works well. If you hover the mouse over sum[i] = a[i] + b[i], the assembly is SSE instructions with xmm registers.

    But if you uncomment that condition, the compiler falls back to scalar instructions. And that is only one very simple if condition; imagine anything more complex.
    You failed to specify a -march option. If you simply add -march=sapphirerapids, it will vectorize that.

    Originally posted by Gamer1227 View Post
    The only way to vectorize with the IF statement is to enable AVX2, which has conditional instructions. But then your code would not run on CPUs without AVX2.
    It vectorizes with -march=sandybridge, which lacks AVX2.

    Also, using clang-19.1 + -march=nehalem gets it vectorized. I think clang is now the leader in autovectorization. I'd focus on it, if you want to see the state of the art.

    Originally posted by Gamer1227 View Post
    In hand-written code, the solution is to create a scalar C path and an AVX2 path, with the path chosen by looking at CPU features. But auto-vectorized code is not able to do that; it only generates one path.
    With a modicum of creativity, you can figure out how to use the preprocessor, macros, and the #include directive to utilize GCC's function multi-versioning feature to achieve the same effect, while writing the function only once.
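A rough sketch of the "write it once, get several code paths" idea follows. It does not use the preprocessor and #include approach described above. It swaps in the target_clones attribute instead, which GCC (and recent Clang) provide for the same purpose on x86-64 Linux with glibc. The names clamp_and_sum, clamp_and_sum_vec and CLONE_FOR_AVX2 are made up for illustration, and the loop is a variant of the thread's example that leaves the input untouched instead of writing 0 back into a[i]. Whether each clone actually gets vectorized still depends on the compiler version and flags, so check the generated assembly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Only request clones where this is assumed to work: x86-64 with glibc
// (runtime selection relies on ifunc). Elsewhere the macro expands to nothing
// and a single generic version is built.
#if defined(__x86_64__) && defined(__GLIBC__) && defined(__has_attribute)
#  if __has_attribute(target_clones)
#    define CLONE_FOR_AVX2 __attribute__((target_clones("default", "avx2")))
#  endif
#endif
#ifndef CLONE_FOR_AVX2
#  define CLONE_FOR_AVX2
#endif

// Hypothetical kernel: values below 5 count as 0, then the arrays are summed.
// With the attribute active, the compiler emits a baseline and an AVX2 version
// and picks one when the program is loaded.
CLONE_FOR_AVX2
void clamp_and_sum(const std::int64_t* a, const std::int64_t* b,
                   std::int64_t* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        const std::int64_t x = (a[i] < 5) ? 0 : a[i];
        out[i] = x + b[i];
    }
}

// Convenience wrapper. The vectors are taken by reference so that the result
// is visible to the caller.
void clamp_and_sum_vec(const std::vector<std::int64_t>& a,
                       const std::vector<std::int64_t>& b,
                       std::vector<std::int64_t>& out) {
    assert(a.size() == b.size());
    assert(a.size() == out.size());
    clamp_and_sum(a.data(), b.data(), out.data(), a.size());
}
```

Callers just call clamp_and_sum_vec(a, b, out). The choice between the baseline and AVX2 versions is made for them at load time.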

    Originally posted by Gamer1227 View Post
    but again, it is a very simple IF statement,
    Yes, and you did almost nothing to help the compiler. Like I said, using built-ins or PGO will give it enough information to know how likely certain codepaths are. Using __restrict__ will help it avoid having to assume certain variables might alias.
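As a hedged illustration of the hints mentioned above, here is the thread's loop rewritten over raw pointers with __restrict__ and __builtin_expect, both GCC/Clang extensions. The function names are hypothetical, and the "values below 5 are rare" hint is an assumption made purely for the example. Whether either hint changes the generated code depends on the compiler version and flags, so the assembly is worth checking rather than assuming.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// __restrict__ is a promise from the programmer that the three arrays do not
// overlap, so the compiler need not assume a store to out[] can alter a[] or b[].
void sum_clamped(const std::int64_t* __restrict__ a,
                 const std::int64_t* __restrict__ b,
                 std::int64_t* __restrict__ out,
                 std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        std::int64_t x = a[i];
        // Illustrative assumption: small values are rare, so mark the branch
        // as unlikely. PGO would derive this from measurements instead.
        if (__builtin_expect(x < 5, 0)) {
            x = 0;
        }
        out[i] = x + b[i];
    }
}

// Wrapper over vectors, passed by reference so the caller sees the result.
// Distinct vectors never share storage, so the no-overlap promise holds
// as long as out is not the same object as a or b.
void sum_clamped_vec(const std::vector<std::int64_t>& a,
                     const std::vector<std::int64_t>& b,
                     std::vector<std::int64_t>& out) {
    assert(a.size() == b.size());
    assert(a.size() == out.size());
    assert(&out != &a && &out != &b);
    sum_clamped(a.data(), b.data(), out.data(), a.size());
}
```

The promise made by __restrict__ is not checked by the compiler. Passing overlapping arrays to sum_clamped would be undefined behavior, which is why the wrapper rejects out being the same vector as an input.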

    You seem to have this odd idea that optimizing code is an all-or-nothing affair. That either the compiler should be all-knowing and do everything for you, without you having to lift a finger, or that you have to drop into assembly language and do everything yourself. In most other professions, tradesmen learn to actually use their tools. The tool's job is to make your life easier, but you should master it, if you desire to extract the greatest benefit from it.

    Like I said, I think the main reason a lot of people drop into assembly language is about ego. They enjoy the dopamine hit they get, when they believe they can do something better than the compiler. However, I've seen the generic C paths, in some of these projects, and they did absolutely nothing to help out the compiler and use it more effectively. A cynical take on this is that they actually don't want the generic version to be very fast, so that they can keep justifying their hand-written assembly language.
    Last edited by coder; 16 December 2024, 04:27 PM.


  • Gamer1227
    replied
    Originally posted by coder View Post
    Even that much is enough work that they had to train a bunch of people to do it. Then, when they want to optimize for a new architecture, they have to do it all over again. Also, what about when they want to support new decoder features, like 10-bit or 12-bit? You bet that's another code path!

    Don't make light of work you didn't do.


    We didn't say that. I won't speak for Quackdoc, but my point is that people should learn to use their tools properly.
    using namespace std;

    void sum(vector<int64_t> a, vector<int64_t> b, vector<int64_t> sum) {
        assert(a.size() == b.size());
        assert(a.size() == sum.size());

        // remove the commmented if condition
        // and see the great auto vectorization crumble
        // works on both GCC and Clang!
        for (auto i = 0; i < a.size(); i++) {
            //if (a[i] < 5) {
            //    a[i] = 0;
            //}
            sum[i] = a[i] + b[i];
        }
    }


    I wrote a small C++ example: a function that sums the elements of two vectors into a third vector. I also helped the compiler by asserting that the three vectors have the same size.

    As you can see, without the simple if condition it works well. If you hover the mouse over sum[i] = a[i] + b[i], the assembly is SSE instructions with xmm registers.

    But if you uncomment that condition, the compiler falls back to scalar instructions. And that is only one very simple if condition; imagine anything more complex.

    The only way to vectorize with the IF statement is to enable AVX2, which has conditional instructions. But then your code would not run on CPUs without AVX2.

    In hand-written code, the solution is to create a scalar C path and an AVX2 path, with the path chosen by looking at CPU features. But auto-vectorized code is not able to do that; it only generates one path.

    But again, it is a very simple IF statement. Even with AVX turned on, the compiler would probably fail with more complex code.
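For readers who have not seen the "scalar path plus AVX2 path, chosen by CPU features" pattern, here is a minimal sketch of it without any hand-written assembly. It assumes GCC or Clang on x86-64, where the target attribute and __builtin_cpu_supports are available. On other platforms only the generic path is built. All names (DEFINE_SUM_VARIANT, sum_generic, sum_avx2, pick_sum, sum_dispatch) are invented for the example, and the loop treats values below 5 as 0 without writing back into the input.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using SumFn = void (*)(const std::int64_t*, const std::int64_t*,
                       std::int64_t*, std::size_t);

// The loop is written once as a macro and stamped out per variant, so each
// copy is compiled under its own target options.
#define NO_ATTR
#define DEFINE_SUM_VARIANT(name, attrs)                               \
    attrs void name(const std::int64_t* a, const std::int64_t* b,     \
                    std::int64_t* out, std::size_t n) {               \
        for (std::size_t i = 0; i < n; ++i) {                         \
            const std::int64_t x = (a[i] < 5) ? 0 : a[i];             \
            out[i] = x + b[i];                                        \
        }                                                             \
    }

// Baseline variant, built with whatever flags the translation unit uses.
DEFINE_SUM_VARIANT(sum_generic, NO_ATTR)

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
// Same source, but the compiler may use AVX2 instructions in this copy.
DEFINE_SUM_VARIANT(sum_avx2, __attribute__((target("avx2"))))
#define HAVE_AVX2_VARIANT 1
#endif

// Choose a variant from the CPU features reported at run time.
SumFn pick_sum() {
#ifdef HAVE_AVX2_VARIANT
    if (__builtin_cpu_supports("avx2")) {
        return sum_avx2;
    }
#endif
    return sum_generic;
}

// Public entry point: the variant is selected once, on first use.
void sum_dispatch(const std::vector<std::int64_t>& a,
                  const std::vector<std::int64_t>& b,
                  std::vector<std::int64_t>& out) {
    assert(a.size() == b.size());
    assert(a.size() == out.size());
    static const SumFn fn = pick_sum();
    fn(a.data(), b.data(), out.data(), a.size());
}
```

Both variants compute the same result. Only the instructions the compiler is allowed to use differ, so the binary still runs on CPUs without AVX2.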


  • coder
    replied
    Originally posted by Gamer1227 View Post
    not all code is vectorized, look at the source code of x264 decoder, they only vectorize some hot loops that can be vectorized, that is like 5% of code at worst. It is not that much work.
    Even that much is enough work that they had to train a bunch of people to do it. Then, when they want to optimize for a new architecture, they have to do it all over again. Also, what about when they want to support new decoder features, like 10-bit or 12-bit? You bet that's another code path!

    Don't make light of work you didn't do.

    Originally posted by Gamer1227 View Post
    Yeah, go tell the FFMPEG developers that hand-written assembly is stupid and that they should "just trust the compiler Bro!".
    We didn't say that. I won't speak for Quackdoc, but my point is that people should learn to use their tools properly.


  • Gamer1227
    replied
    Originally posted by Quackdoc View Post

    don't engage with such overly false statements. image-rs is insanely popular and found autovectorization to perform better than their hand-written code https://github.com/image-rs/image-png/pull/512

    auto vectorization is actually pretty decent now
    Yeah, go tell the FFMPEG developers that hand-written assembly is stupid and that they should "just trust the compiler Bro!".

    Image-rs is an exception; the exception reinforces the rule.

    Look at all this "stupid" hand-written assembly.
    https://github.com/FFmpeg/FFmpeg/tree/master/libavcodec


  • Gamer1227
    replied
    Originally posted by coder View Post

    If you rely on everything to be vectorized by hand, the vast majority of software never will. That's why it pays for programmers to understand the capabilities and limitations of compiler autovectorization. It's a lot less work to write easily autovectorizable C/C++ code than it is to go the whole way and do it all by hand, not to mention a lot more portable and easier to maintain.

    In these debates, I'm always struck by how people advocate for hand-vectorizing as if it has no downside, with opportunity cost being one glaring oversight. Even if we accept that it's indeed the gold standard, it's essentially saying you want like 1% of code that's near optimal and are content for the rest to sputter along as entirely scalar. If people would actually invest time in learning how to write autovectorizable code, then maybe we could have 5% or 10% of code being vectorized and it'd run fast on ARM and RISC-V (for the cores with vector extensions), not just x86.
    not all code is vectorized, look at the source code of x264 decoder, they only vectorize some hot loops that can be vectorized, that is like 5% of code at worst. It is not that much work.
