Rust For Linux Kernel v9 Patches Trim Things Down Greatly For Easier Upstreaming


  • ultimA
    replied
    Originally posted by mdedetrich View Post

    Do note that there are multiple definitions of "zero cost". One is that it is literally free (as is the case with exceptions, which are "free" as long as you never catch them), whereas the other is "for the abstraction you are opting into, which in this case is vtables/OO, the implementation is as efficient as possible". C++ generally follows the latter definition of "zero cost", so it's not that OO is free but rather that their implementation of OO is as fast as you can get.

    The definition in and of itself is a bit loose when you deal with compiler optimisations, as you noted.
    Just to clarify, I meant zero-cost in the former sense, meaning not "as efficient as possible" but truly zero. The virtual call was really replaced by a static call (or got inlined directly) without a vtable lookup, which is why I said zero-cost "as far as the executed instructions are concerned". The vtable, and more importantly the vtable pointer in the object instances, still remained, and so I said this was not zero-cost in space, even if it might still be efficient. Whether this is actually a performance problem depends on the application. If the objects need to be streamed in and out of memory and the app is memory bound, or if the extra pointer hurts cache usage, then the space overhead can of course cause a performance degradation too. Otherwise, the space taken up by itself is probably a non-issue except on small embedded platforms.
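
    To make the space point concrete, here is a minimal sketch (the type names are invented for illustration): with a final implementation class the virtual call is typically lowered to a direct call or inlined, yet every instance still carries a vtable pointer, which sizeof makes visible.

    Code:
    #include <cstdio>

    struct Plain { long x; };            // no virtual functions, no vtable pointer

    struct Iface {
        virtual long get() const = 0;
        virtual ~Iface() = default;
    };

    // 'final' lets the compiler prove the dynamic type and devirtualize the call.
    struct Impl final : Iface {
        long x = 42;
        long get() const override { return x; }
    };

    int main() {
        Impl i;
        const Iface &r = i;
        // With optimizations on, this is typically compiled as a direct call
        // (or inlined outright), so there is no per-call dispatch cost...
        std::printf("value: %ld\n", r.get());

        // ...but every Impl object still contains a vtable pointer, so it is
        // larger than an equivalent non-polymorphic struct (commonly 16 vs. 8
        // bytes on a 64-bit target).
        std::printf("sizeof(Plain)=%zu, sizeof(Impl)=%zu\n", sizeof(Plain), sizeof(Impl));
        return 0;
    }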



  • ultimA
    replied
    Originally posted by NobodyXu View Post
    ultimA IMO this is because they want to retain API/ABI stability, e.g. if this class is used in a public API/ABI, then the compiler has to be careful when doing that.
    Yes, and that's the reason I tried stuff with anonymous namespaces and symbol visibility (also note I've been compiling an executable). ABI compatibility needn't be a concern here.



  • NobodyXu
    replied
    ultimA IMO this is because they want to retain API/ABI stability, e.g. if this class is used in a public API/ABI, then the compiler has to be careful when doing that.

    It is definitely doable, but the compiler devs probably decided it's not worth their time, similar to how the compiler cannot optimize exception handling even when all its branches are known at compile time.
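
    A tiny sketch of that exception point (illustrative only, not a guarantee for any particular compiler): even though the throw site and the only possible handler are both visible here, mainstream compilers generally still emit the full unwinding machinery rather than folding the try/catch into a plain branch.

    Code:
    #include <cstdio>

    // Both the throw and its matching handler are local and fully visible, yet
    // the try/catch is generally not collapsed into a simple jump; the usual
    // exception-allocation/throw path and unwind tables are still emitted.
    static int parse(bool ok) {
        try {
            if (!ok)
                throw 42;      // this branch is fully known once 'ok' is a constant
            return 0;
        } catch (int e) {
            return e;
        }
    }

    int main() {
        std::printf("%d %d\n", parse(true), parse(false));
        return 0;
    }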



  • ssokolow
    replied
    Originally posted by mdedetrich View Post

    Do note that there are multiple definitions of "zero cost". One is that it is literally free (as is the case with exceptions, which are "free" as long as you never catch them), whereas the other is "for the abstraction you are opting into, which in this case is vtables/OO, the implementation is as efficient as possible". C++ generally follows the latter definition of "zero cost", so it's not that OO is free but rather that their implementation of OO is as fast as you can get.

    The definition in and of itself is a bit loose when you deal with compiler optimisations, as you noted.
    The C++ people promote the term "zero-overhead abstractions" for that latter one these days.



  • mdedetrich
    replied
    Originally posted by ultimA View Post
    Ok, last night I went and did some tests. Devirtualization is easily and reliably triggered under basically all circumstances I tried, so we have zero cost there as far as the executed instructions are concerned. That's a plus. It turns out, though, that I was wrong about the retention of vtables (and NobodyXu was correct). Hence we are not zero-cost in space. I spent a couple of hours trying out everything possible to try to convince compilers and linkers to remove vtables, but aside from an impractically trivial test case, this basically never happened. I tried various optimization options, LTO, turning off RTTI, linker options, tried with gcc and clang, ld, gold and lld, played around with symbol hiding, inlining, linkage... really I tried out everything that I thought might help, but all for naught.
    Do note that there are multiple definitions of "zero cost". One is that it is literally free (as is the case with exceptions, which are "free" as long as you never catch them), whereas the other is "for the abstraction you are opting into, which in this case is vtables/OO, the implementation is as efficient as possible". C++ generally follows the latter definition of "zero cost", so it's not that OO is free but rather that their implementation of OO is as fast as you can get.

    The definition in and of itself is a bit loose when you deal with compiler optimisations, as you noted.



  • ultimA
    replied
    Ok, last night I went and did some tests. Devirtualization is easily and reliably triggered under basically all circumstances I tried, so we have zero cost there as far as the executed instructions are concerned. That's a plus. It turns out, though, that I was wrong about the retention of vtables (and NobodyXu was correct). Hence we are not zero-cost in space. I spent a couple of hours trying out everything possible to try to convince compilers and linkers to remove vtables, but aside from an impractically trivial test case, this basically never happened. I tried various optimization options, LTO, turning off RTTI, linker options, tried with gcc and clang, ld, gold and lld, played around with symbol hiding, inlining, linkage... really I tried out everything that I thought might help, but all for naught.

    I'm pretty sure this is a missed optimization opportunity, as in multiple cases it should be trivial to prove the vtable simply cannot be used. I guess this optimization is just not implemented. While I understand why the linker won't remove vtables (to remove the overhead from object instances it would need to go back and retroactively change the memory layout of objects), I am surprised that the compiler insists on generating the vtable in all cases in the first place (even when unused and provably unusable by any other compilation unit). Nevertheless, I thought I'd share my experience here for everybody to learn from: for virtual methods, the use of vtables can be optimized out without problem, but the vtables themselves will still be generated in basically every case.



  • ssokolow
    replied
    Originally posted by Sergey Podobry View Post
    And you think Rust magically implements dynamic dispatch without adding extra data? You know you don't have to write kernel code with abstract factories and virtual inheritance everywhere. Don't know why, but when people argue about C++ they think only about this kind of code.
    Rust uses a (&struct, &vtable) fat pointer in the places where you ask for dynamic dispatch (the jargon is "trait objects") and will never transparently insert a new member into the struct itself.



  • NobodyXu
    replied
    Originally posted by ultimA View Post
    The compiler should eliminate the vtable if it can prove that it cannot be used by anything (not even by other compilation units or by dynamic linking). Mark the override with final, and hide the base class from other users (for example by making it use internal linkage) or compile the executable with LTO. The vtable should get removed (including of course the pointer to it in your object). This is why I stated earlier that for this to work in the kernel you would probably need LTO, though it would work even without LTO in cases where the interface is local to a single kernel module's implementation.

    I might do some tests later in the evening to confirm this. Even if the vtable stays, which I doubt right now, this would still make this technique zero-cost in execution time, just not zero-cost in space.
    To prove that, the compiler needs to inline every function that passes the objects of that type around.
    Hence I think it might not work that well in practice.

    Originally posted by ultimA View Post
    It is also worth mentioning that at least in the case of the kernel, this discussion is more theoretical than practical. As Sergey has pointed out, the kernel modules already use dynamic dispatching (they need it for dynamic module loading to work), so we wouldn't be losing any performance even if our abstraction wasn't zero-cost. But with the right code and compiler optimizations, it should be.
    Yeah, it would not cause any regression for the kernel, though it might for other scenarios.



  • reba
    replied
    If a language is so complex and so flexible (in a bad way) that it is not possible for some experts to talk clearly to each other about what their code is intended to do... something went wrong at some point.



  • ultimA
    replied
    Originally posted by NobodyXu View Post

    I agree that in these cases, you need vtables nonetheless.

    Though my point is just that virtual functions in C++ are not zero-cost, since the objects carry the 8-byte vtable pointer even if you don't use it.
    It can be optimized out, but that optimization isn't guaranteed to happen.

    Consider the following code:

    Code:
    struct C {
        virtual void f() {
            /* Contains so much code that the compiler does not inline this function */
        }
    };

    void large_func(C &c) {
        /* Contains so much code that the compiler does not inline this function */
        c.f(); // virtual call through a reference; the dynamic type is unknown here
    }

    In this case, the compiler would not be able to inline large_func and cannot inline C::f, so it cannot optimize out the vtable pointer here.



    You misunderstand me again.

    I'm not saying that you cannot depend on optimization; I am simply saying that in C++, virtual functions are not zero-cost.
    Optimization here does not change the fact that they aren't zero-cost, as it cannot eliminate the unused vtable pointer in all scenarios.

    The compiler should eliminate the vtable if it can prove that it cannot be used by anything (not even by other compilation units or by dynamic linking). Mark the override with final, and hide the base class from other users (for example by making it use internal linkage) or compile the executable with LTO. The vtable should get removed (including of course the pointer to it in your object). This is why I stated earlier that for this to work in the kernel you would probably need LTO, though it would work even without LTO in cases where the interface is local to a single kernel module's implementation.

    I might do some tests later in the evening to confirm this. Even if the vtable stays, which I doubt right now, this would still make this technique zero-cost in execution time, just not zero-cost in space.
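
    For reference, a minimal sketch of the pattern described above (the names are invented): the interface and its only implementation live in an anonymous namespace so nothing escapes the translation unit, and the override is marked final.

    Code:
    #include <cstdio>

    namespace { // internal linkage: not visible to any other translation unit

    struct Backend {
        virtual int read() = 0;
        virtual ~Backend() = default;
    };

    struct NullBackend final : Backend {
        int read() override { return 7; }
    };

    } // anonymous namespace

    int main() {
        NullBackend b;
        Backend &ref = b;
        // With 'final' and the concrete type visible, this call is devirtualized;
        // whether the now-unreferenced vtable is also dropped from the binary is
        // exactly what remains to be confirmed.
        std::printf("%d\n", ref.read());
        return 0;
    }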

    It is also worth mentioning that at least in the case of the kernel, this discussion is more theoretical than practical. As Sergey has pointed out, the kernel modules already use dynamic dispatching (they need it for dynamic module loading to work), so we wouldn't be losing any performance even if our abstraction wasn't zero-cost. But with the right code and compiler optimizations, it should be.
    Last edited by ultimA; 08 August 2022, 08:39 AM.

