Uutils 0.0.23 Implements More GNU Coreutils Functionality In Rust


  • #31
    Originally posted by ssokolow View Post

    Rust doesn't have a VM or a heavy runtime, but it does have a runtime... the same kind C does. Microsoft even talks about the CRT (C Run Time) in their docs.
    But only if you choose to. You don't need that for embedded usage if you don't have an operating system.


    • #32
      Originally posted by Akiko View Post

      Sometimes I really wonder if people actually read my message completely. I said I know this is possible, but if you do this, you drop most of what makes Rust secure and with that you drop the reason to use Rust. Then you can stick to C and still get smaller binaries. And btw you go right for the anecdotal fallacy, it doesn't matter if you care, this is not about personal needs, this is about generic embedded needs. And embedded sometimes has serious size restrictions and Rust does not fit these as well as C does. It just doesn't matter if your binaries are more secure if you can't put them in the available storage. And like I said, I also prefer Rust, but there are just places where using it just does not work.
      First of all, no, it does matter what I need, because I use what I use, not what everyone else uses. This is not an anecdotal fallacy; it is recognizing that different people need different things. For me, uutils would be perfectly adequate as the ultimate busybox replacement in nearly all of my projects.

      Secondly, I'm not sure what you think you are dropping. Yes, you do drop a lot, but Rust has a lot to offer: you still have compile-time protections for many things. Yes, you have to be more careful in some places, but pretending this isn't still way better than C is absurd. Even when using no_std, Rust is still a lot safer than C.

      Originally posted by ClosedSource View Post
      No sane person wastes money auditing 71 external libraries.
      I literally said I don't need to: I don't need to audit "time" or "blake3". Next time at least pretend to read what I say. I simply worked it down to 71 crates, and I know at least one or two of those are subcrates of the same project. This was just a quick experiment to show how misleading the "thousands of dependencies" claim can be, depending on how they are counted.

      Originally posted by moltonel View Post

      {gnu,uutils}-coreutils prioritize performance and features over install size, while busybox does the opposite. At the very least, coreutils are not a feature-wise drop-in replacement of busybox, both because busybox commands have far fewer options and because busybox has many more commands (shell, text editor, network utils, file compression, etc) that are outside the scope of coreutils.
      ...
      My busybox install (using my distro's default busybox config) is 1.4M. Gnu-coreutils is 18M, uutils-coreutils is 10M, and you'd probably need at least 5 times that to add bash, util-linux, nano, tar, wget, and whatever else. UPX and linking tricks won't cut it, to compete with busybox's size you need a radically different approach. You could certainly try that in Rust, but not with a project that has "drop-in gnu-coreutils replacement" as its headline feature.
      Again, to be clear, I am talking about my own needs. I realize uutils is still fairly large, that it does have a good chunk of performance, and of course that it doesn't have feature compatibility (hence why I said "expanded to"). But take, for instance, a case where I can allot 2 MB to uutils: anything lower than that is just an extra benefit. Currently, on my Linux install, I use a UPX-compressed uutils and it sits at about 1.6M. Yeah, it is for sure a bit fat, but if we can make uutils no_std compatible, that will significantly lower the binary size. I'm NOT saying it's free; it could need extensive work, but IMO that work could be well worth it. I also don't think a lot of the features would need much extra space. Of course they would need some, but with some elbow grease put into managing the file size, I think it could very well be doable.


      • #33
        Originally posted by Quackdoc View Post
        when I can allot 2 MB to uutils, anything lower than that is just an extra benefit. Currently, on my Linux install, I use a UPX-compressed uutils and it sits at about 1.6M. Yeah, it is for sure a bit fat.
        Fair enough, 1.6M is quite comparable with busybox (although there's a lot less functionality in uutils).

        but if we can make uutils no_std compatible, that will significantly lower the binary size. I'm NOT saying it's free; it could need extensive work, but IMO that work could be well worth it. I also don't think a lot of the features would need much extra space. Of course they would need some, but with some elbow grease put into managing the file size, I think it could very well be doable.
        I don't think it makes sense to make a no_std uutils : most of coreutils by nature interacts with OS resources, starting with opening files, so it's clearly a project that needs std. You also mentioned earlier using a statically built gnu libc; that's notoriously unsupported/impossible, most people reach for musl if they want a static libc. If you have multiple binaries you're probably best off with a dynamic gnu libc, to share the cost. With musl or with one of the pure-rust libc replacements, you might be able to shave off code during linking. You'll probably want to -Zbuild-std for best effect.

        For me, it would be very adequate to be the ultimate busybox replacement in nearly all of my projects.
        Uutils will never be a busybox replacement. Not because of binary size but because of feature scope. Achieving perfect compatibility with a beloved but decades-old and crufty set of utilities is hard enough.

        As I said earlier, maybe look at nushell: you get the shell that's missing from uutils, and a wide array of builtins, many of which are now implemented using uutils crates. I don't know if that's the ultimate busybox replacement for you yet, but it'll get you closer than uutils alone.


        • #34
          Originally posted by moltonel View Post
          I don't think it makes sense to make a no_std uutils : most of coreutils by nature interacts with OS resources, starting with opening files, so it's clearly a project that needs std. You also mentioned earlier using a statically built gnu libc; that's notoriously unsupported/impossible, most people reach for musl if they want a static libc. If you have multiple binaries you're probably best off with a dynamic gnu libc, to share the cost. With musl or with one of the pure-rust libc replacements, you might be able to shave off code during linking. You'll probably want to -Zbuild-std for best effect.
          Rust actually makes statically linking glibc really easy. Of course, because of its large size, I recommend avoiding it whenever possible (there are edge cases where some Rust code doesn't work with musl, and then you need to). I am actually looking forward to that new eyra project; it seems to have a lot of potential, albeit Linux-only (happily not an issue for me). I wonder how much build-std would actually save. I would like to try this sometime for sure.

          Uutils will never be a busybox replacement. Not because of binary size but because of feature scope. Achieving perfect compatibility with a beloved but decades-old and crufty set of utilities is hard enough.

          As I said earlier, maybe look at nushell: you get the shell that's missing from uutils, and a wide array of builtins, many of which are now implemented using uutils crates. I don't know if that's the ultimate busybox replacement for you yet, but it'll get you closer than uutils alone.
          Unfortunately nushell is just too much of a hassle to work around. Its builtins are annoying and anyone familiar with bash will feel very lost; even simple things like loops feel very foreign. I do think that, if not uutils itself, then maybe a project building on it could be very fruitful. A lot of crates already exist that get you pretty close to a busybox replacement, but of course having all of them as separate binaries hurts (and there is still no Rust shell that is remotely Bourne-compatible, IIRC).


          • #35
            Originally posted by oleid View Post

            But only if you choose to. You don't need that for embedded usage if you don't have an operating system.
            *nod* ...though Rust tries to be more granular, with no_core being equivalent to turning off the C runtime and no_std probably being what you want for embedded usage.

            (std depends on userland OS services like having a filesystem, threads, etc. and it also depends on alloc, which contains things like the built in heap-allocated collection types. Both of them build on core which provides the parts that can operate on a bare-metal platform with only a stack. If you're using something that provides heap allocation but no "userland", you can go no_std plus alloc.)
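To make the core / alloc / std layering above a bit more concrete, here is a minimal sketch. It is compiled as an ordinary std binary so it is easy to try, but the three helpers only use `core` and `alloc` paths; in a real `#![no_std]` crate you would additionally have to provide a global allocator and a panic handler. The function names (`max_reading`, `hex_dump`, `squares`) are made up for illustration.

```rust
// Assumption: built as a normal std binary for convenience. Only `main`
// touches std (via `println!`); the helpers stick to `core` and `alloc`.
extern crate alloc;

use alloc::string::String;
use alloc::vec::Vec;
use core::fmt::Write;

/// `core` only: iterators and `Option` work on a bare stack, no heap needed.
fn max_reading(readings: &[i16]) -> Option<i16> {
    readings.iter().copied().max()
}

/// Needs `alloc`: `String` is a heap-allocated collection type.
fn hex_dump(bytes: &[u8]) -> String {
    let mut out = String::new();
    for b in bytes {
        // `write!` into a `String` goes through `core::fmt::Write`, not `std::io`.
        write!(out, "{:02x}", b).unwrap();
    }
    out
}

/// Needs `alloc`: `Vec` grows on the heap.
fn squares(n: u32) -> Vec<u32> {
    (1..=n).map(|x| x * x).collect()
}

fn main() {
    println!("{:?}", max_reading(&[3, -7, 12]));
    println!("{}", hex_dump(&[0xde, 0xad]));
    println!("{:?}", squares(3));
}
```

On a target with a heap but no userland, the `hex_dump` and `squares` style of code is what "no_std plus alloc" buys you; `max_reading` would work even without an allocator.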
            Last edited by ssokolow; 17 November 2023, 08:04 PM.


            • #36
              Originally posted by ssokolow View Post

              *nod* ...though Rust tries to be more granular, with no_core being equivalent to turning off the C runtime and no_std probably being what you want for embedded usage.

              (std depends on userland OS services like having a filesystem, threads, etc. and it also depends on alloc, which contains things like the built in heap-allocated collection types. Both of them build on core which provides the parts that can operate on a bare-metal platform with only a stack. If you're using something that provides heap allocation but no "userland", you can go no_std plus alloc.)
              You can pretty much fully replace the std library with individual crates, picking and choosing what you need, though it is for sure more work. Even if you pull in your own allocator and collections, if that's all you need, you still get massive space savings. So even for an OS, that can help a lot.
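As a rough illustration of "bringing your own collections", here is a sketch of a fixed-capacity vector that relies on nothing beyond `core` (arrays, `Option`, `Result`), so it needs neither std nor an allocator. `FixedVec` is a hypothetical type written for this example; crates such as `heapless` provide more complete versions of the same idea.

```rust
/// A fixed-capacity, stack-only vector (hypothetical, for illustration).
/// Only `core` features are used, so the type would also work under `#![no_std]`.
struct FixedVec<T: Copy + Default, const N: usize> {
    items: [T; N],
    len: usize,
}

impl<T: Copy + Default, const N: usize> FixedVec<T, N> {
    fn new() -> Self {
        Self { items: [T::default(); N], len: 0 }
    }

    /// Hands the value back instead of reallocating when the buffer is full.
    fn push(&mut self, value: T) -> Result<(), T> {
        if self.len == N {
            return Err(value);
        }
        self.items[self.len] = value;
        self.len += 1;
        Ok(())
    }

    fn pop(&mut self) -> Option<T> {
        if self.len == 0 {
            return None;
        }
        self.len -= 1;
        Some(self.items[self.len])
    }

    /// View of the elements pushed so far.
    fn as_slice(&self) -> &[T] {
        &self.items[..self.len]
    }
}

fn main() {
    let mut v: FixedVec<u8, 2> = FixedVec::new();
    assert!(v.push(10).is_ok());
    assert!(v.push(20).is_ok());
    assert_eq!(v.push(30), Err(30)); // capacity reached, nothing is allocated
    println!("{:?}", v.as_slice());
    assert_eq!(v.pop(), Some(20));
}
```

The trade-off is the usual one: the capacity is fixed at compile time, and running out of space becomes an error the caller has to handle rather than a reallocation.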


              • #37
                Originally posted by Akiko View Post
                Even if you go for basically all possible options to produce the smallest possible statically compiled binary, a simple hello world example still ends up to be more than 1 MiB on amd64. This is an absolute no-go for such a tool.

                And if you go for using the c-runtime instead of the Rust-runtime you still end up with much bigger binaries minus the security Rust provides. Rust for sure is a good modern and secure language, but that comes with a price.
                Bit interested in your comparison. Not really any different compared to a static build of hello world with C or C++? (all sizes are from du -bh file_name):

                Code:
                # Hello World C
                
                $ cat > hello-c.c <<EOF
                #include <stdio.h>
                int main() {
                    printf("Hello, World!\n");
                }
                EOF
                
                # 16K
                # du -bh hello-c
                $ make hello-c
                $ file hello-c
                hello-c: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=29766d6064af1d08935550a2e9e30f8681f817c1, for GNU/Linux 3.2.0, not stripped
                $ ldd hello-c
                        linux-vdso.so.1 (0x00007fff7e3f3000)
                        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fe396fa7000)
                        /lib64/ld-linux-x86-64.so.2 (0x00007fe3971db000)
                
                # 789K (13.3K musl):
                # static build
                $ rm hello-c && make hello-c CFLAGS='-Os -flto -Wl,--gc-sections -static -s'
                $ file hello-c
                hello-c: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, BuildID[sha1]=1db5f12d51e7c7180f130323c0b4713d4818cf84, for GNU/Linux 3.2.0, stripped
                $ ldd hello-c
                        not a dynamic executable
                Code:
                # Hello World C++
                
                $ cat > hello-cpp.cpp <<EOF
                #include <iostream>
                using namespace std;
                int main() {
                    cout << "Hello, World!" << endl;
                }
                EOF
                
                # 17K:
                $ make hello-cpp
                $ file hello-cpp
                hello-cpp: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=cd91c7308aaebe35a684726085885915627eb8ee, for GNU/Linux 3.2.0, not stripped
                $ ldd hello-cpp
                        linux-vdso.so.1 (0x00007ffc0f78f000)
                        libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f3e3fd0d000)
                        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f3e3fae5000)
                        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f3e3f9fe000)
                        /lib64/ld-linux-x86-64.so.2 (0x00007f3e3ff45000)
                        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f3e3f9de000)
                
                # 1M (538K musl):
                # static build
                $ rm hello-cpp && make hello-cpp CXXFLAGS='-Os -flto -Wl,--gc-sections -static -static-libstdc++ -s'
                $ file hello-cpp
                hello-cpp: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, BuildID[sha1]=0e89c1a796b1871b8ebdd55ed14951d159041dc6, for GNU/Linux 3.2.0, stripped
                $ ldd hello-cpp
                        not a dynamic executable
                Code:
                # Hello World Rust (glibc dynamically linked builds)
                
                # `cargo init` will create src/main.rs for you with:
                # fn main() {
                #     println!("Hello, world!");
                # }
                $ cargo init
                
                # 4.5M (musl static: 4.6M):
                # Standard release build
                $ cargo build --target x86_64-unknown-linux-gnu --release
                
                # 367K (musl static: 454K):
                # Strip binary
                $ echo -e '[profile.release]\nstrip = true' >> Cargo.toml
                # 319K (musl: 410K):
                # Optimize for size
                $ echo -e 'lto = true\nopt-level = "z"' >> Cargo.toml
                
                # 207K (musl static: 290K):
                # Optimize libstd with build-std (now requires nightly toolchain):
                # https://github.com/johnthagen/min-sized-rust#optimize-libstd-with-build-std
                $ rustup toolchain install nightly
                $ rustup component add rust-src --toolchain nightly
                $ echo 'panic = "abort"' >> Cargo.toml
                $ RUSTFLAGS="-Zlocation-detail=none" cargo +nightly build -Z build-std=std,panic_abort --target x86_64-unknown-linux-gnu --release
                
                # 31K (musl static: 50K):
                # Using panic_immediate_abort:
                # https://github.com/johnthagen/min-sized-rust#remove-panic-string-formatting-with-panic_immediate_abort
                $ RUSTFLAGS="-Zlocation-detail=none" cargo +nightly build -Z build-std=std,panic_abort -Z build-std-features=panic_immediate_abort --target x86_64-unknown-linux-gnu --release
                
                # Result is 31K (but glibc is dynamically linked by default):
                $ file target/x86_64-unknown-linux-gnu/release/hello_world
                target/x86_64-unknown-linux-gnu/release/hello_world: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=e1ccad0eec26338e8c9f329c1a08c4ab60fb4810, for GNU/Linux 3.2.0, stripped
                $ ldd target/x86_64-unknown-linux-gnu/release/hello_world
                        linux-vdso.so.1 (0x00007fffe677b000)
                        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f877d19c000)
                        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f877cf74000)
                        /lib64/ld-linux-x86-64.so.2 (0x00007f877d1cc000)
                
                # 945K with additional RUSTFLAGS `-C relocation-model=static -Ctarget-feature=+crt-static`
                $ file target/x86_64-unknown-linux-gnu/release/hello_world
                target/x86_64-unknown-linux-gnu/release/hello_world: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, BuildID[sha1]=4b68f4e3f9f54f2ea2cf35043efd9876473a8da9, for GNU/Linux 3.2.0, stripped
                $ ldd target/x86_64-unknown-linux-gnu/release/hello_world
                        not a dynamic executable
                The Rust builds show more variation in size because they demonstrate iterative optimization; some ergonomics start to get sacrificed, but not too much.

                The result is still comparable to C++: the smallest dynamically linked Rust build (31K) is almost double the C++ one (17K), but +14K is hardly significant. The static builds look more favorable to Rust when built with the musl target.
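For anyone who wants the comparison as numbers rather than adjectives, here is a small sketch that recomputes the ratios from the sizes quoted in the code blocks above. The figures are copied by hand from this post (with "1M" taken as 1024K, which is an approximation), and the Rust row uses the smallest std builds, i.e. the ones with `panic_immediate_abort`.

```rust
/// Hello World sizes in KiB, copied from the builds listed in this post.
/// Columns: (name, dynamic glibc, static glibc, static musl)
const SIZES_KIB: [(&str, f64, f64, f64); 3] = [
    ("C", 16.0, 789.0, 13.3),
    ("C++", 17.0, 1024.0, 538.0), // "1M" taken as 1024K (approximation)
    ("Rust", 31.0, 945.0, 50.0),  // smallest std builds (panic_immediate_abort)
];

/// How many times larger `a` is than `b`.
fn ratio(a: f64, b: f64) -> f64 {
    a / b
}

fn main() {
    // Use the C++ row as the baseline for every column.
    let (_, cpp_dyn, cpp_glibc, cpp_musl) = SIZES_KIB[1];
    for (name, dynamic, glibc, musl) in SIZES_KIB {
        println!(
            "{name:>4}: dynamic {:.2}x, static glibc {:.2}x, static musl {:.2}x (relative to C++)",
            ratio(dynamic, cpp_dyn),
            ratio(glibc, cpp_glibc),
            ratio(musl, cpp_musl),
        );
    }
}
```

The dynamic Rust build comes out at a bit over 1.8 times the C++ one, while the static musl Rust build is under a tenth of the static musl C++ build.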

                While not that practical due to sacrificing ergonomics, here is a `no_std` variant which is only 13.1K statically linked:

                Code:
                # 13.1K (13,376 bytes) statically linked (dynamic linking adds 400 bytes on musl, 800 bytes on glibc)
                $ RUSTFLAGS="-C relocation-model=static -C target-feature=+crt-static" cargo build --target x86_64-unknown-linux-musl --release
                # src/main.rs changed to no_std:
                # Ref: https://www.reddit.com/r/rust/comments/bf8l2b/comment/elbzd5h/
                ```rs
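                #![no_std]
                #![no_main]

                // Exported unmangled so the C runtime's startup code can call it as `main`.
                #[no_mangle]
                pub extern "C" fn main() -> isize {
                    const HELLO: &str = "Hello, world!\n";
                    // SAFETY: the pointer and length describe a valid, live string constant.
                    unsafe { write(1, HELLO.as_ptr() as *const i8, HELLO.len()) };
                    0
                }

                // libc's write(2); fd 1 is stdout.
                #[link(name = "c")]
                extern "C" {
                    fn write(fd: i32, buf: *const i8, count: usize) -> isize;
                }

                // no_std requires a panic handler; this one just spins forever.
                #[panic_handler]
                fn panic(_info: &core::panic::PanicInfo) -> ! { loop {} }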
                ```
                Last edited by polarathene; 19 November 2023, 08:41 PM.


                • #38
                  Originally posted by Quackdoc View Post

                  This is not an issue for me. Even if I cared about a couple of extra MB, which I really don't, the platforms I target work with UPX anyway, which brings uutils from 6.4 MB (after stripping) down to 1.9 MB using a fairly conservative
                  Code:
                  RUSTFLAGS='-C target-feature=+crt-static -C opt-level=z -C panic=abort' cargo build --target=x86_64-unknown-linux-gnu --release
                  This is more than good enough for me, as it's not too far off busybox-static.
                  Originally posted by Quackdoc View Post
                  I am actually looking forward to that new eyra project; it seems to have a lot of potential, albeit Linux-only (happily not an issue for me). I wonder how much build-std would actually save.
                  I managed to get `uutils` to compile down to 5.5M. It builds with their `release-small` profile in `Cargo.toml` (which includes `strip=true`, `opt-level=z` and `panic=abort`):

                  Code:
                  # 6M (dynamically linked):
                  $ cargo +nightly build --target x86_64-unknown-linux-gnu --profile release-small --features unix
                  # 7.2M (statically linked):
                  $ RUSTFLAGS="-C target-feature=+crt-static" cargo +nightly build --target x86_64-unknown-linux-gnu --profile release-small --features unix
                  
                  # 5.5M (2M compressed with UPX) static build using `-Z build-std`:
                  $ RUSTFLAGS="-Z location-detail=none -C relocation-model=static -C target-feature=+crt-static" cargo +nightly build -Z build-std=std,panic_abort -Z build-std-features=panic_immediate_abort --target x86_64-unknown-linux-gnu --profile release-small --features unix
                  
                  # 4M with musl (1.4M compressed with UPX):
                  $ RUSTFLAGS="-Z location-detail=none -C relocation-model=static -C target-feature=+crt-static" cargo +nightly build -Z build-std=std,panic_abort -Z build-std-features=panic_immediate_abort --target x86_64-unknown-linux-musl --profile release-small --features feat_os_unix_musl
                  The comparison with eyra follows below.

                  Originally posted by V1tol View Post

                  I wonder if something like eyra could help there. I am still wondering why Rust stdlib decided to use libc instead of implementing everything they need using syscalls (like Go does).
                  With eyra, the build fails with `--features unix`, so here is a comparison without that:

                  Code:
                  # First for reference the `-gnu` target (glibc):
                  # 5.2M (dynamically linked) - 6.4M if built static with `-C target-feature=+crt-static`:
                  $ cargo +nightly build --target x86_64-unknown-linux-gnu --profile release-small
                  # 3.6M (dynamically linked, optimized with -Z build-std):
                  $ RUSTFLAGS="-Z location-detail=none -C relocation-model=static" cargo +nightly build -Z build-std=std,panic_abort -Z build-std-features=panic_immediate_abort --target x86_64-unknown-linux-gnu --profile release-small
                  # 4.8M (static build, optimized with -Z build-std):
                  $ RUSTFLAGS="-Z location-detail=none -C relocation-model=static -C target-feature=+crt-static" cargo +nightly build -Z build-std=std,panic_abort -Z build-std-features=panic_immediate_abort --target x86_64-unknown-linux-gnu --profile release-small
                  
                  # 3.7M (static build with eyra):
                  $ RUSTFLAGS="-C link-arg=-nostartfiles -Z location-detail=none -C relocation-model=static -C target-feature=+crt-static" cargo +nightly build -Z build-std=std,panic_abort -Z build-std-features=panic_immediate_abort --target x86_64-unknown-linux-gnu --profile release-small
                  
                  # 3.7M (static build with musl)
                  $ RUSTFLAGS="-Z location-detail=none -C relocation-model=static -C target-feature=+crt-static" cargo +nightly build -Z build-std=std,panic_abort -Z build-std-features=panic_immediate_abort --target x86_64-unknown-linux-musl --profile release-small
                  In case it's of interest, here is how eyra compares with the previous Hello World examples (these use the same `Cargo.toml` release build additions):

                  Code:
                  # 406K:
                  # With eyra quickstart (using RUSTFLAGS arg instead of a separate `build.rs` file)
                  # https://github.com/sunfishcode/eyra/#quick-start
                  $ cargo add eyra --rename=std
                  $ RUSTFLAGS="-C link-arg=-nostartfiles" cargo +nightly build --target x86_64-unknown-linux-gnu --release
                  
                  # 38K (with `-Z build-std`):
                  # https://github.com/sunfishcode/eyra/#compatibility-with--zbuild-std
                  $ cargo remove std && cargo add eyra
                  $ echo 'extern crate eyra;' | cat - src/main.rs | sponge src/main.rs
                  # https://github.com/sunfishcode/eyra/tree/main/example-crates/hello-world-small#readme
                  # Omitted from RUSTFLAGS `-Z location-detail=none -C relocation-model=static -C target-feature=+crt-static` as they only reduced size by approx 500 bytes
                  $ RUSTFLAGS="-C link-arg=-nostartfiles" cargo +nightly build -Z build-std=std,panic_abort -Z build-std-features=panic_immediate_abort --target x86_64-unknown-linux-gnu --release
                  For comparison, musl produces a 50K statically linked or a 22K dynamically linked build (via the `rust:alpine` Docker container and the `musl-dev` package).
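On V1tol's question quoted earlier in this post, here is a minimal sketch of what "using syscalls directly" looks like in Rust. It issues the Linux `write` syscall with inline assembly on x86_64 and never calls libc for the write itself. This is only an illustration under those assumptions, not how eyra or Go actually implement it; `raw_write` is a made-up name, and the second definition is a plain std fallback so the sketch still compiles on other platforms.

```rust
/// Write `buf` to file descriptor `fd` using the raw Linux `write` syscall,
/// without going through libc. Returns the kernel's raw result: the number
/// of bytes written, or a negative errno value on failure.
#[cfg(all(target_os = "linux", target_arch = "x86_64"))]
fn raw_write(fd: i32, buf: &[u8]) -> isize {
    use core::arch::asm;

    const SYS_WRITE: isize = 1; // syscall number of write(2) on x86_64 Linux
    let ret: isize;
    // SAFETY: pointer and length come from a valid slice that the kernel only
    // reads. The `syscall` instruction clobbers rcx and r11, declared below.
    unsafe {
        asm!(
            "syscall",
            inlateout("rax") SYS_WRITE => ret,
            in("rdi") fd as isize,
            in("rsi") buf.as_ptr(),
            in("rdx") buf.len(),
            lateout("rcx") _,
            lateout("r11") _,
            options(nostack),
        );
    }
    ret
}

/// Fallback so the sketch still compiles on other platforms: it goes
/// through std (and therefore libc) and only supports stdout.
#[cfg(not(all(target_os = "linux", target_arch = "x86_64")))]
fn raw_write(_fd: i32, buf: &[u8]) -> isize {
    use std::io::Write;
    std::io::stdout().write(buf).map(|n| n as isize).unwrap_or(-1)
}

fn main() {
    let written = raw_write(1, b"Hello from a raw syscall!\n");
    assert!(written > 0);
}
```

Note how platform-specific this is: the syscall number and register assignments above are only valid for x86_64 Linux, so every other architecture and OS would need its own variant.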
                  Last edited by polarathene; 19 November 2023, 11:22 PM.


                  • #39
                    Originally posted by Akiko View Post
                    I know this is possible, but if you do this, you drop most of what makes Rust secure and with that you drop the reason to use Rust.

                    ...

                    embedded sometimes has serious size restrictions and Rust does not fit these as well as C does. It just doesn't matter if your binaries are more secure if you can't put them in the available storage. And like I said, I also prefer Rust, but there are just places where using it just does not work.
                    Switching to `no_std` isn't as bad as you imply. You still get many of Rust's benefits, and compared to my past experience with C (with no standard library, for some MCU work), Rust was nicer to work with. That said, if you're not as resource constrained, such as on an ESP32, you can still enjoy std.
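As a small illustration of what you keep without std, here is a sketch of a parser for a made-up frame layout (`[len, payload..., checksum]`). It only uses `core` features (slices, `Option`, `Result`, the borrow checker), so the `parse_frame` function would compile unchanged in a `#![no_std]` crate; the frame format and the names are invented for this example, and `main` uses std only to print.

```rust
/// Errors the caller is forced to handle through `Result`.
#[derive(Debug, PartialEq)]
enum FrameError {
    TooShort,
    BadChecksum,
}

/// Parse a hypothetical frame laid out as `[len, payload..., checksum]`,
/// where the checksum is the wrapping sum of the payload bytes.
/// Every access is bounds-checked, and the returned payload borrows from
/// `raw`, so it cannot outlive the buffer it points into.
fn parse_frame(raw: &[u8]) -> Result<&[u8], FrameError> {
    let (&len, rest) = raw.split_first().ok_or(FrameError::TooShort)?;
    let len = len as usize;
    let payload = rest.get(..len).ok_or(FrameError::TooShort)?;
    let &expected = rest.get(len).ok_or(FrameError::TooShort)?;
    let actual = payload.iter().fold(0u8, |acc, b| acc.wrapping_add(*b));
    if actual != expected {
        return Err(FrameError::BadChecksum);
    }
    Ok(payload)
}

fn main() {
    // Valid frame: two payload bytes, checksum 10 + 20 = 30.
    println!("{:?}", parse_frame(&[2, 10, 20, 30]));
    // Length byte claims more data than the buffer holds.
    println!("{:?}", parse_frame(&[5, 1, 2]));
}
```

In C the equivalent code would need manual length checks before every pointer access, and nothing would stop the payload pointer from being used after the buffer is reused.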

                    If you use Rust for embedded, there's plenty of support around that too.

                    Here's the hello world example from the Embedded Rust Book, which compiles down to 2.2K:

                    Code:
                    //! Prints "Hello, world!" on the host console using semihosting
                    
                    #![no_main]
                    #![no_std]
                    
                    use panic_halt as _;
                    
                    use cortex_m_rt::entry;
                    use cortex_m_semihosting::{debug, hprintln};
                    
                    #[entry]
                    fn main() -> ! {
                        hprintln!("Hello, world!").unwrap();
                    
                        // exit QEMU
                        // NOTE do not run this on hardware; it can corrupt OpenOCD state
                        debug::exit(debug::EXIT_SUCCESS);
                    
                        loop {}
                    }
                    A more realistic example for embedded hello world that compiles to 2.7K.
                    • How much of a size concern is that 2.7K for you vs what you'd get with C?
                    • What disadvantages have you run into with Rust and no_std that you wouldn't experience with C?
                    • Do you feel that using no_std with Rust really makes C a better choice?
                    Personally, I found Rust after working with embedded in C and not finding C++ at all appealing to work with. I wanted a better DX and Rust provides that IMO. Skim that embedded book link sidebar for an idea of what Rust offers for embedded work.


                    • #40
                      Originally posted by polarathene View Post
                      -- SNIP --
                      I find the eyra results interesting. On one hand, I realize that eyra is still a very early project, and a little bit of me was disappointed to hear the unix config doesn't work properly. On the other hand, I'm quite surprised that eyra is able to achieve those file sizes given how early it is. That's some impressive work. Thanks for updating us; I will for sure be following eyra now.
