Announcement

**NobodyXu** · 10 April 2023, 11:15 AM

Originally posted by ryao

I was comparing Rust against C + various sanitizers such that the C program will be aborted if there is a memory safety issue. Even without the sanitizers, many memory safety issues also result in aborting a program. Aborting a program is a mitigation, but it is not a silver bullet and is rarely an acceptable solution since the program aborting is still a bug.

Originally posted by ryao

A plane crashing or a nuclear power plant having a meltdown because aborts occurred in every redundant safety system is just not acceptable, but that is the Rust way of making code “memory safe” for a number of memory safety issues that it claims to avoid. Giving Rust partial credit in such situations makes no sense.

It's not a silver bullet, but it is a good default behavior: aborting by default is better than silently corrupting memory.
If you want to recover instead, you can use the `get` API I listed https://doc.rust-lang.org/std/primit...tml#method.get and handle the out-of-bounds case easily.
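A minimal sketch of the two behaviors described above, assuming a plain `&[u32]` slice (the function names are illustrative, not from any library). Indexing with `[]` is bounds-checked and panics rather than reading past the end, while `get` returns an `Option`, so the caller can fall back to a default instead of crashing.

```rust
use std::panic;

/// Plain indexing: bounds-checked, panics instead of reading out of bounds.
fn read_indexed(data: &[u32], i: usize) -> u32 {
    data[i]
}

/// `get` returns `None` for an out-of-bounds index, so the caller can recover.
fn read_or_default(data: &[u32], i: usize, default: u32) -> u32 {
    data.get(i).copied().unwrap_or(default)
}

/// Returns true if plain indexing panicked (the bounds check fired)
/// rather than touching memory outside `data`.
fn indexing_panics(data: &[u32], i: usize) -> bool {
    panic::catch_unwind(|| read_indexed(data, i)).is_err()
}
```

Without `catch_unwind`, the panic simply ends the thread (and the process, if it is the main thread), which is the crash-instead-of-corrupt default being argued for here.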

Sadly, many memory bugs silently corrupt the program's state without terminating it.
This happened recently in iOS 16.4, and it has happened many times before.
And it's even harder to catch in the kernel at runtime.

Apple actually does a lot inside its kernel, separating allocations of control data from pure data and splitting allocation pools by type.
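Apple's kernel allocator is not shown in this thread, so the following is only a user-space analogy built on a hypothetical `TypedPool<T>`. Each type gets its own pool, so a freed slot can only ever be reused for a value of the same type; that is the property type-segregated pools rely on to keep a stale pointer from aliasing unrelated control data.

```rust
/// Hypothetical per-type pool (an analogy, not Apple's implementation):
/// freed slots are recycled only for values of the same type `T`.
struct TypedPool<T> {
    slots: Vec<Option<T>>,
    free_list: Vec<usize>,
}

impl<T> TypedPool<T> {
    fn new() -> Self {
        TypedPool { slots: Vec::new(), free_list: Vec::new() }
    }

    /// Stores `value`, reusing a previously freed slot of this pool if any.
    fn alloc(&mut self, value: T) -> usize {
        match self.free_list.pop() {
            Some(handle) => {
                self.slots[handle] = Some(value);
                handle
            }
            None => {
                self.slots.push(Some(value));
                self.slots.len() - 1
            }
        }
    }

    /// Releases a slot; returns `None` for an invalid handle or a double free.
    fn free(&mut self, handle: usize) -> Option<T> {
        let value = self.slots.get_mut(handle)?.take();
        if value.is_some() {
            self.free_list.push(handle);
        }
        value
    }

    /// Looks up a live value; freed or invalid handles yield `None`.
    fn get(&self, handle: usize) -> Option<&T> {
        self.slots.get(handle)?.as_ref()
    }
}
```

A freed `u64` slot comes back only from the `u64` pool, never from the `String` pool, even though both hand out plain indices.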

Originally posted by ryao

gpsd, for example, is written in C and is designed to never abort, since aborting can result in people dying: they need gpsd to work to avoid getting lost in the wilderness.

Are you referring to this program https://gpsd.gitlab.io/gpsd/ ?
If it crashes, why don't you just restart it, same as any other program?

Even Google Maps can crash frequently, and I don't see anybody being killed directly by its crashes.

It runs as a GPS client on mobile phones and other devices, not as some critical service running on a satellite.

Originally posted by ryao

I suggest going back and rereading what I wrote, since you seem to have missed that I was comparing C augmented with techniques for either eliminating all memory safety issues or turning memory safety issues into aborts to Rust.

Can you elaborate on which static analysis can eliminate all memory bugs inside selected pieces of code (excluding "unsafe" code such as inline assembly and the implementations of list, vec, etc.)? I really want to look into that, thank you.

Originally posted by ryao

Rust performs better than C with sanitizers that turn memory safety issues into aborts since it does not need to do as many runtime checks,

Nobody can bear the cost of running C with sanitizers in a release build.

Originally posted by ryao

The real solution is not yet another language,

Rust has already proven itself a worthy candidate, so why not adopt it?

Originally posted by ryao

but to develop open source sound static analyzers according to the theory of abstract interpretation that have full support for concurrency, recursion and dynamic memory, and that can prove the absence of all memory safety issues, undefined behavior and runtime conditions that can trip assertions.
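To give a flavor of what sound static analysis by abstract interpretation means, here is a toy sketch (nothing like the internals of Astrée or IKOS) using an interval domain. Every value of `i` is over-approximated by a range `[lo, hi]`, the states reaching the loop head are joined instead of being enumerated path by path, and iteration stops at a fixpoint. If the range at the access is inside the array bounds, every access is proven safe; otherwise the analyzer must report a possible error. The analyzed loop and all helper names are assumptions made for illustration.

```rust
/// Toy interval domain: [lo, hi] over-approximates every value a variable
/// can take on any execution path. lo > hi encodes "unreachable".
#[derive(Clone, Copy, Debug, PartialEq)]
struct Interval {
    lo: i64,
    hi: i64,
}

impl Interval {
    fn new(lo: i64, hi: i64) -> Self {
        Interval { lo, hi }
    }

    /// Join: one interval covering both inputs, used where paths merge.
    fn join(self, other: Interval) -> Interval {
        Interval::new(self.lo.min(other.lo), self.hi.max(other.hi))
    }

    /// Abstract effect of `x += k`.
    fn add(self, k: i64) -> Interval {
        Interval::new(self.lo + k, self.hi + k)
    }

    /// Refines the interval with the branch condition `x < bound`.
    fn assume_lt(self, bound: i64) -> Interval {
        Interval::new(self.lo, self.hi.min(bound - 1))
    }
}

/// `arr[i]` is proven safe if every possible `i` lies in 0..len
/// (an empty interval is vacuously safe: the access never executes).
fn index_proven_safe(i: Interval, len: i64) -> bool {
    i.lo > i.hi || (i.lo >= 0 && i.hi < len)
}

/// Analyzes `i = 0; while i < n { arr[i]; i += 1 }` with `arr.len() == len`.
/// Returns true if every access is proven in bounds.
fn analyze_loop(n: i64, len: i64) -> bool {
    let entry = Interval::new(0, 0);
    let mut head = entry; // abstract state at the loop head
    loop {
        let body = head.assume_lt(n); // states that enter the loop body
        if !index_proven_safe(body, len) {
            return false; // cannot prove safety: report a possible error
        }
        // Merge the entry edge with the back edge (after `i += 1`).
        let next = entry.join(body.add(1));
        if next == head {
            return true; // fixpoint reached: all reachable states covered
        }
        head = next;
    }
}
```

Real analyzers add widening so the fixpoint is reached without one iteration per loop trip; here the bound is a small constant, so plain iteration terminates.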

There cannot be a tool that can provide 100% safety while accepting any input.
That is just impossible.

How do you prove inline assembly to be safe?
How do you prove syscalls to be safe?

And how do you write code that can handle any runtime condition?
That's simply not possible: at some point it will end up aborting because the program is broken beyond repair.

For a static analyzer to verify C code, it will have to go through every code path, because C code carries no lifetime annotations (information).
Thus, it has to do the same thing as a sanitizer (or Miri), interpreting all the code, and it also has to go through every path.

That will take a really long time for anything really large, such as an aviation system, not to mention whether it is feasible at all.

**ryao** · 10 April 2023, 12:23 PM

Originally posted by NobodyXu

Are you referring to this program https://gpsd.gitlab.io/gpsd/ ?
If it crashes, why don't you just restart it, same as any other program?

Then it could reach the same state it had previously and crash in a loop. How is that better?

Originally posted by NobodyXu

Even Google Maps can crash frequently, and I don't see anybody being killed directly by its crashes.

If gpsd crashes, your device does not know where it is. Restarting it is not a full solution since the crash can just occur on every restart.

Originally posted by NobodyXu

Can you elaborate on which static analysis can eliminate all memory bugs inside selected pieces of code (excluding "unsafe" code such as inline assembly and the implementations of list, vec, etc.)? I really want to look into that, thank you.

Here is the best one as far as I can tell. It is heavily used in aviation and nuclear power, and is proprietary, but you can get a 30-day trial:

Astrée Static Analyzer for C and C++

https://www.absint.com/astree/index.htm

Astrée is a static program analyzer that proves the absence of runtime errors and invalid concurrent behavior in safety-critical applications written or generated in C or C++

Here is another proprietary option that competes with it:

Polyspace Code Prover

https://www.mathworks.com/products/polyspace-code-prover.html

Polyspace Code Prover proves the absence of run-time errors in handwritten and generated source code without requiring you to execute the code.

Here is an open source option developed by NASA, but it is the least capable of these tools, since it lacks concurrency support:

GitHub - NASA-SW-VnV/ikos: Static analyzer for C/C++ based on the theory of Abstract Interpretation.

https://github.com/NASA-SW-VnV/ikos


I could go on, but these are the most notable ones based on what I currently know.

Originally posted by NobodyXu

Nobody can bear the cost of running C with sanitizers in a release build.

A number of non-performance-critical applications can. If crashing to avoid a memory issue is acceptable behavior, then they can be run with sanitizers.

Originally posted by NobodyXu

Rust has already proven itself a worthy candidate, so why not adopt it?

Doing a rewrite in a new language is a recipe for having even more problems than you already have, no matter what the new language claims to do. There are so many naive mistakes that you make when writing something new that it is never worth it unless the code base is so bad that the time spent fixing it would be greater than the time spent doing a rewrite. The result of an unnecessary rewrite is worse than what we can achieve in C with the right tools, so there really is no reason to do it.

**ryao** · 10 April 2023, 12:32 PM

Originally posted by NobodyXu

There cannot be a tool that can provide 100% safety while accepting any input.
That is just impossible.

You change your code whenever a sound static analysis tool complains. When it stops complaining, the tool has proven the code is free of the issues it is designed to detect.

Originally posted by NobodyXu

How do you prove inline assembly to be safe?

You can prove memory safety for it via abstract interpretation just like you can for any other language, although I am not aware of any tools that do it. In-line assembly is usually out of scope for tools that focus on high level languages.

Originally posted by NobodyXu

How do you prove syscalls to be safe?

This really depends on what you mean by safe. I suspect you do not mean just memory safety. If you do mean just memory safety, then run a sound static analysis tool on the kernel.

Originally posted by NobodyXu

And how do you write code that can handle any runtime condition?

Getting it verified by a sound static analysis tool would be one way.

Originally posted by NobodyXu

That's simply not possible: at some point it will end up aborting because the program is broken beyond repair.

The aviation and nuclear industries have been doing it for years.

Originally posted by NobodyXu

For a static analyzer to verify C code, it will have to go through every code path, because C code carries no lifetime annotations (information).
Thus, it has to do the same thing as a sanitizer (or Miri), interpreting all the code, and it also has to go through every path.

That will take a really long time for anything really large, such as an aviation system, not to mention whether it is feasible at all.

Modern tools can do it relatively quickly. See Astrée's advertising for some numbers.

**NobodyXu** · 10 April 2023, 11:13 PM

Originally posted by ryao

Then it could reach the same state it had previously and crash in a loop. How is that better?

That could happen, but these kinds of bugs may not be that deterministic.
Restarting the program may or may not trigger the bug again.
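A minimal sketch of the restart-with-a-limit idea debated in this exchange, assuming the service's work can be modeled as a Rust closure (the `supervise` helper is hypothetical; a real deployment would more likely rely on a service manager's restart policy). A transient failure recovers after a restart, while a failure that recurs on every restart is reported after a bounded number of attempts instead of looping forever.

```rust
use std::panic::{self, RefUnwindSafe};

/// Hypothetical supervisor: runs `task`, restarting it after a panic.
/// Returns Ok(restarts) once a run completes, or Err(max_attempts) if every
/// attempt panicked, so a crash that recurs on each restart cannot loop forever.
fn supervise<F: Fn() + RefUnwindSafe>(task: F, max_attempts: u32) -> Result<u32, u32> {
    for attempt in 0..max_attempts {
        // catch_unwind turns a panic into an Err instead of ending the process.
        if panic::catch_unwind(|| task()).is_ok() {
            return Ok(attempt);
        }
        // A real supervisor would also log the failure and back off here.
    }
    Err(max_attempts)
}
```

A task that panics twice and then succeeds yields `Ok(2)`; a task that panics on every run yields `Err` after the configured number of attempts.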

Originally posted by ryao

Here is the best one as far as I can tell. It is heavily used in aviation and nuclear power, and is proprietary, but you can get a 30-day trial:

Astrée Static Analyzer for C and C++

https://www.absint.com/astree/index.htm

Astrée is a static program analyzer that proves the absence of runtime errors and invalid concurrent behavior in safety-critical applications written or generated in C or C++

Here is another proprietary option that competes with it:

Polyspace Code Prover

https://www.mathworks.com/products/polyspace-code-prover.html

Polyspace Code Prover proves the absence of run-time errors in handwritten and generated source code without requiring you to execute the code.

Here is an open source option developed by NASA, but it is the least capable of these tools, since it lacks concurrency support:

GitHub - NASA-SW-VnV/ikos: Static analyzer for C/C++ based on the theory of Abstract Interpretation.

https://github.com/NASA-SW-VnV/ikos


I could go on, but these are the most notable ones based on what I currently know.

Thanks, I will have a look at them later.

Originally posted by ryao

A number of non-performance-critical applications can. If crashing to avoid a memory issue is acceptable behavior, then they can be run with sanitizers.

Note that "crashing on out-of-bounds access" is the default behavior in Rust if you use indexing.
It is entirely possible to recover from that instead if you use slice::get https://doc.rust-lang.org/std/primit...tml#method.get or just check the bounds manually.

What I was arguing is that when the programmer did not explicitly check the bounds, it is better to crash than to silently corrupt memory.
This is certainly not ideal, but it is still a better default.

slice::get https://doc.rust-lang.org/std/primit...tml#method.get provides an ergonomic API: you can just apply `?` to propagate the out-of-bounds condition:

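```rust
fn access(index: usize) -> Option<u32> {
    let array = [2u32, 2, 0, 1];
    // `get` returns `None` instead of panicking, and `?` propagates it.
    let n = *array.get(index)?;

    let array2 = [2u32, 3];
    // The first value is used as an index into the second array.
    array2.get(n as usize).copied()
}
```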

This returns `None` if either lookup is out of bounds, while still being easy to write.
Combined with RAII, it also handles cleanup on the early return without an explicit "goto" in the program.
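A minimal sketch of the RAII point, assuming a hypothetical `Guard` type standing in for any resource (file, lock, buffer). Because cleanup lives in `Drop`, the early return taken by `?` releases the resource exactly like the normal return does.

```rust
use std::cell::RefCell;

/// Hypothetical resource guard: its `Drop` runs on every exit path,
/// including the early return produced by `?`.
struct Guard<'a> {
    name: &'static str,
    log: &'a RefCell<Vec<String>>,
}

impl<'a> Drop for Guard<'a> {
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("released {}", self.name));
    }
}

fn lookup(data: &[u32], index: usize, log: &RefCell<Vec<String>>) -> Option<u32> {
    let _buffer = Guard { name: "buffer", log };
    // On an out-of-bounds index, `?` returns None here and `_buffer` is
    // still dropped, so no `goto cleanup` label is needed.
    let value = *data.get(index)?;
    log.borrow_mut().push(format!("read {}", value));
    Some(value)
}
```

Note the binding `_buffer` rather than `_`: `let _ = Guard { .. }` would drop the guard immediately.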

Originally posted by ryao

Doing a rewrite in a new language is a recipe for having even more problems than you already have, no matter what the new language claims to do. There are so many naive mistakes that you make when writing something new that it is never worth it unless the code base is so bad that the time spent fixing it would be greater than the time spent doing a rewrite. The result of an unnecessary rewrite is worse than what we can achieve in C with the right tools, so there really is no reason to do it.

That depends, but let's not get off-topic, since our opinions on this already diverge a lot.

**NobodyXu** · 10 April 2023, 11:37 PM

Originally posted by ryao

You change your code whenever a sound static analysis tool complains. When it stops complaining, the tool has proven the code is free of the issues it is designed to detect.

Yes, but that does not guarantee the code is memory safe.

Originally posted by ryao

You can prove memory safety for it via abstract interpretation just like you can for any other language, although I am not aware of any tools that do it. In-line assembly is usually out of scope for tools that focus on high level languages.

It's just impossible to do that.

Consider the Linux kernel's use of inline assembly: is it possible to prove that those assembly blocks do not trigger UB?
Maybe for simple ones, but for anything more complex, such as hardware management, DMA, I/O or memory mapping, it's very hard to verify.

You have to trust programmers to do their job and review the code, in the same spirit as using "unsafe" in Rust.

For example, if you mmap the entire file twice as writable, create an immutable reference into the first mapping, then a mutable reference at the same offset in the second mapping,
modifying through the second one changes what the first one sees, but the compiler assumes the data behind the immutable reference cannot change and may optimize reads through it away.

The solution is to use volatile reads/writes, or even atomics if the access could be concurrent or just to be safe, but it is pretty hard for static analysis to detect and handle every one of these cases.
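A minimal sketch of the volatile approach, assuming the two writable mappings are simulated by two raw pointers into one buffer (real code would obtain them from mmap through a crate such as libc, which is not shown). The key points are that no `&`/`&mut` reference to the shared bytes is held while they may change, and every access goes through `std::ptr::read_volatile`/`write_volatile`, so the compiler cannot drop or reuse the reads.

```rust
use std::ptr;

/// Simulates two writable mappings of the same file with two raw pointers
/// derived from one buffer. Returns the byte at offset 1 before and after a
/// write through the second "mapping".
fn observe_through_alias(buf: &mut [u8; 4]) -> (u8, u8) {
    let base: *mut u8 = buf.as_mut_ptr();
    let first_mapping: *const u8 = base;
    let second_mapping: *mut u8 = base;
    // SAFETY: offset 1 is inside the 4-byte buffer, and no `&`/`&mut` to the
    // buffer is used while these raw pointers are live.
    unsafe {
        let before = ptr::read_volatile(first_mapping.add(1));
        ptr::write_volatile(second_mapping.add(1), before.wrapping_add(1));
        // A volatile read must actually be performed, so it observes the new
        // value instead of reusing `before`.
        let after = ptr::read_volatile(first_mapping.add(1));
        (before, after)
    }
}
```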

At some point you really have to trust the programmer, believing they are doing the right thing with their "unsafe" code.
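The "trust the programmer" pattern can be sketched as a safe function wrapping an `unsafe` block (a hypothetical example, not taken from any particular codebase). The unchecked access is only reachable after a bounds check, and the `SAFETY` comment records the invariant that reviewers, not the compiler, must verify.

```rust
/// Hypothetical safe wrapper: callers cannot trigger UB, because the only
/// unchecked access is guarded by the length check below.
fn sum_first(data: &[u64], n: usize) -> Option<u64> {
    // This check is what makes the unsafe block below sound.
    if n > data.len() {
        return None;
    }
    let mut total = 0u64;
    for i in 0..n {
        // SAFETY: i < n <= data.len(), established by the check above.
        // Reviewers must confirm this invariant; the compiler does not.
        total += unsafe { *data.get_unchecked(i) };
    }
    Some(total)
}
```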

Originally posted by ryao

This really depends on what you mean by safe. I suspect you do not mean just memory safety. If you do mean just memory safety, then run a sound static analysis tool on the kernel.

No. Suppose the kernel has a bug in the inline assembly that configures the memory it shares with the iGPU, and it accidentally shares more memory than intended.
How do you detect that at compile time?
I remember something similar happening while reverse-engineering the Apple M1 for Asahi Linux.

You would have to leave it there and just trust the programmers writing the "unsafe" code.

Now, in practice, this will probably be fine: thanks to thorough review and testing, we are confident that no such bug exists. Nonetheless, it's something static analysis cannot catch, as it is far beyond what its model can capture.

Originally posted by ryao

Modern tools can do it relatively quickly. See Astrée's advertising for some numbers.

Many static analyzers have high computational costs (typically, several hours of computation per 10,000 lines of code); others terminate out of memory, or may not terminate at all.

In contrast, Astrée is efficient and easily scales up to real-world programs in industrial practice.

As an example, in order to analyze actual flight-control software with 132,000 lines of C code, even on a slow 2.8GHz PC Astrée takes a mere 80 minutes. Faster machines will get you faster results. Multicore parallel or distributed computation is supported.

80 minutes sounds like a lot to me.
Though I'm not sure which CPU they are using for comparison.

It's actually quite good if it does what it claims, finding all UB, since it has to cover every reachable path.

Rust compiles quickly on the default free GitHub Actions tier https://docs.github.com/en/actions/u...ware-resources

Hardware specification for Windows and Linux virtual machines:

2-core CPU (x86_64)
7 GB of RAM
14 GB of SSD space

Hardware specification for macOS virtual machines:

3-core CPU (x86_64)
14 GB of RAM
14 GB of SSD space

Compiling `cargo-binstall` https://github.com/cargo-bins/cargo-binstall/ , which itself contains 7.78k lines of code plus a lot of dependencies, such as an HTTP client, rustls (something like wolfSSL), tokio (an async runtime), JSON/TOML/YAML (de)serializers, etc., takes roughly 3 minutes with caching for a debug build on x86_64 Linux https://github.com/cargo-bins/cargo-...obs/8244160434

For the release build, we use `-Oz`, enable fat LTO and use a single codegen unit, and it still only takes 5 minutes with caching https://github.com/cargo-bins/cargo-...67374#step:6:1

It's not a completely fair comparison since we use caching, but even without caching it would take at most 7 minutes for a debug build and 15 minutes for a release build.

**NobodyXu** · 11 April 2023, 04:11 AM

ryao, the static analyzers you listed are really awesome, but I think the biggest obstacles to adopting them are probably their price and the time it takes to verify the code.

While in principle everybody writing C/C++ should get one of these, in reality it's hard to convince management in proprietary development, or to gather enough funding in open source development.

Even worse, they do not list pricing on their websites; you need to contact them to obtain a price.
That means you could get a different price based on your usage and industry, and it will be even harder to convince people to use them.

If they were open source like https://github.com/NASA-SW-VnV/ikos, it would be much easier to convince people.

Another problem is the time it takes to do the analysis.
Under 80 minutes for 132,000 lines of C code sounds good for a static analyzer that can eliminate almost all memory bugs, but it doesn't sound like it scales as well as a C compiler, or even the Rust compiler, which is considered somewhat slower than C++ compilers due to additional checking, MIR optimization, and treating a whole crate, instead of a single module, as the compilation unit.

For projects that are very large or complex and receive frequent PRs, this is probably an even bigger problem.
I imagine running these static analyzers on GHA won't go well, since GHA uses old machines, is under pressure from open source projects all over the world, and might terminate the CI job once it exceeds a certain time limit (3h?).

If you start too many CI jobs there, you are likely to get rate limited.

Another thing I noticed is that it's probably very hard to do caching with Astrée.
To find all memory bugs, you would need to perform a whole-program analysis, which unfortunately doesn't sound like something that can be easily cached.

This is probably the reason why Mozilla invested in Rust in the first place: browsers are very complex, Firefox had 21M lines of code in 2020, and it's only going to have more code added (e.g. WebAssembly, WebGPU, new HTML/JS features, HTTP/3, a more fluent UI).

Running Astrée on this won't go well due to the explosion in the number of code paths.
Even if we assume it scales linearly and that upgrading to the latest CPUs yields a 50x speedup, it's still going to take about 254 minutes, which is over 4 hours for each run.
It just shows how poorly it scales, which I guess is the reason why Astrée isn't that popular.
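For reference, the 254-minute figure is a straight linear extrapolation from Astrée's advertised 80 minutes for 132,000 lines, scaled to Firefox's 21M lines and divided by the assumed 50x hardware speedup:

$$
t \approx \frac{80\ \text{min} \times \dfrac{21{,}000{,}000}{132{,}000}}{50} \approx \frac{80 \times 159.1}{50}\ \text{min} \approx 254.5\ \text{min} \approx 4.2\ \text{h}
$$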

P.S. When searching for Astrée vs Rust, I stumbled upon this comment, which was probably left by you on Reddit: https://www.reddit.com/r/rust/commen...eb2x&context=3

**ryao** · 11 April 2023, 06:47 AM

Originally posted by NobodyXu

Yes, but that does not guarantee the code is memory safe.

It is guaranteed to be memory safe. That is the entire point of the sound static analysis tool. Even better is that you can get memory safety without cheating by calling abort() as long as you fix the complaints in a way that does not call abort() or its equivalent.

Originally posted by NobodyXu

80 minutes sounds like a lot to me.
Though I'm not sure which CPU they are using for comparison.

It was on a single-core 32-bit Pentium 4. That is extremely fast. Most (all?) static analysis tools support running on multiple cores, and cores are so much faster now that it would not surprise me if that ran in under a minute on modern hardware.

Also, commercial tools tend to support incremental analysis, which makes things even quicker, although I am not sure if the ones I linked support it.

Originally posted by NobodyXu

Running Astrée on this won't go well due to the explosion in the number of code paths.
Even if we assume it scales linearly and that upgrading to the latest CPUs yields a 50x speedup, it's still going to take about 254 minutes, which is over 4 hours for each run.
It just shows how poorly it scales, which I guess is the reason why Astrée isn't that popular.

ZFS’ test suite can take 6 hours to run and people were fine with that until it started hitting GitHub’s 6 hour runner timeout. Even if it took 4 hours to run, people would be happy with that. You would find that most projects would be happy with analysis that finishes overnight.

That said, as long as it supports multicore, you could potentially just keep increasing the core count to make it faster. If it supports incremental analysis (like PVS-Studio, an unsound analyzer), you could potentially get results in minutes.

As for Astree not being popular, that would probably have more to do with a mix of nobody talking about it and the pricing model discouraging adoption. We need people to develop open source replacements, but the incorrect popular thinking that it is impossible causes efforts to go into other things like Rust, despite people demonstrating that it is in fact possible.

**NobodyXu** · 11 April 2023, 08:25 AM

Originally posted by ryao

It is guaranteed to be memory safe. That is the entire point of the sound static analysis tool.

No, it cannot. No static analysis tool can prove that unless you make no syscalls and use no assembly.

How do you prove a syscall to be memory safe?
A simple mmap might break the assumption of exclusive access.

And assembly can also be too complex for the static analyzer to understand, such as DMA, setting up the iGPU, disk I/O, etc.

In reality, the static analyzer can only prove a subset of the program to be memory safe, and the same goes for any static analyzer, regardless of the programming language (unless it forbids syscalls and assembly).

Originally posted by ryao

Even better is that you can get memory safety without cheating by calling abort() as long as you fix the complaints in a way that does not call abort() or its equivalent.

It's true that "abort()" isn't the best solution, but I wouldn't call that cheating, since it is a better default than silent corruption.

There are also tools in Rust that check at link time that no panic can occur, e.g. https://github.com/dtolnay/no-panic/ , which basically requires every panicking path to be eliminated at compile time.

It relies on the optimizer doing its job, but I think it would work perfectly for indexing since it is trivially inlinable.

Originally posted by ryao

It was on a single-core 32-bit Pentium 4. That is extremely fast. Most (all?) static analysis tools support running on multiple cores, and cores are so much faster now that it would not surprise me if that ran in under a minute on modern hardware. Also, commercial tools tend to support incremental analysis, which makes things even quicker, although I am not sure if the ones I linked support it.

If you assume everything scales linearly, and that PassMark is a reasonable multi-threaded performance benchmark showing the AMD EPYC 9654 to be 406x faster than the Pentium 4, then Astrée should be able to complete within 23s.

But I don't think that is the case here.
With 21M lines of code instead of 310k, the number of possible code paths should grow much more than linearly, given that the complexity of the code has also grown significantly and much of the code is coupled with other code.

Not to mention that compiling 21M lines of C++ code takes significantly longer than that, so I don't think it can finish within a minute.

I can't find a benchmark for compiling Firefox, but I did find one for LLVM: it takes roughly 100s to compile on a 9654, and upgrading to a dual-socket (2P) system doesn't give it much of an edge.

If https://www.openhub.net/p/llvm is to be trusted, LLVM has 9 million lines of C++ code, so I don't think Astrée can be run on Firefox in under 100s.

Though I did forget that part of Firefox is written in Rust, so in practice it should be less than 21M lines of code.

But there's something I wonder about: Firefox has many runtime dependencies. For them to be proven memory safe, Astrée would also have to be run over them, right?
So the exact numbers are unknown, but I don't think it can complete within a minute even on an EPYC 9654.

Originally posted by ryao

ZFS’ test suite can take 6 hours to run and people were fine with that until it started hitting GitHub’s 6 hour runner timeout. Even if it took 4 hours to run, people would be happy with that. You would find that most projects would be happy with analysis that finishes overnight.

Not every project is like ZFS, where you can wait that long.
6 hours is just way too long.

Originally posted by ryao

That said, as long as it supports multicore, you could potentially just keep increasing the core count to make it faster. If it supports incremental analysis (like PVS-Studio, an unsound analyzer), you could potentially get results in minutes.

Yes, I think using a many-core system like the EPYC 9654 for static analysis is necessary; otherwise it just takes too long.

Originally posted by ryao

As for Astree not being popular, that would probably have more to do with a mix of nobody talking about it and the pricing model discouraging adoption. We need people to develop open source replacements, but the incorrect popular thinking that it is impossible causes efforts to go into other things like Rust, despite people demonstrating that it is in fact possible.

It's certainly possible, but its cost is really high.
Currently, the fastest static analyzer is Astrée, which is behind patents and quite expensive.
And even then, it requires you either to wait a really long time, with the risk of failing due to hitting a rate limit, or to rent or host your own VM or machine.

The cost is just so high that in many situations it is ruled out.

Also, C itself really isn't a good programming language for large projects with complex requirements:

Originally posted by nobodyxu

I would literally still use Rust in that case.
If Rust did not exist, then I would just use C++ or cppfront etc.

C lacks generics and uses macros instead, a poor man's solution.
C macros aren't even hygienic, and you can easily do weird and very confusing things with them.

It doesn't even have an easy-to-use HashMap/BTreeMap in the standard library, just one cumbersome function in libc.
C++ at least has them, although complex constructor overloading plus the initializer_list rules make them a bit harder to use. And the lack of a borrow checker in C++ is a real deal breaker, since you can easily do something stupid without realizing it, like triggering a reallocation in a container while holding a reference/pointer into it.
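A small sketch of how this plays out in Rust, assuming a `HashMap<String, i32>` (the function name is illustrative). Holding a reference across an insert that may grow the table is rejected by the borrow checker, so the version that compiles copies the value out first.

```rust
use std::collections::HashMap;

// The version that keeps a reference across the insert does not compile:
//     let first = map.get("a").unwrap();  // shared borrow of `map` starts
//     map.insert("b".to_string(), 1);     // error[E0502]: cannot borrow `map` as mutable
//     println!("{}", first);              // ...because the shared borrow is still used here
fn insert_after_read(map: &mut HashMap<String, i32>) -> i32 {
    // Copy the value out, which ends the borrow of `map` before mutating it.
    let first = map.get("a").copied().unwrap_or(0);
    // This insert may grow the table and move entries; no reference into
    // the old storage survives, so nothing can dangle.
    map.insert("b".to_string(), first + 1);
    first
}
```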

It also doesn't support async, so most network-bound programs either implement it manually or use a callback-based design.
That is really hard to use.

I also wish C/C++ had something like Cargo for dependency management, instead of relying on apt, Homebrew or whatever.

Installing the Rust toolchain, including cross-compilation targets for Rust itself, is also easy: rustup installs everything needed.

Cross-compiling C/C++ is hard, and TBH any Rust crate that depends on C/C++ code also has to deal with this.

Thankfully we have Zig, which provides `zig cc` for easy cross-compilation, and it even supports targeting a specific glibc version to keep backwards compatibility.

Not to mention that Rust's borrow checker does catch a lot of memory bugs, including data races between threads, and Miri is also very useful.
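As a sketch of the data-race point, assuming a simple shared counter (the function is hypothetical): handing the same plain `&mut u64` to several threads does not compile, so shared mutable state has to go through a synchronization type such as `Mutex` (or an atomic). This rules out data races, though not every higher-level race condition.

```rust
use std::sync::Mutex;
use std::thread;

/// Increments a shared counter from `threads` scoped threads.
/// Capturing a plain `&mut u64` in more than one spawned closure would be a
/// compile error (two mutable borrows), so the counter lives in a `Mutex`.
fn parallel_count(threads: usize, per_thread: u64) -> u64 {
    let counter = Mutex::new(0u64);
    thread::scope(|s| {
        for _ in 0..threads {
            s.spawn(|| {
                for _ in 0..per_thread {
                    *counter.lock().unwrap() += 1;
                }
            });
        }
    });
    // All scoped threads have been joined here, so the lock is free.
    counter.into_inner().unwrap()
}
```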

**Weasel** · 11 April 2023, 08:59 AM

Originally posted by NobodyXu

No, it cannot. No static analysis tool can prove that unless you make no syscalls and use no assembly.

You can say the same fucking thing about Rust with unsafe.

Syscalls are just like external APIs you don't have code to. You assume they're correct. If not, they're the ones that need to be fixed, not your code.

It's the same fucking thing in Rust. Are you saying Rust can prove that syscalls and external APIs it doesn't have the code for are "safe"?

Stop grasping at straws, man.

**ryao** · 11 April 2023, 09:00 AM

Originally posted by NobodyXu

No, it cannot. No static analysis tool can prove that unless you make no syscalls and use no assembly.

How do you prove a syscall to be memory safe?
A simple mmap might break the assumption of exclusive access.

And assembly can also be too complex for the static analyzer to understand, such as DMA, setting up the iGPU, disk I/O, etc.

In reality, the static analyzer can only prove a subset of the program to be memory safe, and the same goes for any static analyzer, regardless of the programming language (unless it forbids syscalls and assembly).

Designing these tools requires having a very accurate model of how those operations work. It is proven within the model. Every remark you make here can be used against Rust, so I am not sure how this helps your “Rust is the answer” argument. Rust gives much weaker guarantees than Astree does across the board.
