What kinds of bugs should safety in Rust protect against?
Rust clearly wants to say that safety is about protecting programs against some kinds of bugs, but not all possible bugs. Where should the boundary be?
Safety should at the very least mean protection against memory corruption through dangling and out-of-bounds pointers. To do that, it's necessary to protect against all Undefined Behavior, because if the behavior of a program is undefined, anything could happen, including arbitrary memory corruption.
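As an illustration, safe Rust rules out both failure modes at different stages: dangling references at compile time, and out-of-bounds accesses at run time, with a panic rather than memory corruption. A minimal sketch:

```rust
fn main() {
    let v = vec![1, 2, 3];

    // Out-of-bounds access in safe Rust: checked indexing returns
    // None, and `v[10]` would panic -- neither silently reads past
    // the end of the allocation.
    assert!(v.get(10).is_none());

    // A dangling reference doesn't compile at all; the borrow
    // checker rejects returning a reference to a local:
    //
    //     fn dangle() -> &'static i32 { let x = 5; &x }
    //
    // Only `unsafe` code with raw pointers can create a dangling
    // pointer, and dereferencing one is Undefined Behavior.
}
```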
There are also categories of bugs that we don't expect Rust's safety to protect
against. For example,
`Eq` implementations must be reflexive, symmetric, and
transitive; however, it would likely be impractical to enforce all the invariants
of all such APIs. Fortunately, violating these invariants doesn't cause
Undefined Behavior; it just causes some algorithms to do the wrong thing.
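For instance, here is a deliberately broken `PartialEq` implementation (the type `Sloppy` and its asymmetric comparison are invented for illustration). The compiler accepts it without complaint, and the only consequence is wrong answers from comparison-based algorithms, never Undefined Behavior:

```rust
// A type whose `PartialEq` violates the documented invariants
// (here: symmetry). Nothing about this requires `unsafe`.
#[derive(Debug)]
struct Sloppy(u32);

impl PartialEq for Sloppy {
    fn eq(&self, other: &Self) -> bool {
        // Asymmetric on purpose: Sloppy(1) == Sloppy(2),
        // but Sloppy(2) != Sloppy(1).
        self.0 <= other.0
    }
}
impl Eq for Sloppy {}

fn main() {
    let a = Sloppy(1);
    let b = Sloppy(2);
    assert!(a == b);    // holds
    assert!(!(b == a)); // symmetry is broken
    // Algorithms like `slice::contains` or deduplication may now
    // return wrong results, but memory safety is unaffected.
}
```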
So, Undefined Behavior is a very practical place to put the boundary.
However, there are several potential situations which do not necessarily involve Undefined Behavior, but which are still arguably within the spirit of Rust's safety:
I/O safety: Raw file descriptors have the same fundamental properties that make raw pointers unsafe: they can dangle and they can be forged. One crate operating on a dangling or forged file descriptor can end up doing I/O on file descriptors held by other, unrelated crates. Without I/O safety, it's impossible to characterize the I/O of a crate without considering the behavior of all other crates it might be linked with.
(I/O safety does intersect with memory safety through `mmap`, but using `mmap` safely is non-trivial in any case, and this is not the only motivation for I/O safety.)
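This is the direction Rust's standard library has taken with its I/O-safety types. A minimal sketch, assuming a Unix target and the `std::os::fd` types stabilized in Rust 1.66:

```rust
use std::fs::File;
use std::os::fd::{AsFd, AsRawFd, BorrowedFd};

fn takes_borrowed(fd: BorrowedFd<'_>) {
    // The lifetime on BorrowedFd guarantees the descriptor stays
    // open for the duration of this call; it cannot dangle in
    // safe code.
    let _ = fd;
}

fn main() -> std::io::Result<()> {
    let file = File::open("/dev/null")?;
    takes_borrowed(file.as_fd());
    assert!(file.as_raw_fd() >= 0);

    // Conjuring a descriptor out of a bare integer -- forging --
    // requires `unsafe`:
    //
    //     let forged = unsafe { BorrowedFd::borrow_raw(3) };
    Ok(())
}
```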
The `munlock` function: This function takes a raw pointer and a length, but it's defined to fail gracefully if given invalid pointers. It doesn't mutate any memory, or cause any subsequent memory access to behave differently with respect to Rust language semantics, so it arguably never causes Undefined Behavior. However, if a crate is internally using locked memory to protect sensitive data, exposing `munlock` as a safe function would mean that a wayward `munlock` call in another crate could bypass the first crate's encapsulation and `munlock` the memory, compromising the sensitive data.
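One way to respect that encapsulation is to expose the binding as `unsafe`, shifting responsibility to the caller. A sketch of such a binding, assuming a Unix target; the wrapper name `unlock_region` is invented here, while the `munlock` declaration follows the POSIX signature:

```rust
use std::ffi::c_void;

extern "C" {
    // POSIX munlock: returns 0 on success, -1 on error.
    fn munlock(addr: *const c_void, len: usize) -> i32;
}

/// # Safety
/// The caller must own the region `[ptr, ptr + len)` and must not
/// unlock memory that another part of the program expects to remain
/// resident (e.g. buffers holding secrets).
pub unsafe fn unlock_region(ptr: *const u8, len: usize) -> std::io::Result<()> {
    if munlock(ptr as *const c_void, len) == 0 {
        Ok(())
    } else {
        Err(std::io::Error::last_os_error())
    }
}

fn main() {
    let buf = [0u8; 32];
    // Unlocking mapped memory we own (never locked here) succeeds
    // as a no-op on Linux.
    let res = unsafe { unlock_region(buf.as_ptr(), buf.len()) };
    assert!(res.is_ok());
}
```

Marking the wrapper `unsafe` doesn't make the call dangerous in the memory-corruption sense; it records that the caller is taking on an obligation the compiler can't check.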
The `write` function: This function also takes a raw pointer and a length, and also guarantees not to segfault or mutate any memory. POSIX isn't clear on whether `write` has Undefined Behavior in the presence of data races, provenance violations, or other infelicities with its buffer, but for the sake of this post, let's assume it doesn't. The memory is read by the OS, which one could argue isn't bound by the same rules as userspace. In that case, one can argue that `write` never has Undefined Behavior. But making it safe would mean safe code in any crate could read encapsulated memory in any other crate, which seems outside the spirit.
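To see why, consider a hypothetical binding (the name `raw_write` is invented, assuming a Unix target): if it were marked safe, exfiltrating another crate's private memory through any writable descriptor would require no `unsafe` at all.

```rust
use std::ffi::c_void;
use std::os::fd::AsRawFd;

extern "C" {
    // POSIX write: returns the number of bytes written, or -1.
    fn write(fd: i32, buf: *const c_void, count: usize) -> isize;
}

/// # Safety
/// `buf` must point to `len` bytes the caller is entitled to read.
/// If this were a safe fn, any crate could pass an arbitrary
/// address and length, sending another crate's memory to an fd.
pub unsafe fn raw_write(fd: i32, buf: *const u8, len: usize) -> isize {
    write(fd, buf as *const c_void, len)
}

fn main() {
    let secret = *b"hunter2"; // stand-in for another crate's data
    let sink = std::fs::OpenOptions::new()
        .write(true)
        .open("/dev/null")
        .expect("open /dev/null");
    // With a *safe* raw_write, this line would need no `unsafe`:
    let n = unsafe { raw_write(sink.as_raw_fd(), secret.as_ptr(), secret.len()) };
    assert_eq!(n, 7);
}
```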
So, instead of Undefined Behavior, a slightly expanded concept that covers
these cases might be described as Broken Encapsulation. This is a superset of
Undefined Behavior, because anything that causes Undefined Behavior can break any
language-level encapsulation boundary. And like Undefined Behavior, it still excludes
bugs such as incorrect `Eq` implementations, which violate an API contract without
breaking any encapsulation boundary.
Language-level encapsulation boundaries help in maintaining Reasoning Footprints, especially in programs that contain many crates. It's what lets us look at an individual crate and understand its behavior in isolation, without having to think about whether any other crate in the program could accidentally observe the crate's internal data, do I/O on its internal file descriptors, or cause its internal secrets to be swapped out of memory and potentially compromised.
In practice, thinking about Broken Encapsulation is only slightly different from thinking about Undefined Behavior, but it also reflects a broader observation: Guarding against Undefined Behavior is about ensuring that certain kinds of bugs don't happen, while guarding against Broken Encapsulation is also about helping users build large programs out of smaller parts.