Context Brainstorming

Posted on December 23, 2021

This is a blog post brainstorming about contexts.

I'll use the term contexts here, as tmandry is leaning toward, since it seems to make sense to keep contexts and capabilities as distinct concepts. Idiomatic capability-based code and the Principle of Least Authority favor fine-grained access to resources, which contexts don't seem like a good fit for. So let's keep these concepts distinct for now.

yoshuawuyts showed me there is a way we might use something like contexts to retrofit an awareness of ambient authority into Rust. Here's an attempt to sketch out more of what that might look like.

Automatic contexts

Let's extend the contexts proposal with automatic contexts, which functions would have by default, just as Rust has automatic trait impls. And like automatic trait impls, they can be opted out of, with negative with-declarations using ! syntax.
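Send and Sync are today's closest analogue to automatic contexts: the compiler implements them for a type automatically, and a type can opt out. A minimal sketch of the opt-out pattern in stable Rust follows; the type and function names are made up for illustration. (Nightly Rust also has a direct negative-impl syntax, impl !Send for T {}, behind the negative_impls feature.)

```rust
use std::marker::PhantomData;

// Send is an auto trait: this struct is Send automatically,
// because its only field (u32) is Send.
struct Ordinary {
    value: u32,
}

// Stable Rust has no negative impls, so the usual opt-out is to hold a
// zero-sized field whose type is not Send. Raw pointers are neither Send nor Sync.
struct NotSend {
    value: u32,
    _not_send: PhantomData<*const ()>,
}

impl NotSend {
    fn new(value: u32) -> Self {
        NotSend { value, _not_send: PhantomData }
    }
}

// Only compiles for types that implement Send;
// requires_send(NotSend::new(1)) would be rejected at compile time.
fn requires_send<T: Send>(t: T) -> T {
    t
}
```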

And let's introduce supercontexts, which are contexts that imply other contexts, much like supertraits in Rust. This isn't strictly necessary, but it helps with granularity.
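To make the supercontext idea concrete, here is a rough model using marker traits in today's Rust. The trait names mirror the hypothetical contexts in this post, and the explicit token argument is an assumption of the sketch: real contexts would presumably be implicit rather than passed around as values.

```rust
// Hypothetical: each context modeled as a marker trait.
trait Fs {}
trait Net {}
trait Time {}

// A supercontext: anything with AmbientAuthority also has Fs, Net, and Time,
// the same way a supertrait bound works.
trait AmbientAuthority: Fs + Net + Time {}

// Hypothetical zero-sized token standing for "full ambient authority".
struct Full;
impl Fs for Full {}
impl Net for Full {}
impl Time for Full {}
impl AmbientAuthority for Full {}

// Needs only the fs context.
fn needs_fs<C: Fs>(_ctx: &C) -> &'static str {
    "fs available"
}

// Holds the supercontext; calling needs_fs type-checks because
// AmbientAuthority implies Fs.
fn needs_ambient<C: AmbientAuthority>(ctx: &C) -> &'static str {
    needs_fs(ctx)
}
```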

With those, and the observation that contexts are a way of coloring functions, let's introduce some hypothetical automatic contexts:

- global_allocator - the ability to use the Rust global allocator.
- ambient_authority - similar to cap-std's AmbientAuthority, but as a context, so it can do more. This would be a supercontext that includes:
  - fs - the current process' filesystem namespace
  - net - the current process' network namespace
  - time - the current process' time namespace. Preventing code from observing time entirely is hard, especially if there can be multiple threads, so this context might cover only the explicit time APIs rather than blocking every potential time source.
  - stdio - access to the ambient stdin, stdout, and stderr
  - process - the ability to spawn arbitrary child processes
  - mutable_static - the ability to write to or read from statically allocated mutable and interior-mutable state in the process. There are use cases where statically allocated state is useful, but since we have contexts here, for maximal modularity, these cases should ideally use contexts instead of implicitly associating state with the whole process.
  - and others. OSes attach a lot of miscellaneous authorities to processes. Ideally we'd make sure we have everything covered.
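The mutable_static item covers interior-mutable statics as well as static mut, which matters because safe Rust can already mutate process-wide state without any unsafe. A small illustration, with a made-up counter:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Statically allocated, interior-mutable state. Safe code can mutate it
// without static mut or unsafe; a hypothetical !mutable_static context
// would forbid a function from touching it.
static CALLS: AtomicUsize = AtomicUsize::new(0);

fn counted() -> usize {
    // fetch_add returns the previous value, so add 1 to get the new count.
    CALLS.fetch_add(1, Ordering::Relaxed) + 1
}
```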

Making these automatic is a way to retroactively reinterpret existing Rust code. All code now defaults to having these contexts, and we can then opt out of them, like this:

fn useless()
with
    !ambient_authority 
{
}

Here, we can tell immediately that useless is a useless function just by looking at its signature. It has no return value, no arguments, and no ambient authority. All it can do is return, panic, or infloop.

A panic could unwind, and it'd be nice to add a context for that too:

unwind - the ability to unwind the stack

Then !unwind could be used for functions that can't unwind. Maybe this could even be connected to LLVM's nounwind attribute. Anyway, with !unwind, we could write code like this:

fn totally_pure(a: &A) -> B
with
    !ambient_authority +
    !unwind
{
    // lots of interesting stuff
}
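For comparison, here is what unwinding lets a caller observe in today's Rust, assuming the default panic = "unwind" strategy. A function that hypothetically carried !unwind could never send its caller down the Err path of catch_unwind. The function names are illustrative.

```rust
use std::panic;

// Illustrative function that panics on bad input.
fn might_unwind(x: u32) -> u32 {
    if x == 0 {
        panic!("x must be nonzero");
    }
    100 / x
}

// Returns None if the call unwound. Under a hypothetical !unwind context on
// might_unwind, the Err branch of catch_unwind would be unreachable.
fn call_and_catch(x: u32) -> Option<u32> {
    panic::catch_unwind(|| might_unwind(x)).ok()
}
```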

The Rust compiler does know whether a type has interior mutability; internally it tracks this with an auto trait called Freeze. Let's assume that knowledge can be extended to cover types that hold I/O handles too. In theory, if A here has no interior mutability, this would allow Rust to annotate functions like this with optimizer attributes such as LLVM's readonly, meaning redundant calls to it could be eliminated.

Beyond LLVM, this could also enable MIR-level elimination of redundant calls, even before monomorphization. There's no need for complex alias analysis or escape analysis, because the type system tells you what you need to know up front!

But it wouldn't enable dead-code elimination of calls, because the callee might infloop. More on that later.
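The interior-mutability condition is what makes the redundant-call reasoning sound. A sketch with two made-up types: one whose shared reference really is read-only, and one containing a Cell, which lets a callee mutate through a shared reference so that two identical-looking calls return different results.

```rust
use std::cell::Cell;

// No interior mutability: through &Plain the callee can only read,
// so two calls with the same reference must agree.
struct Plain {
    n: u64,
}

fn read_plain(a: &Plain) -> u64 {
    a.n * 2
}

// Interior mutability: a shared reference is enough to mutate the Cell,
// so calls can't be deduplicated even though the argument looks the same.
struct WithCell {
    n: Cell<u64>,
}

fn bump(a: &WithCell) -> u64 {
    a.n.set(a.n.get() + 1);
    a.n.get()
}
```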

Pure, except where indicated otherwise

By the way, if one of the arguments has a type that does hold an I/O handle, including a filesystem handle, then the function can still do I/O through that handle. The fs context is about the process' filesystem namespace. So with !fs, you can't call File::open, but you can use a Dir you've been given as an argument to call Dir::open, because it resolves paths relative to a directory you hold an explicit handle to, rather than in the process' filesystem namespace.
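A rough sketch of the idea behind Dir, using only std. This is a hypothetical stand-in: cap-std's real Dir holds an open directory handle and resolves paths relative to it, while this sketch only stores a base path and rejects absolute paths and .. lexically, so it would not stop a symlink from pointing outside the directory.

```rust
use std::path::{Component, Path, PathBuf};

// Hypothetical stand-in for a directory capability. Holds a path, not a
// handle, so it illustrates the shape of the API rather than its security.
struct DirSketch {
    base: PathBuf,
}

impl DirSketch {
    // Resolve `rel` inside `base`, refusing anything that would name a
    // location outside the granted directory.
    fn resolve(&self, rel: &Path) -> Option<PathBuf> {
        let mut out = self.base.clone();
        for comp in rel.components() {
            match comp {
                Component::Normal(part) => out.push(part),
                Component::CurDir => {}
                // RootDir, Prefix (Windows drive letters), and ParentDir escape.
                _ => return None,
            }
        }
        Some(out)
    }
}
```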

Similarly, passing a &mut reference into a function marked this way requires no special ceremony. Unlike "pure" keywords in languages where purity is all or nothing, the rule here is, if the signature has a &mut, the callee can access it as a &mut, including mutating it:

fn pure_except_as_obvious(a: &A, m: &mut M, f: &File) -> B
with
    !ambient_authority +
    !unwind
{
    // lots of interesting stuff, including mutating `*m` and writing to `*f`.
}
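Here is the same signature in today's Rust, with concrete types substituted for A, M, and B as an assumption of the sketch. It relies on std implementing Write for &File, which is why a shared reference to a File is enough to write to it:

```rust
use std::fs::File;
use std::io::{self, Write};

// Mutation happens only through the explicit &mut argument, and I/O only
// through the explicit &File handle; nothing here touches the process'
// filesystem namespace.
fn pure_except_as_obvious(a: &u32, m: &mut Vec<u32>, mut f: &File) -> io::Result<u32> {
    m.push(*a);
    // `Write` is implemented for `&File`, so `f` needs only a mutable binding.
    writeln!(f, "a = {}", a)?;
    Ok(*a + 1)
}
```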

See First-class I/O for more discussion of this.

Security

It may be surprising that I haven't talked about security in this post yet. It turns out that capability-based security really is just a special case of a deeper capability-based design philosophy. It's similar to how Rust's borrow checker is, on its face, a memory-management strategy, but also much deeper, with things to say about such seemingly unrelated areas as thread safety, pointer aliasing, iterator invalidation, and refactoring. There's a lot going on here.

It also turns out that security for untrusted or compromised-supply-chain code is complex. For example, if we want to completely sandbox a piece of Rust code with language mechanisms, we need to make sure it can't use unsafe blocks, since unsafe Rust could trivially escape any sandbox. Exploits are happy to rely on undefined behavior, as long as it works with high enough probability in practice.

Getting closer: unsafe

This post is all about contexts though, so let's see if we can use them to fix that problem too:

- new_unsafe - an automatic context representing the ability to introduce new unsafe contexts. This corresponds to the ability to write unsafe { ... }.
- unsafe - a retroactive reinterpretation of what unsafe fn desugars to. It includes new_unsafe as a subcontext, or doesn't, depending on how the question of unsafe blocks in unsafe functions is resolved.
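Stable Rust already has a coarse cousin of new_unsafe: the unsafe_code lint. Setting it to forbid on a module makes any unsafe block or unsafe fn inside it a compile error. Unlike the hypothetical context, this is lexical and per-module rather than part of a function's signature, and it says nothing about what the safe functions the module calls might do internally. The module and function names are illustrative.

```rust
// With `forbid`, an inner #[allow(unsafe_code)] can't re-enable it either.
#[forbid(unsafe_code)]
mod untrusted {
    pub fn sum(xs: &[i64]) -> i64 {
        // Writing `unsafe { ... }` anywhere in this module fails to compile.
        xs.iter().sum()
    }
}
```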

As an aside, contexts would also be a path for libraries to define unsafe-like concepts for their own invariants, which is something I occasionally see people asking for in Rust.

With new_unsafe, we could write:

fn untrusted_code(x: &X, y: &mut Y) -> Z
with
    !ambient_authority +
    !new_unsafe
{
    // untrusted code here?
}

Would this be a secure sandbox? Not yet. One problem is that even if we know X has no interior mutability or I/O handles, this code still exposes the addresses of x and y to untrusted code, because converting a reference to a raw pointer doesn't require unsafe in Rust. An address might tell an attacker something about the process's ASLR layout, which could make other attacks more powerful.
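To see the problem concretely: inside a module that forbids unsafe_code, safe code can still turn a reference into an integer address. Only dereferencing a raw pointer needs unsafe. The function name here is hypothetical.

```rust
#[forbid(unsafe_code)]
mod sandboxed {
    // Reference -> raw pointer -> integer: all of this is safe Rust.
    pub fn leak_address<T>(x: &T) -> usize {
        x as *const T as usize
    }
}
```

The leaked value also reveals layout details such as alignment, which ties into the byte-addressing point below.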

Still getting closer: raw pointers

When all you're doing is writing a blog post about contexts, everything looks like a problem to be solved by adding a new automatic context.

raw_pointers - the ability to convert references into raw pointers.

In addition to solving this ASLR problem, this context has some interesting possibilities. It's awkward that Rust allows APIs with reference arguments to observe whether two references have the same address, when this usually isn't part of the conceptual API. !raw_pointers would be a way to declare that a function doesn't do that.
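For example, this hypothetical comparison function suggests by its name that it compares values, but std::ptr::eq lets it also observe whether its two arguments alias, entirely in safe code:

```rust
use std::ptr;

// Returns (values equal, references point at the same object).
// The second component is the kind of observation !raw_pointers would rule out.
fn same_and_aliased(a: &u32, b: &u32) -> (bool, bool) {
    (a == b, ptr::eq(a, b))
}
```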

Further, with !raw_pointers, it'd be possible to have Rust code that doesn't depend on a byte-addressed address space. There'd be no alignment or endianness visible. Objects could be moved at any time, just like in a moving GC. Threads could be migrated to different stacks. This might even open up a path to Rust being able to use Wasm reference types, which Rust can't otherwise hold directly since they're opaque and can't have their representation exposed.

Are we secure yet?

No. But, to keep this blog post scoped, let's ignore side-channel attacks like Spectre, hardware attacks like Rowhammer, crypto miners, and denial-of-service attacks. And let's ignore attacks which change the behavior of the code without breaking the sandbox, such as changing an encryption implementation to emit syntactically valid but insecure data. That's a lot to ignore in reality, but the solutions to those would require radically different mechanisms, so let's put those aside for now.

Ok, now are we done yet?

What about global variables? We included mutable_static in ambient_authority above, so they won't be mutated, but is it a problem if the untrusted code reads any of the program's global immutable state? Could it find authentication secrets? To answer this, we'd need to start getting more specific about the threat model. But to keep things simple, let's say the program doesn't have anything sensitive in immutable global state. It's best to keep sensitive things like authorization credentials as scoped as possible in general anyway.

Along those lines, what about std::env::vars, std::env::args, std::env::home_dir and others? They might contain sensitive information, or even just your username. Let's say these are disallowed by !mutable_static by virtue of being mutable through libc APIs. Or, if needed, we could also add a new context to cover these.
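The difference between ambient and explicit access to this kind of information looks like this in ordinary Rust; the function names are made up. The first function reaches into the process environment, which is what a context like !mutable_static would rule out, while the second only sees what its caller chooses to pass.

```rust
use std::env;

// Ambient: reads process-wide state that the signature doesn't mention.
fn greeting_ambient() -> String {
    let user = env::var("USER").unwrap_or_else(|_| "someone".to_string());
    format!("hello, {}", user)
}

// Explicit: the caller decides what, if anything, to reveal.
fn greeting(user: &str) -> String {
    format!("hello, {}", user)
}
```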

Will it ever stop?

This is just a brainstorming post, and it's possible things are missing, but it's likely any such things can be covered by adding more contexts. For the sake of making a finite blog post, let's assume we can cover everything.

So, granting all of our assumptions, can we finally say that we have a secure hypothetical sandbox here?

fn untrusted_code(x: &X) -> Y
with
    !ambient_authority +
    !new_unsafe +
    !raw_pointers
{
    // untrusted code here!
}

Yes.

beat

In theory.

In practice, the Rust compiler isn't currently designed or intended to be used as a security surface in this way. And it's not necessarily worth it for it to try to be one. There'd be work involved, and for this to actually make sense, we'd need to look at real-world use cases and attack vectors, and we wouldn't be able to ignore any of the things we ignored above.

Capability-based programming

However, even if we don't look to !ambient_authority as the basis of an actual sandbox, and even if the performance impacts of the aliasing, escaping, and side-effect knowledge aren't compelling, this overall technique might still be useful.

For people reviewing code, !ambient_authority could reduce the reasoning footprint, because they'd be able to make more local assumptions about the side effects of calling functions.

And for people building large complex applications, it could give them more tools to help ensure that two parts of the application don't have unintended interactions, as explored here.

And for people building wasm components, it could give them more tools to ensure that they're only using APIs which compose cleanly with other components.

Potential downsides

With all these colors, and with users having the ability to define their own colors, we could end up with a lot of colors.

Will having an ecosystem where everyone can use all these colors to enforce their requirements with extraordinary precision increase or decrease overall usability of Rust? Will it lead programmers to waste time pursuing every possible dimension of theoretical purity, regardless of what really matters in practice?

Will these new colors and automatic contexts prompt new rounds of users going through all their dependencies and insisting that they support new colors? If so, will it cause ecosystem churn and/or awkward workarounds, or even ecosystem fragmentation, like #![no_std] sometimes does, and is that worth it?

I don't know.

What I do know is, in a vacuum, it sure is fun to think up new colors.

Tangent: Pretty colors

Let's think about one more possible context, for fun:

turing_complete - the ability to have loops (or tail recursion, if Rust adds that), that can't be proved to terminate.

The halting problem gets talked about a lot. However, how often does one actually write loop, as opposed to just using for? If we also had a way to assert that iterator implementations don't repeat themselves, a lot of real-world code might be compatible with !turing_complete.
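As an illustration of the distinction (the functions are made up): the first loop is bounded by its range, so it would plausibly be compatible with !turing_complete, while the second is the Collatz iteration, which nobody has proved terminates for every positive starting value.

```rust
// Terminates by construction: the iterator is a bounded range.
fn sum_to(n: u64) -> u64 {
    let mut total = 0;
    for i in 1..=n {
        total += i;
    }
    total
}

// Termination for all inputs is the open Collatz conjecture,
// so a loop like this would need the turing_complete context.
fn collatz_steps(mut n: u64) -> u64 {
    let mut steps = 0;
    while n > 1 {
        n = if n % 2 == 0 { n / 2 } else { 3 * n + 1 };
        steps += 1;
    }
    steps
}
```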

One of the tricky issues for iterators would be linked lists, which would need to be guaranteed to be acyclic. But it's interesting to note that in Rust, creating a circularly linked list actually requires either unsafe or interior mutability such as Rc with RefCell. So maybe there's something we could do here.
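One caveat worth spelling out: interior mutability also enables cycles. With Rc and RefCell, safe Rust can build a circular list, because the first node's link can be filled in after the second node exists. A minimal sketch with made-up names (note that an Rc cycle like this is never freed, since its reference counts never reach zero):

```rust
use std::cell::RefCell;
use std::rc::Rc;

struct Node {
    value: u32,
    // RefCell lets us set `next` after the node has been shared.
    next: RefCell<Option<Rc<Node>>>,
}

// Builds a -> b -> a using only safe code.
fn make_cycle() -> Rc<Node> {
    let a = Rc::new(Node { value: 1, next: RefCell::new(None) });
    let b = Rc::new(Node { value: 2, next: RefCell::new(Some(Rc::clone(&a))) });
    *a.next.borrow_mut() = Some(Rc::clone(&b));
    a
}

fn successor(node: &Node) -> Rc<Node> {
    let slot = node.next.borrow();
    let succ = slot.as_ref().expect("every node in the cycle has a successor");
    Rc::clone(succ)
}
```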

This would also address the "or infloop" case mentioned above, so we could also get dead-code elimination of calls.