

The Spectrum from Namespaces to Values

In order to make large applications modular, we need to think about resources and sharing. One axis for thinking about this is the spectrum of granularity. It's a spectrum, but we can identify several notable levels:

- Level 0: File::open
- Level 1: with root: &Dir
- Level 2: dir.open
- Level 3: thing: &File
- Level 4: thing: &StreamReader

This post describes each of these levels and considers the impact on application modularity of moving up through each of the levels.

As a caveat, in this post I'll be discussing trusted code; the topic is modularity and flexibility, so I'm setting aside the concerns of untrusted code here. If there's code in the system which is actively attempting to break out, it requires additional mechanisms to securely contain it. The concepts discussed here can be used as part of a sandbox for untrusted code, but a proper sandbox involves many additional concerns that I'm not covering here.

Usually when someone writes a post about capabilities and the Principle of Least Authority, they're talking about security. However, this post is about how these concepts are relevant to programming in the large in general.

With that in mind, let's start at the bottom and work our way up :-).

Level 0: File::open

Level 0 is how most application code today uses filesystems:

fn foo() {
    let thing = File::open("/path/to/some/dir/thing");
}

There's a filesystem namespace implicitly associated with the program, and filesystem paths are resolved in it. Lots of code works this way today, and it works. But traditional OS isolation mechanisms are not sufficient for many popular use cases, so many of them end up using containers. A container provides a little world for an application to run in, where it can pretend it's running on a "normal computer" built specifically for it.

However, containers are coarse-grained. One typically needs to run an entire application within the same container. Or if applications are split into parts that run in different containers, the parts usually need to communicate over invasive RPC mechanisms. By climbing up to the next level of granularity, we can gain more options.

Level 1: with root: &Dir

Using a mechanism such as cap-std's Dir type, we could arrange to pass in a "root filesystem" as a parameter rather than using a namespace implicitly attached to the program:

fn foo(root: &Dir) {
    let thing = root.open("path/to/some/dir/thing");
}

And if we write a lot of code like that, we might find a recent blog post about context capabilities useful. That would allow us to write the above code like this:

fn foo()
with
    root: &Dir
{
    let thing = File::open("/path/to/some/dir/thing");
}

Unlike the standard-library File::open, this code would use a File::open that itself uses context capabilities: it too would have a with root: &Dir clause, and it'd automatically be passed the root in which to resolve the path.
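
Since that syntax is hypothetical, here's a rough hand-desugared sketch in today's Rust of what it could mean. The file_open function is an illustrative stand-in, not a real API:

use cap_std::fs::{Dir, File};
use std::io;

// What a context-capability-aware File::open might desugar to: the
// "with root: &Dir" clause becomes an ordinary parameter.
fn file_open(root: &Dir, path: &str) -> io::Result<File> {
    root.open(path)
}

fn foo(root: &Dir) -> io::Result<()> {
    // With the hypothetical syntax, this body would just say
    // File::open("/path/to/some/dir/thing"), and root would be
    // passed along implicitly.
    let _thing = file_open(root, "path/to/some/dir/thing")?;
    Ok(())
}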

This approach has the appealing property that the body of the code is exactly the same as it is in level 0. It would be easy to convert existing code to this style, because it only requires changes in the outer scopes, not in the code bodies.

Going from 0 to 1 means that an application could have the ability to create little dedicated filesystem worlds, and run parts of itself in these little worlds, with each part having its own root filesystem. It wouldn't have to completely split into multiple containers.
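
As a concrete sketch, here's what setting up one of those little worlds could look like with cap-std; the path is purely illustrative:

use cap_std::ambient_authority;
use cap_std::fs::Dir;

// The level-1 foo from above.
fn foo(root: &Dir) -> std::io::Result<()> {
    let _thing = root.open("path/to/some/dir/thing")?;
    Ok(())
}

fn main() -> std::io::Result<()> {
    // Ambient authority is used once, at the top level, to carve out a
    // dedicated filesystem world rooted at a directory of our choosing.
    let root = Dir::open_ambient_dir("/path/to/app1-root", ambient_authority())?;

    // This part of the application resolves all of its paths inside
    // root, and can't name anything outside it.
    foo(&root)
}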

That's a step up. However, creating dedicated filesystems is still complex and potentially inefficient. How do we know what those little worlds need to contain? It requires us to know the set of strings that the code might dynamically pass to an open function. Sometimes we can get a pretty good idea of what code needs by studying it. And sometimes we resort to running the code and dynamically recording what it needs.

And, if two parts of an application want to communicate, we still have to figure out whether they need to run in the same little world, or whether they can run in different worlds. And if they do share a world in order to share data through it, we then need to somehow make sure they don't collide in unintended ways elsewhere in the shared filesystem, such as by needing different versions of a dependency installed at the same path.

Level 2: dir.open

The next step up is to break up the monolithic "world" into more fine-grained handles. cap-std's Dir type makes it easy to have first-class directory handles, so we can just pass those around.

fn foo(dir: &Dir) {
    let thing = dir.open("thing");
}

We could use the above-mentioned context capabilities for these. However, that syntax is less valuable once we start getting more fine-grained, as we do here at level 2. When we're talking about a whole filesystem, it's common for code to want to treat it like an ambiently present resource, and context capabilities approximate that. But once we can talk about specific directories, it's more natural for code to be aware of the specific directories it needs, and take those directories as explicit parameters. It's no longer about "here's a world to operate in"; it's about "here's the directory you asked for".

Passing individual directories is a step up from level 1. With this, when an application is split into parts, we don't need to create a whole filesystem view for each part. We just need to give each part its own directory. And if parts do need to share, they just share those directories, and we don't have to worry about them colliding in other areas of their filesystems.
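
For instance, a top level might hand each part its own directory like this; indexer, reporter, and the directory names here are hypothetical:

use cap_std::fs::Dir;
use std::io;

// Each part of the application takes only the directory it works in.
fn indexer(data: &Dir) -> io::Result<()> {
    let _thing = data.open("thing")?;
    Ok(())
}

fn reporter(logs: &Dir) -> io::Result<()> {
    let _log = logs.open("today.log")?;
    Ok(())
}

fn run(root: &Dir) -> io::Result<()> {
    // Each part gets its own handle; neither can see the other's
    // directory, so they can't collide elsewhere in the filesystem.
    indexer(&root.open_dir("data")?)?;
    reporter(&root.open_dir("logs")?)?;
    Ok(())
}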

But if the only thing foo needs is one file in that directory, passing it a whole directory is still more complex than we need. How can we know when two parts of an application need to share a directory? As above, the only way to know how they interact is to look at the set of strings they dynamically pass into open functions.

Level 3: thing: &File

If foo just needs a single file, it can be passed just that file:

fn foo(thing: &File) {
    // ...
}

This way, an application can be split into parts, and the parts can freely operate on different files. The application doesn't need to worry about whether it needs to separate those files into separate directories. Even if the files all live in the same directory, the parts of an application accessing individual files won't collide, because they only have their &File, not the whole directory.
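
As a minimal sketch using std's File (std implements Read for &File, so a shared reference is all foo needs; the path is illustrative):

use std::fs::File;
use std::io::{self, Read};

// foo receives exactly the one file it needs, and nothing else.
fn foo(thing: &File) -> io::Result<String> {
    let mut contents = String::new();
    // std implements Read for &File, so we can read through a shared
    // reference.
    let mut reader = thing;
    reader.read_to_string(&mut contents)?;
    Ok(contents)
}

fn main() -> io::Result<()> {
    let file = File::open("/path/to/some/dir/thing")?;
    let contents = foo(&file)?;
    println!("read {} bytes", contents.len());
    Ok(())
}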

Level 4: thing: &StreamReader

The next level up is to observe that most applications don't need an actual File as such. Even though Unix told us that "everything is a file", pipes and sockets aren't really files: they don't always have filenames, and they don't support all of the File operations. However, in a lot of use cases, applications would be just as happy reading from a pipe or a socket as from an actual file. In some cases, we may want them to simply read data from a buffer in memory. The thing that most application code needs is just a source of data to read from.

This could be impl Read, Box<dyn Read>, or the io-streams crate's StreamReader. Or the async versions of any of those. But that's getting into the mechanics of how the I/O is done. It's an interesting topic, but not the focus of this post.

The important part for this blog post is that level 4 is where we reduce a piece of code's requirements down to just an input stream, so it can be connected to other pieces of code without having to think about what other resources that might cause them to implicitly share. Input streams can be streamed across networks. They can be buffered, or stored and replayed. If we can write code that works in terms of streams instead of files, we have a lot of flexibility, because we don't need to worry about the code implicitly depending on file-specific semantics.
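
Here's a sketch of what such stream-oriented code can look like, using std's Read trait; count_lines is just an illustrative name:

use std::io::{self, Read};

// Works with any source of bytes: a file, a pipe, a socket, or memory.
fn count_lines(input: &mut dyn Read) -> io::Result<usize> {
    let mut buf = String::new();
    input.read_to_string(&mut buf)?;
    Ok(buf.lines().count())
}

fn main() -> io::Result<()> {
    // An actual file works...
    let mut file = std::fs::File::open("/path/to/some/dir/thing")?;
    println!("{} lines", count_lines(&mut file)?);

    // ...and so does a buffer in memory, with no code changes.
    let mut bytes: &[u8] = b"one\ntwo\nthree\n";
    println!("{} lines", count_lines(&mut bytes)?);

    Ok(())
}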

And beyond

The next step up from byte streams would be typed streams, iterators, or generators. A big sequence of u8s requires the producers and consumers of the data to have some implicit agreement about how the bytes are interpreted. An API providing a sequence of values of some type T would allow producers and consumers to be self-describing. The mechanics of these APIs are another interesting topic, and also not the focus of this post :-).
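
Still, as a small sketch of the idea, with a hypothetical Record type standing in for T:

// A hypothetical record type; producers and consumers agree on its
// meaning through the type, rather than through an implicit byte format.
struct Record {
    name: String,
    value: u64,
}

// The consumer states exactly what it needs: a sequence of Records.
fn total(records: impl Iterator<Item = Record>) -> u64 {
    records.map(|r| r.value).sum()
}

fn main() {
    let records = vec![
        Record { name: "a".to_string(), value: 1 },
        Record { name: "b".to_string(), value: 2 },
    ];
    println!("total: {}", total(records.into_iter()));
}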

The focus is that once we have an abstraction level where bodies of code can communicate typed values, these bodies of code can communicate precisely what they need, without incompatibilities among things they implicitly share.

At each step up, we got closer to "what does the code actually need?", and found ways to give it what it needed, while reducing the amount it implicitly shares with other parts. This reduced the potential for conflicts with other code, and simplified the task of breaking up applications into parts.

An analogous concept elsewhere in computing is false sharing: there, it's about performance and sharing parts of cache lines that we don't need; here, it's about modularity and sharing parts of monolithic resources that we don't need.

By applying the Principle of Least Authority, and passing around fine-grained resources to code that advertises specific needs, we improve modularity.