sunfishcode's blog
A blog by sunfishcode

What is a Capability?

This blog post aims to provide a simple answer to the question: What is a Capability? I answer this question from my own perspective, as someone who didn't previously know anything about component models.

This post is forward-looking; not all of the pieces described here are usable yet. It's a look at what's coming.


When I say "capabilities", I'm talking about capabilities in the context of the Wasm component model. These are capabilities in the sense of capability-based security.

There is a relationship with the term "object capabilities", or "ocap"; however, that terminology involves some nuance, and I'm aiming for a simple, intuitive description here.

I'm not talking about wasmCloud "capabilities" or Linux "capabilities". Similarly, there are connections that could be drawn here, but I'm aiming to keep things simple.

Ok, let's go!

One of the great things about Wasm is that it can't do anything.

An anthropomorphized Wasm component with no arms.

Wasm has no syscall instructions. It has no I/O ports. All it can do is compute. And it can import functions from the outside and call them, and export functions to the outside and have them be called.

So the way to let a Wasm component do something outside of itself is to give it functions it can import that do things, or, conversely, to have outside code call its exports and act on the values they return.

An anthropomorphized Wasm component *with* arms.

And that's great, because it means the Wasm program can do exactly what the functions we gave it can do, and nothing else.

Imports and exports are supplied at link time, so we call these link-time capabilities.
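
As a rough analogy in plain Rust (not actual component-model bindings), a trait can stand in for the set of imports a component is linked against. The names `Imports`, `run_component`, and `RecordingHost` are made up for this sketch. In real Wasm the sandbox enforces the restriction; here it's only a convention, since Rust code could reach for `std` directly.

```rust
/// Hypothetical set of imports granted to a component at link time.
trait Imports {
    fn log(&mut self, message: &str);
}

/// The "component": it can compute, and it can call what `Imports` offers.
/// In this sketch, logging is the only effect it has on the outside world.
fn run_component<I: Imports>(imports: &mut I, input: u32) -> u32 {
    let result = input * 2;
    imports.log(&format!("doubled {} to {}", input, result));
    result
}

/// One possible host: it records log lines in memory instead of printing them.
struct RecordingHost {
    lines: Vec<String>,
}

impl Imports for RecordingHost {
    fn log(&mut self, message: &str) {
        self.lines.push(message.to_string());
    }
}
```

Whoever does the linking picks the implementation of `Imports`, so swapping in a host that prints, discards, or filters log lines requires no change to the component.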

Capabilities with too much authority

Link-time capabilities are useful, but often they're too useful. As a simple example, the following interface describes a link-time capability:

/// Execute the command described in the provided
/// `command` string.
do-command: func(command: string) -> string

Imagine this is a function that passes the string argument to some command interpreter that can execute arbitrary commands. This can potentially be a very powerful API, capable of doing almost anything.

Imagine I have a large codebase, and I want to pass around a reference to some variable within the command interpreter. The only way to do it with this do-command API is to pass around the string name.

// Set `x` to 2 within the command interpreter.
do_command("x = 2");

// Set the variable named in `some_dynamic_string` to 2.
do_command(format!("{} = 2", some_dynamic_string));

With this, it can be very difficult to answer questions like "what are all the places in the code that use this variable?" or, looking at a particular callsite, "what are all the variables this callsite could access?". Strings can originate from many places at runtime, potentially even untrusted places like attacker-controlled inputs, so it's very difficult to make any kind of comprehensive guarantees with an API like this.

In systems where command strings are computed from other strings, it's also notoriously difficult to prevent quoting and whitespace bugs, which lead to vulnerabilities like SQL injection.
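
Here's a small sketch of how that goes wrong. Both functions are hypothetical: `build_assignment` builds a command the way the `format!` call above does, and `statements` is a toy stand-in for an interpreter that treats `;` as a statement separator (an assumption made for this example).

```rust
/// Hypothetical helper that builds a command string for a `do-command`-style API.
fn build_assignment(variable: &str, value: i32) -> String {
    format!("{} = {}", variable, value)
}

/// Toy stand-in for a command interpreter: returns the statements it would
/// execute, assuming `;` separates statements.
fn statements(command: &str) -> Vec<String> {
    command
        .split(';')
        .map(|s| s.trim().to_string())
        .filter(|s| !s.is_empty())
        .collect()
}
```

With a well-behaved variable name like `x`, the command contains the single statement `x = 2`. With the name `x = 0; delete_everything(); y`, the same call produces three statements, one of which the caller never intended to run.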

And in modular systems, passing a string from one component to another often means that there's an assumption that the two sides are implicitly sharing a namespace, as if a ghost were connecting the two.

A Ghost!

Furthermore, APIs like this do-command are awkward to virtualize, for sandboxing, logging, testing, or other purposes. They require intermediaries to always parse the command string, even when they're only interested in specific kinds of requests.

Ultimately, APIs like this combine too many authorities into a single capability. Capabilities like this are called coarse-grained, since an individual capability exposes access to many distinct logical resources.

Traditional filesystem and IP networking APIs tend to be coarse-grained, with functions like open or connect:

open: func(path: string) -> file

connect: func(hostname: string) -> socket

Filesystems and network namespaces usually contain many different logical resources, with differing security considerations. The code that calls open or connect can open any file or host in the namespace, identified only by some runtime string value. And sandboxing such APIs can be tricky, because a resource can have multiple names, through links or aliases, which can appear anywhere in the namespace.
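
To make the "multiple names" problem concrete, here's a minimal sketch of a sandbox check that looks only at the string. The `/sandbox/` prefix and both function names are invented for illustration. A `..` component is just the simplest kind of alias; symbolic links can't be detected by looking at the path string at all.

```rust
use std::path::{Component, Path};

/// Naive sandbox check that inspects only the string name.
fn naive_allows(path: &str) -> bool {
    path.starts_with("/sandbox/")
}

/// Returns true if the path contains a `..` component, one of the ways a
/// name can refer to something outside the prefix it starts with.
fn has_parent_component(path: &str) -> bool {
    Path::new(path)
        .components()
        .any(|c| matches!(c, Component::ParentDir))
}
```

The path `/sandbox/../etc/passwd` passes `naive_allows` even though it names a file outside the sandbox directory. Rejecting `..` closes that particular hole, but it's a patch on one alias mechanism rather than a guarantee about which resource gets opened.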

An anthropomorphized Wasm component with a filesystem, offering to trade a file for a string.

One of the great properties of link-time capabilities is that one can find all the places that use them at link time. But this advantage is negated if the capabilities they represent are coarse-grained and you can't tell which specific resources are being accessed until runtime anyway.

Runtime capabilities

To avoid these problems, we need runtime capabilities. The Wasm component model provides this with handles.

An anthropomorphized Wasm component with a filesystem, offering to trade a file for a magic token.

Handles are references to resources, and are values which can be passed around as arguments and return values between components. And unlike other values, handles are unforgeable. The only way to construct a handle to a resource is to be the implementor of the resource. The only components that can obtain a handle are components it's been explicitly passed to.

Handles are similar to file descriptors in Unix. They're references to external things, where the access to the thing is represented by the reference itself. And like file descriptors, they may be represented in source languages as i32 values or similar, where those i32 values are indices into a table of the actual unforgeable handle values.
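
The table idea can be sketched in a few lines of Rust. This is not how any particular Wasm runtime implements it; `HandleTable` and its methods are hypothetical names, and a real table would also reuse freed slots and track ownership.

```rust
use std::convert::TryFrom;

/// Hypothetical host-side table mapping integer indices to resources.
struct HandleTable<T> {
    slots: Vec<Option<T>>,
}

impl<T> HandleTable<T> {
    fn new() -> Self {
        HandleTable { slots: Vec::new() }
    }

    /// Store a resource and return the index the guest will see.
    fn insert(&mut self, resource: T) -> i32 {
        self.slots.push(Some(resource));
        (self.slots.len() - 1) as i32
    }

    /// Look up a resource. Indices that were never handed out, or that
    /// have been removed, resolve to nothing.
    fn get(&self, index: i32) -> Option<&T> {
        let index = usize::try_from(index).ok()?;
        self.slots.get(index)?.as_ref()
    }

    /// Drop the resource, invalidating the index.
    fn remove(&mut self, index: i32) -> Option<T> {
        let index = usize::try_from(index).ok()?;
        self.slots.get_mut(index)?.take()
    }
}
```

The guest can make up any `i32` it likes, but an integer only means something if the table has an entry for it, and only the host side can add entries.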

Revisiting the filesystem and networking APIs above, we can make them more fine-grained by adding handle arguments, such as a directory in which to open files, or an address pool with which to initiate a network connection:

open-at: func(dir: directory, path: string) -> file

connect-in: func(pool: address-pool, hostname: string) -> socket

This way, instead of having an API which can reference any resource in an implicitly shared namespace, conveyed as if by ghost, we have an API where the namespace to use is passed as an explicit argument. This gives the caller, or an intermediary, the opportunity to be selective about what resources are passed to the callee, without needing to configure an external sandbox.
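
Here's a minimal in-memory sketch of that selectivity, with Rust references standing in for handles. `Directory`, `subdir`, and `read_config` are hypothetical, and `open_at` here accepts only a single file name rather than a multi-component path, which sidesteps path parsing for the sake of a short example.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory directory. Holding a reference to a `Directory`
/// is what grants access to its contents.
struct Directory {
    files: HashMap<String, String>,
    subdirs: HashMap<String, Directory>,
}

impl Directory {
    /// Sketch of `open-at`: look up a file by name inside this directory only.
    fn open_at(&self, name: &str) -> Option<&str> {
        self.files.get(name).map(|s| s.as_str())
    }

    /// Hand out a narrower capability: a reference to one subdirectory.
    fn subdir(&self, name: &str) -> Option<&Directory> {
        self.subdirs.get(name)
    }
}

/// The callee can reach only what is inside the directory it was given.
fn read_config(dir: &Directory) -> Option<&str> {
    dir.open_at("config.toml")
}
```

A caller that holds the root directory can pass `read_config` just the `app` subdirectory. Files that live next to `app` in the root are then out of reach for the callee, and the restriction is visible right at the call site.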

Wrap up

To do anything, Wasm components need capabilities. These can be provided at link time, via imports and exports, which is sometimes useful, but can easily be too coarse-grained. The Wasm component model also includes handles, which identify capabilities at runtime.

Interfaces that expose access to logically distinct resources should represent them as distinct capabilities.