What is a Capability?
This blog post aims to provide a simple answer to the question: What is a Capability? I answer this question from my own perspective, as someone who didn't previously know anything about component models.
This post is forward-looking; not all of the pieces described here are usable yet. It's a look at what's coming.
Preliminaries
When I say "capabilities", I'm talking about capabilities in the context of the Wasm component model. These are capabilities in the sense of capability-based security.
There is a relationship with the term "object capabilities", or "ocap"; however, that terminology involves some nuance, and I'm aiming for a simple, intuitive description here.
I'm not talking about wasmCloud "capabilities" or Linux "capabilities". Similarly, there are connections that could be drawn here, but I'm aiming to keep things simple.
Ok, Let's go!
One of the great things about Wasm is that it can't do anything.
Wasm has no syscall instructions. It has no I/O ports. All it can do is compute. And it can import functions from the outside and call them, and export functions to the outside and have them be called.
Link-time capabilities
So the way to let a Wasm component do something outside of itself is to give it functions it can import that do things, or conversely, to call its exports from functions that do things when they return.
And that's great, because it means the Wasm program can do what the things we provided it can do, and nothing else.
Imports and exports are supplied at link time, so we call these link-time capabilities.
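To make that concrete, here is a rough model in plain Rust. It is not real component-model tooling, and the names `Imports`, `run_guest`, and `link_and_run` are hypothetical; the point is only that the "guest" function can reach the outside world through what the "host" hands it and in no other way.

```rust
/// Hypothetical model of a component's import list: the only ways the
/// guest code can affect anything outside itself.
struct Imports<'a> {
    /// The single capability granted at "link time".
    log: &'a mut dyn FnMut(&str),
}

/// Stand-in for guest code. It can compute, and it can call what it was
/// given; nothing else is in scope.
fn run_guest(imports: &mut Imports) -> u32 {
    let answer = 6 * 7;
    (imports.log)(&format!("computed {}", answer));
    answer
}

/// The host decides what the import actually does when it links the guest.
/// Here the "log" import just appends to a vector the host owns.
fn link_and_run() -> (u32, Vec<String>) {
    let mut lines = Vec::new();
    let mut sink = |msg: &str| lines.push(msg.to_string());
    let result = run_guest(&mut Imports { log: &mut sink });
    (result, lines)
}
```

Swapping `sink` for a different closure changes what the guest is able to do without touching the guest at all, which is the property link-time capabilities give you.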
Capabilities with too much authority
Link-time capabilities are useful, but often they're too useful. As a simple example, the following interface describes a link-time capability:
/// Execute the command described in the provided
/// `command` string.
do-command: func(command: string) -> string
Imagine this is a function that passes the string argument to some command interpreter that can execute arbitrary commands. This can potentially be a very powerful API, capable of doing almost anything.
Imagine I have a large codebase, and I want to pass around a reference to some variable within the command interpreter. The only way to do it with this do-command API is to pass around the string name.
// Set `x` to 2 within the command interpreter.
do_command("x = 2");
// Set the variable named in `some_dynamic_string` to 2.
do_command(&format!("{} = 2", some_dynamic_string));
With this, it can be very difficult to answer questions like "what are all the places in the code that use this variable?" or, looking at a particular callsite, "what are all the variables this callsite could access?". Strings can originate from many places at runtime, potentially even untrusted places like attacker-controlled inputs, so it's very difficult to make any kind of comprehensive guarantees with an API like this.
In systems where command strings are computed from other strings, it's also notoriously difficult to prevent quoting and whitespace bugs, which lead to vulnerabilities like SQL injection.
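Here is a small sketch of how that goes wrong. The `Interpreter` below is a toy I made up for illustration (it only understands `name = value` statements separated by `;`), and `set_to_two` is a hypothetical caller that intends to set exactly one variable.

```rust
use std::collections::HashMap;

/// Toy command interpreter (hypothetical): runs `name = value`
/// statements separated by `;`.
struct Interpreter {
    vars: HashMap<String, i64>,
}

impl Interpreter {
    fn new() -> Self {
        Interpreter { vars: HashMap::new() }
    }

    fn do_command(&mut self, command: &str) {
        for statement in command.split(';') {
            if let Some((name, value)) = statement.split_once('=') {
                if let Ok(value) = value.trim().parse::<i64>() {
                    self.vars.insert(name.trim().to_string(), value);
                }
            }
        }
    }
}

/// The caller intends to set exactly one variable to 2, but the name is
/// spliced into the command string unchecked.
fn set_to_two(interp: &mut Interpreter, some_dynamic_string: &str) {
    interp.do_command(&format!("{} = 2", some_dynamic_string));
}
```

If `some_dynamic_string` is `"admin = 1; x"`, the interpreter sees `admin = 1; x = 2` and sets two variables. Nothing at the callsite suggests that `admin` was ever in play.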
And when we look at modular systems, when a string is passed from one component to another, it often means that there's an assumption that the two sides are implicitly sharing a namespace, as if a ghost were connecting the two.
Furthermore, APIs like this do-command are awkward to virtualize, for sandboxing, logging, testing, or other purposes. They require intermediaries to always parse the command string, even when they're only interested in specific kinds of requests.
Ultimately, APIs like this combine too many authorities into a single capability. Capabilities like this are called coarse-grained, since an individual capability exposes access to many distinct logical resources.
Traditional filesystem and IP networking APIs tend to be coarse-grained, with functions like open or connect:
open: func(path: string) -> file
connect: func(hostname: string) -> socket
Filesystems and network namespaces both often contain many different logical resources, often with differing security considerations. The code that calls open or connect can open any file or host in the namespace, identified only by some runtime string value. And sandboxing such APIs can be tricky, because resources can often have multiple names, with links or aliases, which can appear anywhere in the namespace.
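As a small illustration of how tricky, here is a naive sandbox check in Rust. The function name `naive_is_allowed` and the `/sandbox` root are made up for this example; `Path::starts_with` is the real standard-library method, and it compares path components lexically without resolving `..` or following links.

```rust
use std::path::Path;

/// Naive sandbox check (hypothetical): allow any path that lexically
/// starts with the sandbox root.
fn naive_is_allowed(path: &str) -> bool {
    Path::new(path).starts_with("/sandbox")
}
```

This accepts `/sandbox/notes.txt` and rejects `/etc/passwd`, as intended, but it also accepts `/sandbox/../etc/passwd`, which names a file outside the sandbox. And `..` is only the simplest kind of alias; a symbolic link inside the sandbox would get past even a check that normalizes the path first.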
One of the great properties of link-time capabilities is that one can find all the places that use them at link time. But this advantage is negated if the capabilities they represent are coarse-grained and you can't tell which specific resources are being accessed until runtime anyway.
Runtime capabilities
To avoid these problems, we need runtime capabilities. The Wasm component model provides this with handles.
Handles are references to resources, and are values which can be passed around as arguments and return values between components. And unlike other values, handles are unforgeable. The only way to construct a handle to a resource is to be the implementor of the resource. The only components that can obtain a handle are components it's been explicitly passed to.
Handles are similar to file descriptors in Unix. They're references to external things, where the access to the thing is represented by the reference itself. And like file descriptors, they may be represented in source languages as i32 values or similar, where those i32 values are indices into a table of the actual unforgeable handle values.
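The table idea can be sketched in a few lines of Rust. This is my own simplified model, not how any particular runtime implements it, and `HandleTable` and its methods are hypothetical names. The table lives outside the guest; the guest only ever holds the `i32` indices.

```rust
/// Hypothetical per-component handle table, kept outside the guest.
/// Guest code only ever sees the `i32` indices.
struct HandleTable<R> {
    slots: Vec<Option<R>>,
}

impl<R> HandleTable<R> {
    fn new() -> Self {
        HandleTable { slots: Vec::new() }
    }

    /// Called when a handle is passed to this component; returns the
    /// index the guest will use to refer to it.
    fn insert(&mut self, resource: R) -> i32 {
        self.slots.push(Some(resource));
        (self.slots.len() - 1) as i32
    }

    /// Resolve an index supplied by the guest. An index that was never
    /// handed out, or was already dropped, resolves to nothing.
    fn get(&self, index: i32) -> Option<&R> {
        if index < 0 {
            return None;
        }
        self.slots.get(index as usize)?.as_ref()
    }

    /// Drop a handle; the index stops resolving afterwards.
    fn remove(&mut self, index: i32) -> Option<R> {
        if index < 0 {
            return None;
        }
        self.slots.get_mut(index as usize)?.take()
    }
}
```

The guest can make up any `i32` it likes, but a made-up index only resolves if the table already holds something there, so guessing numbers does not produce access to anything the component wasn't given.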
Revisiting the filesystem and networking APIs above, we can make them more fine-grained by adding handle arguments, such as a directory in which to open files, or an address pool with which to initiate a network connection:
open-at: func(dir: directory, path: string) -> file
connect-in: func(pool: address-pool, hostname: string) -> socket
This way, instead of having an API which can reference any resource in an implicitly shared namespace, conveyed as if by ghost, we have an API where the namespace to use is communicated as an explicit argument. This gives the caller, or an intermediary, the opportunity to be selective about what resources are passed to the callee, without needing to configure an external sandbox.
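Here is a rough in-memory analogue in Rust, assuming a made-up `Directory` type that stands in for a `directory` handle and an `open_at` function that loosely mirrors `open-at`. A real implementation would sit on top of an actual filesystem; this sketch only shows the shape of the API.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a `directory` handle: it carries its own
/// namespace instead of pointing into a global one.
struct Directory {
    files: HashMap<String, String>,
}

impl Directory {
    /// Build a directory from (name, contents) pairs.
    fn with_files(entries: &[(&str, &str)]) -> Self {
        let files = entries
            .iter()
            .map(|(name, contents)| (name.to_string(), contents.to_string()))
            .collect();
        Directory { files }
    }
}

/// Rough analogue of `open-at`: the lookup is confined to `dir`.
fn open_at<'a>(dir: &'a Directory, path: &str) -> Option<&'a str> {
    dir.files.get(path).map(|contents| contents.as_str())
}

/// Stand-in for a callee component. It receives one directory and has
/// no way to name anything outside it.
fn callee(dir: &Directory) -> Option<&str> {
    open_at(dir, "config.toml")
}
```

If the caller holds both a public directory and a private one, it chooses which to pass to `callee`. Whatever strings the callee computes, they are only ever looked up in the directory it was given.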
Wrap up
To do anything, Wasm components need capabilities. These can be provided at link time, via imports and exports, which is sometimes useful, but can easily be too coarse-grained. The Wasm component model also includes handles, which identify capabilities at runtime.
Interfaces that expose access to logically distinct resources should represent them as distinct capabilities.