Introducing cap-std, a capability-based version of the Rust standard library
Introducing cap-std
cap-std is a project to create capability-based versions of the Rust standard library and related APIs.
Capability-based here means that the APIs don't access files, directories, network addresses, clocks, or other external resources implicitly, but instead operate on handles that are explicitly passed in. This helps programs that work with potentially malicious content avoid accidentally accessing resources other than the ones they intend, and it does so without needing a traditional process-wide sandbox, so it can be easily embedded in larger applications.
Background
Some of the most devious software bugs are those where the code looks like it does one thing, and usually does that thing in practice, but sometimes, under special circumstances, does something else. Here's a simple example using Rust's filesystem APIs:
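use std::{fs, io::Result, path::Path};
use tempfile::tempdir; // assumed here: `tempdir` from the tempfile crate

fn hello(name: &Path) -> Result<()> {
    let tmp = tempdir()?;
    fs::write(tmp.path().join(name), "hello world")?;
    Ok(())
}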
The expected behavior of this function is to write "hello world" to a file within
a temporary directory. The code looks like it will do this. And indeed, it will
usually do this. But if the path passed in is ../../home/me/.ssh/id_dsa.pub, then the behavior of this function could be to corrupt the user's SSH public key 😲.
That's... not remotely within what we said the expected behavior is. It usually
doesn't do that, but under the right circumstances, it could.
And since name is just a string, if the string is computed in a way that could
be influenced by an attacker, the right circumstances could easily be made to
occur in practice.
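To make the failure mode concrete, here is a small standard-library-only sketch of what the path computation in hello produces for a few inputs. The helper names target_path and naive_is_inside are ours, made up for illustration, and the expected values assume Unix-style paths.

```rust
use std::path::{Path, PathBuf};

/// Builds the target path the same way the `hello` example does.
/// (Hypothetical helper, for illustration only.)
fn target_path(base: &Path, name: &Path) -> PathBuf {
    // `join` does no normalization: `..` components are kept as-is,
    // and an absolute `name` replaces `base` entirely.
    base.join(name)
}

/// A naive, purely lexical containment check.
/// It compares path components only, so `..` slips straight through.
fn naive_is_inside(base: &Path, candidate: &Path) -> bool {
    candidate.starts_with(base)
}
```

With a base of /tmp/scratch, joining ../../home/me/.ssh/id_dsa.pub yields /tmp/scratch/../../home/me/.ssh/id_dsa.pub, which still passes the naive prefix check even though the OS will resolve it to a location outside the base. Joining /etc/passwd simply yields /etc/passwd.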
The cap-std project provides Rust crates with lightweight ways to avoid such problems. In particular, the cap-std crate's Dir type represents a directory, with methods corresponding to Rust's std::fs functions, for opening and working with files within the directory, that ensure that all paths stay within that directory. For networking, the Pool type represents a set of network addresses, ensuring that all network accesses made through the API are to addresses in the pool.
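The idea behind a pool of addresses can be sketched with the standard library alone. The AddrPool type below is hypothetical and much simpler than the real Pool (it is not cap-std's API); it only shows the shape of "check the address against an explicitly granted set before touching the network".

```rust
use std::collections::HashSet;
use std::io;
use std::net::SocketAddr;

/// Hypothetical allow-list of socket addresses; NOT cap-std's `Pool` API.
#[derive(Default)]
struct AddrPool {
    allowed: HashSet<SocketAddr>,
}

impl AddrPool {
    /// Grants access to one address.
    fn insert(&mut self, addr: SocketAddr) {
        self.allowed.insert(addr);
    }

    /// Succeeds only for addresses that were explicitly granted.
    /// A real implementation would go on to open the connection here.
    fn check(&self, addr: &SocketAddr) -> io::Result<()> {
        if self.allowed.contains(addr) {
            Ok(())
        } else {
            Err(io::Error::new(
                io::ErrorKind::PermissionDenied,
                "address is not in the pool",
            ))
        }
    }
}
```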
In contrast to conventional sandboxing, cap-std doesn't have any global state, so using it in one part of an application doesn't require using it in the rest of the application. Library crates can use cap-std internally without imposing any sandboxing constraints on their users.
What can Dir do?
cap_std is just a library, so by itself, it isn't a sandbox for arbitrary Rust code; it can't prevent arbitrary Rust code from using std::fs's path-oriented APIs. Instead, it protects against malicious content, when filesystem paths can be influenced by untrusted inputs, and malicious concurrent modifications, when another program running at the same time has the ability to remove, rename, or create files, directories, symlinks, or hard links in ways that could cause a program to inadvertently access unintended resources.
Revisiting our example above, with cap-std we might write:
use cap_std::fs::Dir;
use std::{io::Result, path::Path};
fn hello(name: &Path, tmp: &Dir) -> Result<()> {
    tmp.write(name, "hello world")?;
    Ok(())
}
In this code, if the passed-in path uses .. to access directories outside of the one passed in, tmp.write returns an error.
One key difference from before is that instead of creating the temporary
directory itself, this function requests a directory be passed in. The Dir type here serves as a "vocabulary" type, allowing the function to declare that
it wants a directory to be passed in, and that it intends to access resources
within that directory, rather than accessing arbitrary locations in the
filesystem. These kinds of declarations can help reduce the
reasoning footprint of a function call.
The Dir type also makes it much easier to write this code robustly. There's no need to think about .. or absolute paths at the application level, and no need to handle symlinks specially, which with the Rust standard library today isn't even possible to do robustly without platform-specific code.
It also gives callers increased control. The caller gets to choose how and where to create the directory, and when to remove it. Callers could choose to use something like cap-tempfile's tempdir function to easily create a temporary directory in a conventional location and automatically remove it afterwards; however, they could also opt to create the directory somewhere else and manage it manually.
Note that Dir is passed by immutable reference, even though it's being used to mutate external filesystem state. This follows Rust's conventions, for example in std::fs::File::set_len, and it reflects an underlying truth about filesystems. &mut in Rust is sometimes called an "exclusive" reference, because when someone has a &mut, they're the only one who can access the underlying object. However, this is generally not a safe assumption when working with filesystem objects, because other programs could concurrently access or even mutate files or directories without Rust's type system having any say in the matter. Consequently, it makes sense to think of filesystem state as being external to the program, with File and Dir objects being just handles that are themselves typically immutable.
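The same convention can be seen with plain std::fs. In this minimal sketch (the function names shrink and write_then_shrink are ours, for illustration), a file is written to and truncated entirely through shared references: File::set_len takes &self, and the standard library implements Write for &File.

```rust
use std::fs::{self, File, OpenOptions};
use std::io::{self, Write};
use std::path::Path;

/// Truncates the file through a shared reference and reports its new length.
fn shrink(file: &File, len: u64) -> io::Result<u64> {
    file.set_len(len)?; // `set_len` takes `&self`, not `&mut self`
    Ok(file.metadata()?.len())
}

/// Creates a scratch file at `path`, mutates it via `&File` only, then removes it.
fn write_then_shrink(path: &Path) -> io::Result<u64> {
    // Note: `file` is not declared `mut`.
    let file = OpenOptions::new()
        .write(true)
        .create(true)
        .truncate(true)
        .open(path)?;
    (&file).write_all(b"hello world")?; // uses `impl Write for &File`
    let len = shrink(&file, 5)?;
    drop(file);
    fs::remove_file(path)?;
    Ok(len)
}
```

The handle never needs to be mutable, because the state being changed lives in the filesystem rather than in the File value.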
Dir can also be combined with other security techniques. In a project which is written to carefully avoid using untrusted paths, it can add an extra layer of defense in depth.
And in the Wasmtime project, the next step we described earlier is now finished, and we're now using cap-std in combination with our WebAssembly sandbox to implement WASI, providing sandboxed access to system resources.
A simple example
The main pattern for filesystem operations using the cap-std crate is to obtain a Dir and use methods on it, which closely resemble the functions in std::fs.
One of the ways to obtain a Dir is to use the cap-directories crate to request a Dir for a standard directory (similar to the directories-next crate, but returns a Dir instead of a Path). For example, to obtain the data directory for an example program:
// Both calls can fail; this sketch simply unwraps them for brevity.
let project_dirs =
    cap_directories::ProjectDirs::from(
        "com.example",
        "Example Organization",
        "`cap-std` Key-Value CLI Example",
        cap_directories::ambient_authority(),
    )
    .unwrap();
let data_dir = project_dirs.data_dir().unwrap();
Then, instead of calling fs::read and fs::write to read and write files, one can call data_dir.read(key) and data_dir.write(file_name, value).
Note the use of the ambient_authority() function here, which is a no-op that returns an instance of the opaque AmbientAuthority type, and serves to mark a place in the code where ambient authority is being invoked. cap-directories and related crates have an overall invariant that functions don't create their own absolute filesystem paths, and always rely on resources being passed in as handles. Functions which don't uphold this invariant, such as cap_directories::ProjectDirs::from, take an AmbientAuthority argument to advertise their ability to open resources given only a string.
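The marker-argument pattern itself is simple enough to sketch without any crates. Everything below is a hypothetical stand-in written for illustration (the authority module, DirHandle, open_from_raw_path, and describe are our names, not the real cap-std definitions); it only shows how an opaque, zero-sized marker can make ambient access visible in a signature.

```rust
use std::io;
use std::path::{Path, PathBuf};

mod authority {
    /// Opaque marker: the private field means code outside this module
    /// can only obtain one by calling `ambient_authority()`.
    pub struct AmbientAuthority(());

    /// Does nothing at run time; it exists to be easy to search for.
    pub fn ambient_authority() -> AmbientAuthority {
        AmbientAuthority(())
    }
}

/// Hypothetical directory handle for this sketch; not the real `Dir`.
struct DirHandle {
    root: PathBuf,
}

/// Opening a directory from a bare path can reach anywhere in the
/// filesystem, so the signature demands the marker.
fn open_from_raw_path(
    path: &Path,
    _authority: authority::AmbientAuthority,
) -> io::Result<DirHandle> {
    if std::fs::metadata(path)?.is_dir() {
        Ok(DirHandle { root: path.to_path_buf() })
    } else {
        Err(io::Error::new(io::ErrorKind::InvalidInput, "not a directory"))
    }
}

/// Functions that only work from a handle need no marker.
fn describe(dir: &DirHandle) -> String {
    format!("handle rooted at {}", dir.root.display())
}
```

A call site then reads open_from_raw_path(path, authority::ambient_authority()), and the marker costs nothing at run time because the type is zero-sized.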
This makes it easy to search a codebase to find all the places where a non-sandboxed cap-std API is being used. It can also be scanned for with Clippy using a clippy configuration file.
To see all this put together in a complete example, see the kv-cli example in the cap-std repository. This program implements a simple key-value store, using filesystem paths as keys, and using cap-std ensures that it only accesses paths within its own data directory. Attempts to escape the directory with .. fail gracefully. This is true even if a concurrently running program renames directories on the path or changes symlinks, something that's very hard to get right using std::fs APIs.
$ cargo run --quiet --example kv-cli color green
$ cargo run --quiet --example kv-cli color
green
$ cargo run --quiet --example kv-cli temperature cold
$ cargo run --quiet --example kv-cli temperature
cold
$ cargo run --quiet --example kv-cli color
green
$ cargo run --quiet --example kv-cli /etc/passwd
Error: a path led outside of the filesystem
$ cargo run --quiet --example kv-cli ../../../secret_cookie_recipe.txt
Error: a path led outside of the filesystem
Another useful crate is cap-tempfile, which creates temporary directories and provides a Dir to access them.
It's also possible to create a Dir by opening a raw path, using Dir::open_ambient_dir. Note that this function takes an AmbientAuthority since it does not uphold the sandboxing invariant that the rest of the API does.
A real-world example
Web servers often need to serve files from a given directory, and it'd be nice to have a guarantee that they don't accidentally stray outside that directory.
tide-native-static-files is a fork of a real-world Web server project built on the Tide framework, ported to use cap-std instead of directory paths.
The port is very straightforward, mostly consisting of passing around a Dir instead of a string holding a base directory name. And in many cases, working with a Dir is actually simpler than working with a string. The complete set of changes needed for this port can be seen here.
Implementation Landscape
One of the reasons that Rust doesn't already have a Dir type, when it does have a File type, is that popular OS filesystem APIs don't make this as efficient or idiomatic as just using paths to name directories. However, this is changing.
One of the inspirations for cap-std is the CloudABI project, which among other things developed a technique of using a sequence of openat calls to emulate path lookup in userspace in a way that's robust in the face of concurrent renames. cap-std uses a variant of this technique, optimized to use fewer intermediate system calls, to implement a portable sandboxed path lookup algorithm.
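The bookkeeping behind such a component-by-component walk can be sketched lexically with the standard library. This is a simplified illustration under our own assumptions, not cap-std's algorithm: a real implementation opens each component relative to the previous directory handle and has to deal with symlinks, whereas the hypothetical resolve_within below only tracks names on a stack.

```rust
use std::ffi::OsStr;
use std::path::{Component, Path, PathBuf};

/// Walks `path` one component at a time, relative to some base directory,
/// and returns the resulting relative path, or `None` if the walk would
/// leave the base. (Hypothetical helper; purely lexical, ignores symlinks.)
fn resolve_within(path: &Path) -> Option<PathBuf> {
    let mut stack: Vec<&OsStr> = Vec::new();
    for component in path.components() {
        match component {
            // Descend into a named entry.
            Component::Normal(name) => stack.push(name),
            // `.` stays where we are.
            Component::CurDir => {}
            // `..` is fine as long as there is something to pop;
            // popping an empty stack would escape the base directory.
            Component::ParentDir => {
                stack.pop()?;
            }
            // Absolute paths (and Windows prefixes) never stay inside.
            Component::RootDir | Component::Prefix(_) => return None,
        }
    }
    Some(stack.into_iter().collect())
}
```

Under this sketch a/../b resolves to b, while ../x and /etc/passwd are rejected, which matches the behavior shown for the kv-cli example.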
And Linux recently added a system call, openat2 (available since Linux 5.6), which has the ability to restrict path lookup so that it stays within a given directory, which is exactly the behavior we want here. It doesn't require a process-wide mode, and it avoids the overhead of doing multiple system calls. cap-std uses this in place of its portable algorithm whenever it can. On systems which support openat2, most functions in the API perform only one or two system calls.
Linux and other operating systems are also exploring adding more such features, and as these features become available, it will become increasingly practical to not just implement a Dir type, but to implement it with WASI-style sandboxing protections built in.
Philosophy
cap-std came about because we were looking to generalize the filesystem sandboxing techniques we were using in Wasmtime's WASI implementation to make them more broadly applicable, and we were particularly inspired by async-std's philosophy:
the best API is the one you already know.
Rust already has a standard library API. It's very good overall, and a lot of care has gone into ensuring that it's implementable on many platforms. It's used by a lot of code, and well known to a lot of developers.
cap-std is an approach that takes advantage of this. Developers who know std can easily learn cap-std. Applications using std can be ported to cap-std, with the main concern being how to ensure that directory handles are available to all the places that need them, rather than dealing with differences in the API or in filesystem behavior.
The close alignment between cap-std and std, combined with the close alignment between async-std and std, also makes it straightforward to do both at the same time, producing cap-async-std.
If you're familiar with using std::fs, cap-std's APIs should hold no surprises. Similarly, if you're familiar with async-std, cap-async-std's APIs should work as expected.
Current status
cap-std works on Linux, macOS, FreeBSD, Windows, and more, with stable Rust. On Linux, cap_std::fs is optimized to use new system calls including openat2, when available, which significantly reduces the sandboxing overhead.
Support for compiling to WASI is under active development.
Speaking of WASI...
The sandboxing performed by cap-std is the same as what's provided by WASI APIs.
While cap-std is designed so that it can be used as a library within otherwise unsandboxed native applications, WASI applies the same kind of sandboxing to all filesystem accesses, so that it serves as an extension to the core WebAssembly sandbox.
This means that when the cap-std library is compiled for the WASI platform, it will be able to bypass its own sandboxing techniques and simply call into the WASI system calls directly, achieving smaller code size and tighter integration with the underlying WASI platform.
The Future!
We're continuing to add more testing, fuzzing, and optimization. A port to WASI is underway.
We're also starting to think about extending the capability-based model to more parts of Rust's API. The most obvious next step is std::net, support for which is in a very early state right now, but this is a space we're thinking about for the future! Other areas that may be interesting include std::env for information passed in by the host environment, std::process for launching sandboxed processes, and anything else that allows programs to interact with the outside world.
And as we're doing with cap-directories and cap-tempfile, we're also interested in ways to go beyond translating the standard library API into a capability-based model and make that model easy to use.