I/O safety and speed: Why not both?
Rustix: I/O Safety in practice
The I/O Safety RFC is now merged! But it's all fairly abstract, so what will this look like in practice with real APIs?
One of the ways is rustix, the fastest POSIX-ish, Unix-ish, Linux-ish, and libc-ish API for Rust!
And it's also I/O-safe! And memory-safe! These are actually the bigger motivators for this crate, and the crate helped guide the development of the I/O safety APIs. But it's also fast, which is fun, and shows that Rust's "abstraction without overhead" is preserved. More on that later, though. First, what's all this about safety, you say? I'm glad you asked 😉!
Safety
Memory safety is of course an established concept for Rust. Memory-safe abstractions for POSIX/libc functions are a well-established idea, with Rust references, slices and return values in place of raw pointers, and rustix provides all this as well.
I/O safety is a newly introduced concept, and it's about treating file descriptors similarly to pointers, with concepts of ownership and borrowing. See the I/O Safety RFC and the io-lifetimes crate for details. Rustix uses io-lifetimes types for all its APIs that work with file descriptors.
A tour of a function in rustix
Let's take a quick look at the openat function, which demonstrates several of the things rustix does.
The libc version of openat looks like this:
pub unsafe extern "C" fn openat(
dirfd: c_int,
pathname: *const c_char,
flags: c_int,
...
) -> c_int
Rustix's looks like this:
pub fn openat<P: Arg, Fd: AsFd>(
dirfd: &Fd,
path: P,
oflags: OFlags,
create_mode: Mode
) -> Result<OwnedFd>
Here's a breakdown of the differences:
Rustix's dirfd accepts any type which implements the AsFd trait, so you can pass in a &File or &TcpStream or other things conveniently and safely, instead of having to call .as_raw_fd() and pass in the result. AsFd works like a borrow, and similar to Rust references, it's prevented from escaping and dangling.
AsFd is similar to AsRawFd, except that where AsRawFd::as_raw_fd returns a RawFd, which is just an integer, AsFd::as_fd returns a BorrowedFd, which is a special type scoped to the lifetime of the borrow.
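Here is a minimal sketch that makes the borrowing concrete. It deliberately uses the `AsFd`, `BorrowedFd`, and `OwnedFd` types that later landed in the standard library under `std::os::unix::io` (stable since Rust 1.63), not the rustix or io-lifetimes versions. The `duplicate` and `roundtrip` helpers and the temporary file name are hypothetical and exist only for illustration.

```rust
use std::fs::{File, OpenOptions};
use std::io::{self, Read, Seek, SeekFrom, Write};
use std::os::unix::io::{AsFd, AsRawFd, OwnedFd};

/// Hypothetical helper: accepts anything that can lend out a file
/// descriptor (`&File`, `&TcpStream`, `&OwnedFd`, ...) and returns an
/// owned duplicate of it.
fn duplicate<Fd: AsFd>(fd: &Fd) -> io::Result<OwnedFd> {
    // `as_fd` yields a `BorrowedFd<'_>` whose lifetime is tied to `fd`,
    // so the borrowed descriptor cannot outlive the value it came from.
    fd.as_fd().try_clone_to_owned()
}

/// Hypothetical demo: write through one handle, then read back through a
/// duplicate obtained via `AsFd`.
fn roundtrip(text: &str) -> io::Result<(String, bool)> {
    let path = std::env::temp_dir().join("asfd_roundtrip_sketch.txt");
    let mut original = OpenOptions::new()
        .read(true)
        .write(true)
        .create(true)
        .truncate(true)
        .open(&path)?;
    original.write_all(text.as_bytes())?;

    // Pass `&File` directly; no `.as_raw_fd()` call is needed.
    let mut copy = File::from(duplicate(&original)?);
    let distinct = copy.as_raw_fd() != original.as_raw_fd();

    // A duplicated descriptor shares the file offset, so rewind first.
    copy.seek(SeekFrom::Start(0))?;
    let mut contents = String::new();
    copy.read_to_string(&mut contents)?;
    std::fs::remove_file(&path)?;
    Ok((contents, distinct))
}
```

The generic `Fd: AsFd` bound is what lets callers hand over a `&File` or `&TcpStream` without ever touching a raw integer.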
Similar to nix, rustix's path is also generic and accepts any type of string that implements the Arg trait, which includes &str, &Path, &CStr, &OsStr, and other string-like types in Rust's standard library. Actual POSIX-ish system calls expect NUL-terminated strings, so there are optimized paths that avoid dynamic allocation in most cases where you pass in something other than a &CStr. And rustix avoids making any assumptions about PATH_MAX, so the length of the strings you can use is between you and the individual syscalls.
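The allocation-avoiding path can be sketched roughly as follows. This is not rustix's code. The `with_c_str` helper and the 256-byte buffer size are assumptions chosen for illustration. Short strings are copied into a stack buffer with a trailing NUL, and longer ones fall back to a heap-allocated `CString`, so no fixed length limit is imposed.

```rust
use std::ffi::{CStr, CString};
use std::io;

/// Assumed buffer size for this sketch; not taken from rustix.
const SMALL_PATH_BUFFER_SIZE: usize = 256;

/// Hypothetical helper: calls `f` with a NUL-terminated copy of `bytes`,
/// avoiding heap allocation when the string fits in a stack buffer.
fn with_c_str<T>(bytes: &[u8], f: impl FnOnce(&CStr) -> io::Result<T>) -> io::Result<T> {
    // Interior NUL bytes can't be represented in a C string.
    let invalid = || io::Error::from(io::ErrorKind::InvalidInput);

    if bytes.len() < SMALL_PATH_BUFFER_SIZE {
        // Fast path: copy into a zeroed stack buffer; the byte just past
        // the copied data is already 0 and serves as the terminator.
        let mut buf = [0u8; SMALL_PATH_BUFFER_SIZE];
        buf[..bytes.len()].copy_from_slice(bytes);
        let c_str = CStr::from_bytes_with_nul(&buf[..=bytes.len()]).map_err(|_| invalid())?;
        f(c_str)
    } else {
        // Slow path: long strings are still accepted; they just allocate.
        let c_string = CString::new(bytes).map_err(|_| invalid())?;
        f(&c_string)
    }
}
```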
Similar to other libc-wrapping crates, rustix's oflags argument takes an OFlags, which uses the bitflags crate to create a type-checked flags type instead of a raw integer type. Rustix uses an explicit create_mode argument instead of the variadic ..., so it avoids the unsafety of varargs. And rustix reports errors via a Result instead of a plain c_int, returning an Error inside the Result instead of setting errno.
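A type-checked flags type can be approximated with a plain newtype, roughly the kind of type the bitflags crate generates. The sketch below is std-only and hypothetical. The `OFlags` name mirrors rustix, but the constants are a small subset using Linux's numeric values (as on x86-64 and aarch64), and rustix's real type has many more methods.

```rust
use std::ops::BitOr;

/// Hypothetical stand-in for a bitflags-generated type.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct OFlags(u32);

impl OFlags {
    // Linux values (x86-64/aarch64); other platforms use different numbers.
    const RDONLY: OFlags = OFlags(0);
    const WRONLY: OFlags = OFlags(0o1);
    const RDWR: OFlags = OFlags(0o2);
    const CREATE: OFlags = OFlags(0o100);
    const TRUNC: OFlags = OFlags(0o1000);
    const CLOEXEC: OFlags = OFlags(0o2000000);

    fn bits(self) -> u32 {
        self.0
    }

    fn contains(self, other: OFlags) -> bool {
        self.0 & other.0 == other.0
    }
}

/// Combining flags yields another `OFlags`, never a bare integer.
impl BitOr for OFlags {
    type Output = OFlags;
    fn bitor(self, rhs: OFlags) -> OFlags {
        OFlags(self.0 | rhs.0)
    }
}

/// Hypothetical signature: `flags` must be an `OFlags`, so accidentally
/// passing a mode, a length, or any other integer is a type error.
fn describe_open(flags: OFlags) -> &'static str {
    if flags.contains(OFlags::CREATE) {
        "may create"
    } else {
        "must exist"
    }
}
```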
On success, rustix returns an OwnedFd instead of a raw file descriptor, modeling the file descriptor's lifetime in the type system, so similar to a Rust Box<T> or other owning type, it's prevented from dangling.
OwnedFd is very similar to std::fs::File. From a user perspective, the main difference between OwnedFd and File is that OwnedFd doesn't suggest any file-like purpose or behavior; it's just a generic owned file descriptor for any kind of resource. Most use cases will likely want to wrap OwnedFd in higher-level types. When OwnedFd is added to the standard library, I expect File itself will be implemented in terms of OwnedFd in this way as well.
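Wrapping OwnedFd in a higher-level type can look like the minimal sketch below. The `Dir` type is hypothetical and is neither rustix's nor std's API. It uses the standard library's `OwnedFd`, which was later stabilized in `std::os::unix::io`. Because `OwnedFd` closes the descriptor when it's dropped, the wrapper needs no `Drop` impl of its own. Implementing `AsFd` lets it lend out a `BorrowedFd` the same way a `&File` does.

```rust
use std::fs::File;
use std::io;
use std::os::unix::io::{AsFd, BorrowedFd, OwnedFd};
use std::path::Path;

/// Hypothetical higher-level type: an open directory handle.
struct Dir {
    fd: OwnedFd,
}

impl Dir {
    fn open(path: &Path) -> io::Result<Self> {
        // On Unix-like systems, opening a directory read-only succeeds.
        let file = File::open(path)?;
        if !file.metadata()?.is_dir() {
            return Err(io::Error::new(io::ErrorKind::InvalidInput, "not a directory"));
        }
        // Take over ownership of the descriptor from the `File`.
        Ok(Dir { fd: OwnedFd::from(file) })
    }
}

/// Lend the descriptor out without giving up ownership.
impl AsFd for Dir {
    fn as_fd(&self) -> BorrowedFd<'_> {
        self.fd.as_fd()
    }
}

/// Hand ownership of the descriptor back to the caller.
impl From<Dir> for OwnedFd {
    fn from(dir: Dir) -> OwnedFd {
        dir.fd
    }
}
```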
And finally, the openat system call itself is documented to not have any other side effects, so it has first-class I/O, though at a fairly coarse granularity, since the path can contain .. components and can thereby reference any path in the filesystem (if you're worried about that kind of thing, see cap-std).
So with all these together, rustix's openat is a safe function, in terms of both memory safety and I/O safety.
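To see how those pieces combine, here is a sketch of a safe wrapper with the same shape, built on the C library's `openat` declared by hand. This is not how rustix is implemented, since on Linux rustix makes the system call directly. The `O_*` values are Linux's (x86-64 and aarch64) and are assumed for illustration, and raw `c_int` flags stand in for a type-checked `OFlags` to keep the sketch short.

```rust
use std::ffi::CString;
use std::io;
use std::os::raw::{c_char, c_int, c_uint};
use std::os::unix::ffi::OsStrExt;
use std::os::unix::io::{AsFd, AsRawFd, FromRawFd, OwnedFd};
use std::path::Path;

// Linux flag values (x86-64/aarch64), assumed for this sketch.
const O_WRONLY: c_int = 0o1;
const O_CREAT: c_int = 0o100;
const O_TRUNC: c_int = 0o1000;
const O_CLOEXEC: c_int = 0o2000000;

unsafe extern "C" {
    // The variadic C declaration, matching the libc signature above.
    #[link_name = "openat"]
    fn c_openat(dirfd: c_int, pathname: *const c_char, flags: c_int, ...) -> c_int;
}

/// Safe wrapper: borrowed `dirfd`, checked path, explicit mode,
/// `Result` instead of errno, and an owned descriptor on success.
fn openat<Fd: AsFd, P: AsRef<Path>>(
    dirfd: &Fd,
    path: P,
    oflags: c_int,
    create_mode: c_uint,
) -> io::Result<OwnedFd> {
    // Interior NULs become an `InvalidInput` error rather than UB.
    let c_path = CString::new(path.as_ref().as_os_str().as_bytes())?;
    // SAFETY: `dirfd` stays borrowed for the duration of the call, `c_path`
    // is a valid NUL-terminated string, and the mode is always supplied.
    let fd = unsafe {
        c_openat(dirfd.as_fd().as_raw_fd(), c_path.as_ptr(), oflags | O_CLOEXEC, create_mode)
    };
    if fd < 0 {
        // Translate errno into an `io::Error` immediately.
        return Err(io::Error::last_os_error());
    }
    // SAFETY: on success, `openat` returns a new descriptor that we own.
    Ok(unsafe { OwnedFd::from_raw_fd(fd) })
}
```

Every unsafe detail of the C signature (raw pointers, varargs, errno, and ownership of the returned integer) is confined to the body, so callers only see safe types.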
Cool, cool. What about speed, though?
On Linux on x86-64, x86, and aarch64, rustix avoids libc entirely, avoiding errno and pthread cancellation checking. On stable Rust it uses out-of-line asm to perform syscalls; on Rust nightly, it uses inline asm and, since it adds very little code itself, it can fully inline the syscall instructions into the user callsites. On all other platforms, it currently uses libc and errno, but it's factored to facilitate additional backend implementations in the future.
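Part of what avoiding errno means is this. On Linux, a raw system call reports failure by returning a negated error code in the range -4095 to -1, so a direct-syscall backend can build its Result straight from the return register without touching the thread-local errno. The `decode_ret` helper below is a hypothetical sketch of that convention, not rustix's code. The raw `getpid` call (syscall number 39 on x86-64) uses inline asm, which has been stable since Rust 1.59, and is only compiled on x86-64 Linux.

```rust
use std::io;

/// Hypothetical helper: interpret a raw Linux syscall return value.
/// Values in -4095..=-1 (viewed as signed) are negated error codes;
/// everything else is a successful result.
fn decode_ret(ret: usize) -> io::Result<usize> {
    let signed = ret as isize;
    if (-4095isize..=-1).contains(&signed) {
        Err(io::Error::from_raw_os_error((-signed) as i32))
    } else {
        Ok(ret)
    }
}

/// A raw `getpid` system call; syscall number 39 is `getpid` on x86-64 Linux.
#[cfg(all(target_os = "linux", target_arch = "x86_64"))]
fn raw_getpid() -> io::Result<usize> {
    use std::arch::asm;
    let ret: usize;
    // SAFETY: `getpid` takes no arguments and has no side effects; the
    // `syscall` instruction clobbers rcx and r11, which are declared here.
    unsafe {
        asm!(
            "syscall",
            inlateout("rax") 39usize => ret,
            lateout("rcx") _,
            lateout("r11") _,
            options(nostack),
        );
    }
    decode_ret(ret)
}
```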
It also uses an optimized error type and the string argument optimization mentioned above. In microbenchmarks, at this moment in time, filesystem syscalls with string path arguments are about 15% faster than nix, and about 3% faster than libc with allocating temporary CStrings. And it uses the vDSO to make clock_gettime really fast on Linux (and to avoid using the super-slow int 0x80 mechanism on x86), just like Linux libc implementations do, but without depending on libc.
That said, the bigger picture is that most of the time for syscalls is spent in the OS itself, and rustix makes most syscalls only a few percentage points faster at most. But these days, with system calls often getting slower, it's still fun to have some gains in performance in this space, even if they are relatively small. And it demonstrates that the new abstraction of I/O safety doesn't entail new overhead.
Is that all?
Rustix provides a few additional niceties, including:
- It always uses 64-bit file offsets, so users don't need to juggle off64_t vs. off_t, or remember to pass O_LARGEFILE when needed.
- On 32-bit Linux platforms that use direct Linux syscalls, it uses a 64-bit time_t type when the underlying kernel supports it, so clocks won't wrap around in the year 2038.
- And as a fun bonus, some names have been made more human-friendly and less historic-accidental, with names such as accept_with and dup2_with to take extra flags arguments instead of accept4 and dup3, and seek instead of lseek. The code also uses doc aliases so that you can still find things via the traditional C names.
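Doc aliases are a standard rustdoc attribute. A minimal sketch of a human-friendly name that rustdoc search still finds under its C name might look like this. The `seek` wrapper is hypothetical and delegates to std's `Seek` rather than calling lseek directly. It also shows the 64-bit offsets: `SeekFrom` and the returned position are 64-bit on every platform, so there's no off_t vs. off64_t distinction to manage.

```rust
use std::fs::File;
use std::io::{self, Seek, SeekFrom};

/// Reposition the file offset.
///
/// Hypothetical wrapper for illustration: rustdoc's search also finds
/// this function when users type the traditional C name.
#[doc(alias = "lseek")]
pub fn seek(file: &mut File, pos: SeekFrom) -> io::Result<u64> {
    // `SeekFrom` and the returned offset are 64-bit on every platform.
    file.seek(pos)
}
```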
What about portability?
Rustix isn't a fully-fledged portability layer; it doesn't support Windows and some of its APIs are OS-specific, and even OS-version-specific. Rustix doesn't do non-trivial emulation of features, and prefers to let higher-level layers provide that kind of portability when needed.
Many POSIX-ish APIs are very low-level, and aren't a suitable abstraction layer for efficient implementation across different kinds of platforms. Rust's standard library illustrates this: for example, std::fs::Metadata is more abstract than struct stat, and this gives platforms more options when implementing functions like std::fs::File::metadata.
Other interesting cases are epoll and io_uring. Rustix doesn't yet have APIs for these, and it's an open question whether it's worth it. These APIs are very low-level, they have complex file descriptor ownership, and they are most often wrapped in higher-level APIs. It's not yet clear whether I/O safety makes sense at the abstraction level of these APIs, or whether they should just continue to use RawFd and introduce safety at higher levels of abstraction.
What about close?
It's called drop 😃.
Rustix doesn't currently have a close function in its public API. It would be straightforward to add it, and the signature would have either an OwnedFd argument or a T: IntoFd generic argument, to express that the file descriptor ownership is being consumed. However, it'd also be redundant for most users. OwnedFd calls close in its Drop implementation, so in most cases, the way to close a file descriptor is to simply drop it.
(Drop doesn't have a way to report errors, so this may not be the last word on this subject, but this is a separate and much more complex topic involving things like NFS in async mode.)
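If an explicit close were wanted, it could be as small as the sketch below. It's hypothetical. Instead of io-lifetimes' `IntoFd`, it uses std's `Into<OwnedFd>` bound, which is implemented for `File`, `TcpStream`, `OwnedFd`, and others, to express that ownership is consumed. All it does is drop the value, and std's `OwnedFd` ignores any error from close when dropped.

```rust
use std::os::unix::io::OwnedFd;

/// Hypothetical explicit close: taking `fd` by value means the caller
/// can't use it afterwards, and dropping the `OwnedFd` calls `close`.
/// Any error from `close` is discarded, since `Drop` can't report it.
pub fn close<Fd: Into<OwnedFd>>(fd: Fd) {
    let owned: OwnedFd = fd.into();
    drop(owned);
}
```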
How complete is rustix?
It has everything that cap-std and wasmtime's WASI implementation need, as well as a few other projects, though it doesn't have everything. If you find it's missing something you need, please file an issue!
Where does I/O safety go from here?
Now that the I/O Safety RFC is merged, my next step in this wing of the story is to prepare a PR adding OwnedFd, BorrowedFd, and friends to the Rust standard library. Once that's done, I'll update the io-lifetimes crate to use the new standard library types and traits instead of defining them itself.
io-lifetimes defines more functionality than will be in the initial standard library PR, such as views, from_into_fd, and the AsFilelike/AsSocketlike Windows/Unix portability layer, so it'll remain useful for some time, but it'll get thinner as the standard library takes over more functionality.
And for the Rust ecosystem as a whole, the I/O safety RFC outlines the path forward.