sunfishcode's blog
A blog by sunfishcode


I/O safety and speed: Why not both?

Posish: I/O Safety in practice

The I/O Safety RFC is now merged! But it's all fairly abstract, so what will this look like in practice with real APIs?

One answer is posish, the fastest POSIX-ish, Unix-ish, Linux-ish, and libc-ish API for Rust!

And it's also I/O-safe! And memory-safe! These are actually the bigger motivators for this crate, and the crate helped guide the development of the I/O safety APIs. But it's also fast, which is fun, and shows that Rust's "abstraction without overhead" is preserved. More on that later though. First, what's all this about safety, you say? I'm glad you asked 😉!

Safety

Memory safety is of course an established concept for Rust. Memory-safe abstractions for POSIX/libc functions are a well-established idea, with Rust references, slices and return values in place of raw pointers, and posish provides all this as well.

I/O safety is a newly-introduced concept and is about treating file descriptors similarly to pointers, with concepts of ownership and borrowing. See the I/O Safety RFC and the io-lifetimes crate for details. Posish uses io-lifetimes types for all its APIs that work with file descriptors.
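
To make the ownership-and-borrowing idea concrete, here's a minimal sketch using the OwnedFd, BorrowedFd, and AsFd types as they later landed in the standard library's std::os::fd module (io-lifetimes defines equivalent types). The helper functions are hypothetical, made up for illustration; this isn't posish's code.

```rust
use std::fs::File;
use std::io;
use std::os::fd::{AsFd, AsRawFd, BorrowedFd, OwnedFd, RawFd};

/// Hypothetical helper that only *borrows* a descriptor. The `'_` lifetime
/// on `BorrowedFd` ties it to its owner, so the borrow can't outlive it.
fn raw_number(fd: BorrowedFd<'_>) -> RawFd {
    fd.as_raw_fd()
}

/// Opens a path and converts the `File` into a plain `OwnedFd`, which
/// closes the descriptor when it is dropped.
fn open_owned(path: &str) -> io::Result<OwnedFd> {
    let file = File::open(path)?;
    Ok(OwnedFd::from(file))
}

fn borrow_demo(path: &str) -> io::Result<(RawFd, RawFd)> {
    let owned = open_owned(path)?;
    // Borrowing: `as_fd` hands out a `BorrowedFd` scoped to `owned`.
    let borrowed: BorrowedFd<'_> = owned.as_fd();
    let via_borrow = raw_number(borrowed);
    let via_owner = owned.as_raw_fd();
    // `owned` is dropped at the end of this function, closing the descriptor;
    // trying to return `borrowed` from here would be rejected by the compiler.
    Ok((via_borrow, via_owner))
}
```

`owned` stays the only owner throughout: the BorrowedFd is just a view, and both report the same descriptor number.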

A tour of a function in posish

Let's take a quick look at the openat function, which demonstrates several of the things posish does.

The libc version of openat looks like this:

    pub unsafe extern "C" fn openat(
        dirfd: c_int,
        pathname: *const c_char,
        flags: c_int,
        ...
    ) -> c_int

Posish's looks like this:

    pub fn openat<P: Arg, Fd: AsFd>(
        dirfd: &Fd,
        path: P,
        oflags: OFlags,
        create_mode: Mode
    ) -> Result<OwnedFd>

Here's a breakdown of the differences:


Posish's dirfd accepts any type which implements the AsFd trait, so you can pass in a &File or &TcpStream or other things conveniently and safely instead of having to call .as_raw_fd() and pass in the result. AsFd works like a borrow, and similar to Rust references, it's prevented from escaping and dangling.

AsFd is similar to AsRawFd, except that where AsRawFd::as_raw_fd returns a RawFd, which is just an integer, AsFd::as_fd returns a BorrowedFd, a special type scoped to the lifetime of the borrow.
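
Here's a sketch of the `&Fd where Fd: AsFd` parameter style that posish's openat uses for dirfd, applied to a hypothetical fd_number function and written against the standard library's std::os::fd::AsFd rather than posish itself:

```rust
use std::os::fd::{AsFd, AsRawFd, BorrowedFd, RawFd};

/// Hypothetical function taking a borrowed descriptor in the same style as
/// posish's `openat(dirfd: &Fd, ...)`. Any `AsFd` type can be passed by
/// reference; no `.as_raw_fd()` call is needed at the call site.
fn fd_number<Fd: AsFd>(fd: &Fd) -> RawFd {
    // `as_fd` yields a `BorrowedFd` whose lifetime is tied to `fd`.
    let borrowed: BorrowedFd<'_> = fd.as_fd();
    borrowed.as_raw_fd()
}
```

Callers can pass `&file`, a socket, or `&std::io::stdin()` directly; the conversion to a BorrowedFd happens inside the function, and the borrow ends when it returns.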


Similar to nix, posish's path is also generic and accepts any string type that implements the Arg trait, which is implemented for &str, &Path, &CStr, &OsStr, and other string-like types in Rust's standard library. Actual POSIX-ish system calls expect NUL-terminated strings, so for anything other than a &CStr, posish has optimized paths that avoid dynamic allocation in most cases. And posish avoids making any assumptions about PATH_MAX, so the lengths of strings you can use are between you and the individual syscalls.
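
The allocation-avoiding idea can be sketched roughly like this. This is a hypothetical with_c_str helper, not posish's actual Arg implementation: it assumes a 256-byte stack buffer (an arbitrary size) and accepts anything that implements AsRef<OsStr>, which covers &str, &Path, &OsStr, and String but, unlike posish's Arg, not &CStr.

```rust
use std::ffi::{CStr, CString, OsStr};
use std::io;
use std::os::unix::ffi::OsStrExt;

/// Size of the on-stack buffer; an arbitrary choice for this sketch.
const SMALL_PATH: usize = 256;

fn invalid_input<E>(_: E) -> io::Error {
    io::Error::from(io::ErrorKind::InvalidInput)
}

/// Calls `f` with a NUL-terminated copy of `path`. Short paths are copied
/// into a stack buffer; only long paths fall back to a heap-allocated CString.
fn with_c_str<P: AsRef<OsStr>, T>(path: P, f: impl FnOnce(&CStr) -> T) -> io::Result<T> {
    let bytes = path.as_ref().as_bytes();
    if bytes.len() < SMALL_PATH {
        let mut buf = [0u8; SMALL_PATH];
        buf[..bytes.len()].copy_from_slice(bytes);
        // The byte after the copied path is still 0, so it serves as the NUL.
        // This fails if `path` contains an interior NUL byte.
        let c_str = CStr::from_bytes_with_nul(&buf[..=bytes.len()]).map_err(invalid_input)?;
        Ok(f(c_str))
    } else {
        // Long path: allocate. `CString::new` also rejects interior NULs.
        let c_string = CString::new(bytes).map_err(invalid_input)?;
        Ok(f(c_string.as_c_str()))
    }
}
```

Interior NUL bytes can't be represented in a C string, so both branches report them as an InvalidInput error rather than silently truncating the path.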


Similar to other libc-wrapping crates, posish's oflags argument takes an OFlags, which uses the bitflags crate to create a type-checked flags type instead of a raw integer type. Posish uses an explicit create_mode argument instead of C varargs (...), so it avoids the unsafety of varargs. And posish reports errors via a Result instead of a plain c_int, returning an Error inside the Result instead of setting errno.
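
posish generates OFlags with the bitflags crate; to keep this sketch dependency-free, here's a hand-written newtype in the same spirit. The flag bit values, Mode::from_bits, and the open_args helper are all hypothetical, and real O_* values differ between platforms.

```rust
use std::ops::BitOr;

/// Hand-written stand-in for a bitflags-generated flags type.
/// The bit values are hypothetical; real `O_*` values are platform-specific.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct OFlags(u32);

impl OFlags {
    pub const RDONLY: OFlags = OFlags(0);
    pub const CREATE: OFlags = OFlags(1 << 0);
    pub const EXCL: OFlags = OFlags(1 << 1);
    pub const CLOEXEC: OFlags = OFlags(1 << 2);

    pub const fn bits(self) -> u32 {
        self.0
    }

    /// True if every bit set in `other` is also set in `self`.
    pub const fn contains(self, other: OFlags) -> bool {
        (self.0 & other.0) == other.0
    }
}

/// `|` combines flags while keeping the result typed as `OFlags`.
impl BitOr for OFlags {
    type Output = OFlags;

    fn bitor(self, rhs: OFlags) -> OFlags {
        OFlags(self.0 | rhs.0)
    }
}

/// A distinct type for permission bits, so a mode can't be passed where
/// flags are expected, or vice versa.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Mode(u32);

impl Mode {
    /// Keeps only the permission bits (including setuid/setgid/sticky).
    pub const fn from_bits(bits: u32) -> Mode {
        Mode(bits & 0o7777)
    }

    pub const fn bits(self) -> u32 {
        self.0
    }
}

/// Hypothetical argument packing: `create_mode` is an ordinary, explicit
/// parameter, so no C-style `...` varargs are involved.
pub fn open_args(oflags: OFlags, create_mode: Mode) -> (u32, u32) {
    (oflags.bits(), create_mode.bits())
}
```

Because OFlags and Mode are distinct types, accidentally swapping the two arguments is a compile error instead of a silent bug.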


On success, posish returns an OwnedFd instead of a raw file descriptor, modeling the file descriptor's lifetime in the type system, so similar to a Rust Box<T> or other owning type, it's prevented from dangling.

OwnedFd is very similar to std::fs::File. From a user perspective, the main difference between OwnedFd and File is that OwnedFd doesn't suggest any file-like purpose or behavior; it's just a generic owned file descriptor for any kind of resource. Most use cases will likely want to wrap OwnedFd in higher-level types. When OwnedFd is added to the standard library, I expect File itself will be implemented in terms of OwnedFd in this way as well.
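
As a sketch of wrapping OwnedFd in a higher-level type, here's a hypothetical Dir type built on the standard library's OwnedFd (not a posish or cap-std API). Implementing AsFd lets it be passed anywhere an `&Fd where Fd: AsFd` is accepted, and converting into File works because std implements From<OwnedFd> for File.

```rust
use std::fs::File;
use std::io;
use std::os::fd::{AsFd, BorrowedFd, OwnedFd};

/// Hypothetical higher-level type wrapping a directory descriptor. Because
/// it owns an `OwnedFd`, the descriptor is closed when a `Dir` is dropped.
pub struct Dir {
    fd: OwnedFd,
}

impl Dir {
    pub fn open(path: &str) -> io::Result<Dir> {
        let file = File::open(path)?;
        if !file.metadata()?.is_dir() {
            return Err(io::Error::new(io::ErrorKind::Other, "not a directory"));
        }
        // `File` -> `OwnedFd`: ownership of the descriptor moves into `Dir`.
        Ok(Dir { fd: file.into() })
    }

    /// Gives ownership back as a `File`, e.g. to use `std::fs` methods.
    pub fn into_file(self) -> File {
        File::from(self.fd)
    }
}

/// Lending the descriptor out lets `Dir` work with any API taking `AsFd`.
impl AsFd for Dir {
    fn as_fd(&self) -> BorrowedFd<'_> {
        self.fd.as_fd()
    }
}
```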


And finally, the openat system call itself is documented to not have any other side effects, so it has first-class I/O, though at a fairly coarse granularity since the path can contain .. components and can thereby reference any path in the filesystem (if you're worried about that kind of thing, see cap-std).

So with all these together, posish's openat is a safe function, both in memory safety and I/O safety.

Cool cool what about speed though?

On Linux on x86-64, x86, and aarch64, posish avoids libc entirely, avoiding errno and pthread cancellation checking. On stable Rust it uses out-of-line asm to perform syscalls; on Rust nightly, it uses inline asm and, since it adds very little code itself, it can fully inline the syscall instructions into the user callsites. On all other platforms, it currently uses libc and errno, but it's factored to facilitate additional backend implementations in the future.
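
To illustrate how a libc-free backend can avoid errno, here's a sketch of decoding a raw Linux syscall return value, where failures come back as the negated error code in the range -4095..=-1. It doesn't perform a syscall itself (that takes asm, as described above), and decode_syscall_ret is a hypothetical name, not posish's internal API.

```rust
use std::io;

/// On Linux, a raw system call signals failure by returning the negated
/// error code, a value in -4095..=-1, rather than returning -1 and setting
/// the thread-local `errno` as the libc wrappers do.
fn decode_syscall_ret(ret: isize) -> io::Result<usize> {
    if (-4095..=-1).contains(&ret) {
        // The error code travels in the return value itself; no errno access.
        Err(io::Error::from_raw_os_error((-ret) as i32))
    } else {
        // Everything else is a successful result (a count, an fd, an address).
        Ok(ret as usize)
    }
}
```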

It also uses an optimized error type and the string argument optimization mentioned above. In microbenchmarks, at the time of writing, filesystem syscalls with string path arguments are about 15% faster than nix, and about 3% faster than calling libc with temporarily allocated CStrings. And it uses the vDSO to make clock_gettime really fast on Linux (and to avoid using the super-slow int 0x80 mechanism on x86), just like Linux libc implementations do, but without depending on libc.

That said, the bigger picture is that most of the time for syscalls is spent in the OS itself, and posish makes most syscalls only a few percentage points faster at most. But these days, with system calls often getting slower, it's still fun to have some gains in performance in this space, even if they are relatively small. And it demonstrates that the new abstraction of I/O safety doesn't entail new overhead.

Is that all?

Posish provides a few additional niceties, including:

What about portability?

Posish isn't a fully-fledged portability layer; it doesn't support Windows and some of its APIs are OS-specific, and even OS-version-specific. Posish doesn't do non-trivial emulation of features, and prefers to let higher-level layers provide that kind of portability when needed.

Many POSIX-ish APIs are very low-level, and don't make a suitable abstraction layer for efficient implementation across different kinds of platforms. Rust's standard library illustrates this: std::fs::Metadata is more abstract than struct stat, and this gives platforms more options when implementing functions like std::fs::File::metadata.
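
The split is visible in std itself: the portable Metadata API on one side, and the Unix-only std::os::unix::fs::MetadataExt extension trait, which exposes raw struct stat fields, on the other. A minimal sketch, with hypothetical helper names:

```rust
use std::fs::File;
use std::io;
use std::os::unix::fs::MetadataExt;

/// Portable: `Metadata::len` exists on every platform std supports, and each
/// platform is free to fill it in however it likes.
fn file_len(file: &File) -> io::Result<u64> {
    Ok(file.metadata()?.len())
}

/// Unix-only: `MetadataExt` exposes `struct stat` fields directly, such as
/// the device and inode numbers that identify a file.
fn file_identity(file: &File) -> io::Result<(u64, u64)> {
    let meta = file.metadata()?;
    Ok((meta.dev(), meta.ino()))
}
```

file_len uses only the portable API, while file_identity depends on a trait that only exists on Unix-like targets.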

Other interesting cases are epoll and io_uring. Posish doesn't yet have APIs for these, and it's an open question whether it's worth it. These APIs are very low-level, they have complex file descriptor ownership, and they are most often wrapped in higher-level APIs. It's not yet clear whether I/O safety makes sense at the abstraction level of these APIs, or whether they should just continue to use RawFd and introduce safety at higher levels of abstraction.

What about close?

It's called drop 😃.

Posish doesn't currently have a close function in its public API. It would be straightforward to add one, and its signature would take either an OwnedFd argument or a T: IntoFd generic argument, to express that ownership of the file descriptor is being consumed. However, it'd also be redundant for most users. OwnedFd calls close in its Drop implementation, so in most cases, the way to close a file descriptor is to simply drop it.
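
With the standard library's OwnedFd, such a close could be sketched as below. It's hypothetical (posish doesn't provide it, as noted above); Into<OwnedFd> plays the role of io-lifetimes' IntoFd, so it accepts OwnedFd, File, and other owning types.

```rust
use std::os::fd::OwnedFd;

/// Hypothetical explicit `close`. Taking ownership by value means the caller
/// can't use, or close, the descriptor again afterwards.
fn close<T: Into<OwnedFd>>(fd: T) {
    let owned: OwnedFd = fd.into();
    // That's all there is to it: std's `OwnedFd` closes the descriptor in
    // its `Drop` impl, ignoring any error close reports.
    drop(owned);
}
```

The body is nothing but a drop, which is exactly the redundancy described above, and like Drop, it has no way to report errors.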

(Drop doesn't have a way to report errors, so this may not be the last word on this subject, but this is a separate and much more complex topic involving things like NFS in async mode.)

How complete is posish?

It has everything that cap-std and wasmtime's WASI implementation need, as well as what a few other projects need, though it doesn't have everything. If you find it's missing something you need, please file an issue!

Where does I/O safety go from here?

Now that the I/O Safety RFC is merged, my next step in this wing of the story is to prepare a PR adding OwnedFd, BorrowedFd, and friends to the Rust standard library. Once that's done, I'll update the io-lifetimes crate to use the new standard library types and traits instead of defining them itself.

io-lifetimes defines more functionality than will be in the initial standard library PR, such as views, from_into_fd, and the AsFilelike/AsSocketlike Windows/Unix portability layer, so it'll remain useful for some time, but it'll get thinner as the standard library takes over more functionality.

And for the Rust ecosystem as a whole, the I/O safety RFC outlines the path forward.