sunfishcode's blog
A blog by sunfishcode


Errors from `close`


A while ago I wrote a blog post, "Bugs in Hello World", about how a lot of programming languages' default way of printing to stdout silently swallows errors.

This led to a repo for maintaining lists of languages that do and don't have this bug, and some examples of how to fix the bug. For example, in C, the fixed Hello World looks like this:

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    printf("Hello, World!\n");

    /* Flush stdout and check whether any of the output failed to be written. */
    if (fflush(stdout) != 0 || ferror(stdout) != 0) {
        return EXIT_FAILURE;
    }

    return EXIT_SUCCESS;
}

It's a little more verbose than the hello world code that's usually presented to newcomers, but still manageable.

So that's it, right? Do we now have a completely bug-free version of Hello World?

Well, for completeness, we might observe that this version doesn't make any provision for internationalization or accessibility. Not everyone will be able to read its output. That's worth thinking about. However, the rest of this blog post is about a different topic, so let's set this aside here.

And if we dig further, we notice that this version doesn't call fsync. That means that if its output is redirected to a file, and it runs, and exits, and reports success, then data might still be lost. If the power goes out, then once power is restored and the machine boots up again, users may log in to find corrupted or even entirely missing output. Even though the program reported succeeding. Uh oh!

However, this is where we need to talk about expectations. Simple command-line programs that just read from stdin and write to stdout aren't expected to fsync their output. The reason fsync exists in the first place is so that applications can choose not to call it when they think they don't need to, and skip the overhead.

Simple command-line programs are often just one step of a larger logical program, and it would add useless overhead if every program in a pipeline called fsync. So the expectation is, programs only call fsync when they are operating at a scope where they know it's needed.

For example, something like vim does call fsync when saving a file. If the power goes out, you really want that file you were editing to have all your edits. But something like grep doesn't call fsync when writing its output. If the power goes out halfway through some script, it'll take down the script too. So you can often just rerun the whole script from the beginning. And if the script's output was really important, it can do a sync itself. So it all kind of works out.
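Just for illustration, here's a rough sketch of what a program that does decide it needs that durability might look like in C, building on the fixed Hello World above. Real programs like vim do considerably more careful error handling than this.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void) {
    printf("Hello, World!\n");

    /* Flush stdio's userspace buffers and check for write errors. */
    if (fflush(stdout) != 0 || ferror(stdout) != 0) {
        return EXIT_FAILURE;
    }

    /* Ask the OS to push the data to persistent storage. If stdout is
       a pipe or terminal rather than a file, fsync may fail with
       EINVAL, so a program like this might choose to ignore that
       particular error. */
    if (fsync(fileno(stdout)) != 0) {
        return EXIT_FAILURE;
    }

    return EXIT_SUCCESS;
}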

Our Hello World program here is a simple command-line program, so it isn't expected to call fsync. So we'll say that's not a bug.

Ok great. That means we really are done, right?

Maybe.

what

There's something of a longstanding debate.

seriously?

So back in the day, when the Network File System (NFS) was first released, it was a triumph of Unix design. The mighty filesystem could abstract over everything, even the network.

But back in the day back in the day, when Unix was being designed, even though disks and tapes and things were slow, CPUs were also really slow. Moore's law was still just getting started. And the Internet didn't exist yet. It would be decades before anyone would start thinking about stuff like C10K.

And in that environment, filesystem APIs were designed to have lots of simple operations that each only did a small amount of work, and each operation blocked until the operation was complete. This made everything very simple.

In the following decades, Unix systems would add various asynchronous I/O APIs, and some of these do make some things better for some applications in some settings. But most regular applications on Unix platforms continue to use synchronous APIs for most things.

Now, when NFS was introduced, all these little synchronous messages were going over the network to a server, and applications were always waiting for the server to respond over the network to each message before continuing. So NFS had a lot of waiting going on.

To speed things up, the NFS designers introduced an "async" mode. This mode still presents the appearance of synchronous I/O to applications, but under the covers it does some of the work asynchronously. In particular, writes appear to succeed, before the actual I/O is completed on the server. This makes things faster, but also means that if an error occurs when the data is sent, it's too late to have the write fail with an error code. So write errors end up getting deferred, and reported in later write or other calls. That's different from what applications previously expected, but in practice it's close enough that most things basically work.

And to be sure, we're still not talking about use cases that should be doing fsync here. It's not about ensuring the data reaches the actual persistent storage on the server, so that it's safe from power failures. It's just about ensuring that the data reached the server and the server didn't run out of storage space or quota, so that it's safe from being corrupted even when there isn't a power failure.

But there is always one write that isn't followed by another write: the last one. Errors that happen during the last write on a stream would then be reported in the final close call. From an OS perspective, this is fine. close is just a syscall that can fail, like any other syscall.
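To make that concrete, here's a minimal sketch, written at the raw syscall level, of what checking that final error could look like. Error handling is simplified for brevity.

#include <stdlib.h>
#include <unistd.h>

int main(void) {
    static const char msg[] = "Hello, World!\n";

    /* Write directly with the write syscall; treat a short write or an
       error as failure (a real program would retry short writes). */
    if (write(STDOUT_FILENO, msg, sizeof msg - 1) != (ssize_t)(sizeof msg - 1)) {
        return EXIT_FAILURE;
    }

    /* close is just a syscall that can fail; on filesystems that defer
       write errors, this is where the last write's error may surface. */
    if (close(STDOUT_FILENO) != 0) {
        return EXIT_FAILURE;
    }

    return EXIT_SUCCESS;
}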

However, despite it being nice and tidy from an OS perspective, userspace often doesn't check for errors from close in practice. For a lot of different reasons.

For example, if we look at the system calls performed by our Hello World program above, there isn't even a close call:

...
write(1, "Hello, World!\n", 14)         = 14
exit_group(0)                           = ?

If the output of this application is redirected to a file on an NFS filesystem, then that 14 returned from write might just be the NFS subsystem pretending everything is synchronous, despite not having sent the data to the server yet. If the server runs out of storage space, the user runs out of quota, the network gets overloaded, or other failures happen, our hello world program will silently swallow the error. strace won't see it, because it'll happen implicitly when the process exits and the OS cleans up all its open file descriptors.

And to be sure sure, we're still still not talking about applications that are expected to fsync here. If you run out of storage space or quota, your output could be truncated, and other applications could therefore see incorrect output, even if there isn't a power failure.

Is it a bug that we silently swallow errors here? Yes. Silently swallowing errors (unless explicitly silenced) is always a bug.

But whose bug is it?

Who will it be? Who will it be?

From the perspective we've approached it here, it first seems like this is the application's bug. Linux's close documentation agrees, notably observing that "careful programmers" should check for errors from close.

Fingers pointed.

And it certainly is nice and tidy to just end the story there. The designers of Unix intended for us to check errors from all syscalls, and close is a syscall, so we should check errors from it. And in C, failing to check errors from a function is associated with careless programmers, so it's easy to just blame the programmers for any problems that come up.

And there are some programs that do manage to do this. Notably GNU command-line utilities like grep do close their stdio streams. So perhaps everyone should do this, if they wish to think of themselves as "careful programmers".
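As a rough sketch of that approach (the GNU utilities do something along these lines via gnulib; the handler below is just illustrative, not their actual code), one way is an atexit handler that flushes and closes stdout, and turns any failure into a nonzero exit status:

#include <stdio.h>
#include <stdlib.h>

/* Illustrative atexit handler: flush and close stdout at exit, and
   turn any failure into a nonzero exit status. */
static void close_stdout_at_exit(void) {
    if (fflush(stdout) != 0 || ferror(stdout) != 0 || fclose(stdout) != 0) {
        _Exit(EXIT_FAILURE);
    }
}

int main(void) {
    atexit(close_stdout_at_exit);
    printf("Hello, World!\n");
    return EXIT_SUCCESS;
}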

But it's not so simple. POSIX's own close documentation says that "The close() operation itself need not block awaiting [...] I/O completion". So programs written to the POSIX spec can't rely on getting errors from close anyway.

And in the real world, lots of other applications, written by careful, conscientious programmers, don't check for errors from close.

Furthermore, even in applications which do close stdout, it can cause problems. Some debugging tools inject code that uses stdout for output, which can break if stdout gets closed. Some C++ libraries use static destructors that produce output to stdout. These are real-world concerns; for example, at one time, LLVM tools were made to close stdout, to catch errors just as we're discussing here, and it caused so many problems that this code was eventually reverted, and they no longer do.

As an aside, sometimes when this topic is discussed, people bring up the idea that instead of closing stdout, one could just dup it and then close the newly-created file descriptor. However, this isn't guaranteed to work, because file descriptions are reference-counted, so a close of a dup may just decrement the reference count without doing any extra work, so that doesn't seem to be a reliable answer.
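For reference, here's roughly what that idea looks like (the function name is hypothetical, just for illustration), along with why it isn't reliable:

#include <stdio.h>
#include <unistd.h>

/* Illustrative sketch of the "dup then close the dup" idea. Returns 0
   on apparent success, -1 on failure. */
static int check_stdout_via_dup(void) {
    int fd = dup(STDOUT_FILENO);
    if (fd < 0) {
        return -1;
    }
    if (fflush(stdout) != 0) {
        close(fd);
        return -1;
    }
    /* Both fd and STDOUT_FILENO refer to the same open file
       description; this close may just decrement its reference count
       without flushing anything, so deferred write errors may still
       go unreported. */
    return close(fd);
}

int main(void) {
    printf("Hello, World!\n");
    return check_stdout_via_dup() == 0 ? 0 : 1;
}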

So not only do we not tell application programmers that they need to close stdout, but they also face complicated problems when they do. Is it really their fault if they aren't doing it?

Maybe the real bug is in the design of the system that creates this situation.

Pointing the finger elsewhere

Perhaps we can blame the designers of NFS for inventing this "async" mode and not fully implementing the expected local filesystem semantics. Or blame their users who demanded faster NFS performance. Or blame the Unix designers for selling us on the promise of "everything is a file" and encouraging the computing world to build up an entire ecosystem of software around synchronous I/O.

Or perhaps we can blame system administrators who choose to use NFS.

Or perhaps we can blame the causes of the I/O errors. If a goat chews a network cable, we blame the goat for any data loss that results. When a user uses too much space, we blame the user for using too much space. If those things hadn't happened, none of this would have been a problem.

But as in so many things in life, it's worthwhile to keep a question in mind: is the goal just to find someone to blame? Or is the goal here to make systems more reliable?

FUSE

And besides, it's not just NFS. Or other network filesystems that do similar things.

This problem of errors from close comes up in FUSE filesystems too. The FUSE documentation does say that FUSE modules shouldn't return errors from close if it's important that applications see them. And on some platforms, these errors aren't even reported.

However, nothing stops modules from ignoring this documentation, or from just ignoring platforms other than Linux, so in theory they can also have this problem.

What if we say that checking for errors from close is optional?

What if we said that it's ok to ignore errors from close in general, and that applications only have to check for them if they have a special need to? If they're handling sensitive data, or if they're talking to an unreliable filesystem, then they should check, but otherwise they don't have to?

But the problem is, most programs don't know either of these things. Most programs don't know the context in which their users are using them. They don't know what importance their users assign to their data. And, while it's possible to check for NFS, it's not possible to check for "is this an unreliable filesystem" in general.

Most applications don't have any way to be aware of such things.

Who should be responsible?

When I started this journey, I believed that applications were responsible. That's what major platforms say in their documentation, so it's right by definition. Several years ago, I was the one who made LLVM tools close stdout originally, and defended that code for years as waves of reports of problems with it came in.

But eventually I stopped defending that code, and the problem reports kept coming in, and the code was removed.

Partly as a result of this experience, I came to believe that the platform documentation was misguided. It's too easy. Platform developers can just hide behind the simple stance that close is just another syscall that can fail, and pointedly document that "careful" programmers will check for errors. If users have difficulty doing so, blame the users.

Instead, I came to believe that we needed to renegotiate this relationship. The responsibility should be with some combination of the people who chose to use NFS or FUSE, and the causes of the I/O errors. Users working with NFS should monitor their free space. Administrators should be monitoring their networks and their hardware. And everyone should watch out for goats. Because so many applications don't handle these errors gracefully, users and administrators should be doing this anyway. Users using FUSE modules should be wary of modules that ignore the FUSE API documentation's recommendations. And if they're doing those things anyway, from there, it's not a huge leap to say that it's the users' responsibility to do them.

And besides, network filesystems like NFS are less popular today than they once were. Computer labs have been disappearing in favor of bring-your-own-device. Data is increasingly stored in the cloud rather than on departmental servers. NFS is still out there, but it's far less prevalent than it once was.

And besides besides, programming languages which make async programming more convenient are becoming more popular. If more programs were written in a way that calling write didn't block them from doing other work, then this whole situation might be avoided entirely.

So these days, it's difficult to justify taking a strong stance that programmers everywhere need to take on all the burdens that come with calling close on stdout.

So where are we now?

Platforms still blame applications for not checking errors from close. And these days, platforms have decades of inertia to justify not making any major changes.

A great many applications don't check for errors from close. And they have decades of inertia too.

Neither side can be easily fixed.

So mostly, this issue just persists through time. When problems in practice do happen, there are usually other things around that can be blamed. I'm sorry your data got silently corrupted. It's those goats. Eating our network cables. You know how goats are 😉.

What I can say at this point is that I personally am not going to embark on a quest to get application programmers to check for errors from close.

And so, my conclusion here is that, no, our hello world program does not have a bug. It's fine. It's just fine.