Errors from `close`
A while ago I wrote a blog post, "Bugs in Hello World", about how a lot of programming languages' default way of printing to stdout silently swallows errors.
This led to a repo for maintaining lists of languages that do and don't have this bug, and some examples of how to fix the bug. For example, in C, the fixed Hello World looks like this:
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    printf("Hello, World!\n");
    if (fflush(stdout) != 0 || ferror(stdout) != 0) {
        return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}
It's a little more verbose than the hello world code that's usually presented to newcomers, but still manageable.
So that's it, right? Do we now have a completely bug-free version of Hello World?
Well, for completeness, we might observe that this version doesn't make any provision for internationalization or accessibility. Not everyone will be able to read its output. That's worth thinking about, but the rest of this blog post is about a different topic, so let's set it aside here.
And if we dig further, we notice that this version doesn't call `fsync`. That means that if its output is redirected to a file, and it runs, and exits, and reports success, then data might still be lost. If the power goes out, then once power is restored and the machine boots up again, users may log in to find corrupted or even entirely missing output. Even though the program reported success. Uh oh!
However, this is where we need to talk about expectations. Simple command-line programs that just read from stdin and write to stdout aren't expected to `fsync` their output. The reason `fsync` exists in the first place is so that applications can choose not to call it when they think they don't need to, and skip the overhead.
Simple command-line programs are often just one step of a larger logical program, and it would add useless overhead if every program in a pipeline called `fsync`. So the expectation is, programs only call `fsync` when they are operating at a scope where they know it's needed.
For example, something like vim does call `fsync` when saving a file. If the power goes out, you really want that file you were editing to have all your edits. But something like grep doesn't call `fsync` when writing its output. If the power goes out halfway through some script, it'll take down the script too, so you can often just rerun the whole script from the beginning. And if the script's output is really important, it can do a `sync` itself. So it all kind of works out.
Our Hello World program here is a simple command-line program, so it isn't expected to call `fsync`. So we'll say that's not a bug.
Ok great. That means we really are done, right?
Maybe.
what
There's something of a longstanding debate.
seriously?
So back in the day, when the Network File System (NFS) was first released, it was a triumph of Unix design. The mighty filesystem could abstract over everything, even the network.
But back in the day back in the day, when Unix was being designed, even though disks and tapes and things were slow, CPUs were also really slow. Moore's law was still just getting started. And the Internet didn't exist yet. It would be decades before anyone would start thinking about stuff like C10K.
And in that environment, filesystem APIs were designed to have lots of simple operations that each only did a small amount of work, and each operation blocked until the operation was complete. This made everything very simple.
In the following decades, Unix systems would add various asynchronous I/O APIs, and some of these things do make some things better for some applications in some settings. But most regular applications on Unix platforms continue to use synchronous APIs for most things.
Now, when NFS was introduced, all these little synchronous messages were going over the network to a server, and applications were always waiting for the server to respond over the network to each message before continuing. So NFS had a lot of waiting going on.
To speed things up, the NFS designers introduced an "async" mode. This mode still presents the appearance of synchronous I/O to applications, but under the covers it does some of the work asynchronously. In particular, `write`s appear to succeed before the actual I/O is completed on the server. This makes things faster, but also means that if an error occurs when the data is sent, it's too late to have the `write` fail with an error code. So write errors end up getting deferred, and reported in later `write` or other calls. That's different from what applications previously expected, but in practice it's close enough that most things basically work.
And to be sure, we're still not talking about use cases that should be doing `fsync` here. It's not about ensuring the data reaches the actual persistent storage on the server, so that it's safe from power failures. It's just about ensuring that the data reached the server and the server didn't run out of storage space or quota, so that it's safe from being corrupted even when there isn't a power failure.
But there is always one `write` that isn't followed by another `write`: the last one. Errors that happen during the last `write` on a stream would then be reported in the final `close` call. From an OS perspective, this is fine. `close` is just a syscall that can fail, like any other syscall.
However, despite it being nice and tidy from an OS perspective, userspace often doesn't check for errors from `close` in practice. For a lot of different reasons.
For example, if we look at the system calls performed by our Hello World program above, there isn't even a `close` call:
    ...
    write(1, "Hello, World!\n", 14) = 14
    exit_group(0) = ?
If the output of this application is redirected to a file on an NFS filesystem, then that `14` returned from `write` might just be the NFS subsystem pretending everything is synchronous, despite not having sent the data to the server yet. If the server runs out of storage space, the user runs out of quota, the network gets overloaded, or other failures happen, our hello world program will silently swallow the error. strace won't see it, because it'll happen implicitly when the process exits and the OS cleans up all its open file descriptors.
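To actually observe such a deferred error, a program has to close stdout explicitly before exiting and check the result. A minimal sketch of that, broadly in the spirit of what GNU utilities do with gnulib's close_stream helper, might look like this (the `close_stdout` name here is just for illustration):

```c
#include <stdio.h>

/* Flush and close stdout, so that errors the filesystem deferred until
   close are reported, rather than silently swallowed when the OS closes
   the descriptor at process exit. Returns 0 on success, -1 on error.
   Call this once, at the very end of main; stdout must not be used
   afterward. */
static int close_stdout(void) {
    if (fflush(stdout) != 0 || ferror(stdout) != 0)
        return -1;
    if (fclose(stdout) != 0)
        return -1;
    return 0;
}
```

As the rest of this post discusses, though, doing this has real downsides of its own.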
And to be sure sure, we're still still not talking about applications that are expected to `fsync` here. If you run out of storage space or quota, your output could be truncated, and other applications could therefore see incorrect output, even if there isn't a power failure.
Is it a bug that we silently swallow errors here? Yes. Silently swallowing errors (unless explicitly silenced) is always a bug.
But whose bug is it?
Who will it be? Who will it be?
From the perspective we've approached it here, it first seems like this is the application's bug. Linux's `close` documentation agrees, notably observing that "careful programmers" should check for errors from `close`.
Fingers pointed.
And it certainly is nice and tidy to just end the story there. The designers of Unix intended for us to check errors from all syscalls, and `close` is a syscall, so we should check errors from it. And in C, failing to check errors from a function is associated with careless programmers, so it's easy to just blame the programmers for any problems that come up.
And there are some programs that do manage to do this. Notably, GNU command-line utilities like grep do close their stdio streams. So perhaps everyone should do this, if they wish to think of themselves as "careful programmers".
But it's not so simple. POSIX's own `close` documentation says that "The close() operation itself need not block awaiting [...] I/O completion". So programs written to the POSIX spec can't rely on getting errors from `close` anyway.
And in the real world, lots of other applications, written by careful, conscientious programmers, don't check for errors from `close`.
Furthermore, even in applications which do close stdout, it can cause problems. Some debugging tools inject code that uses stdout for output, which can break if stdout gets closed. Some C++ libraries use static destructors that produce output to stdout. These are real-world concerns; for example, at one time, LLVM tools were made to close stdout, to catch errors just as we're discussing here, and it caused so many problems that this code was eventually reverted, and they no longer do.
As an aside, sometimes when this topic is discussed, people bring up the idea that instead of closing stdout, one could just `dup` it and then close the newly-created file descriptor. However, this isn't guaranteed to work: file descriptions are reference-counted, so a `close` of a `dup` may just decrement the reference count without doing any extra work. So that doesn't seem to be a reliable answer.
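In code, the idea looks something like this sketch (the `check_stdout_via_dup` name is just for illustration). The catch is that it can report success even when a deferred write error exists, which is exactly the problem:

```c
#include <stdio.h>
#include <unistd.h>

/* The "dup trick": duplicate stdout's file descriptor and close the
   duplicate, hoping to surface deferred write errors without giving up
   the real stdout. Because the kernel may just decrement the underlying
   file description's reference count, this close may do no real work,
   and errors can still go unreported. Returns 0 on apparent success. */
static int check_stdout_via_dup(void) {
    /* Flush stdio's buffer first, so the data has at least been
       handed to the OS. */
    if (fflush(stdout) != 0)
        return -1;
    int fd = dup(fileno(stdout));
    if (fd < 0)
        return -1;
    if (close(fd) != 0)
        return -1;
    return 0;
}
```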
So not only do we not tell application programmers that they need to close stdout, they face complicated problems when they do. Is it really their fault if they aren't doing it?
Maybe the real bug is in the design of the system that creates this situation.
Pointing the finger elsewhere
Perhaps we can blame the designers of NFS for inventing this "async" mode and not fully implementing the expected local filesystem semantics. Or blame their users who demanded faster NFS performance. Or blame the Unix designers for selling us on the promise of "everything is a file" and encouraging the computing world to build up an entire ecosystem of software around synchronous I/O.
Or perhaps we can blame system administrators who choose to use NFS.
Or perhaps we can blame the causes of the I/O errors. If a goat chews a network cable, we blame the goat for any data loss that results. When a user uses too much space, we blame the user for using too much space. If those things hadn't happened, none of this would have been a problem.
But as in so many things in life, it's worthwhile to keep a question in mind: is the goal just to find someone to blame? Or is the goal here to make systems more reliable?
FUSE
And besides, it's not just NFS. Or other network filesystems that do similar things.
This problem of errors from `close` comes up in FUSE filesystems too. The FUSE documentation does say that FUSE modules shouldn't return errors from `close` if it's important that applications see them. And on some platforms, these errors aren't even reported. However, nothing stops modules from ignoring this documentation, or from just ignoring platforms other than Linux, so in theory they can have this problem too.
What if we say that checking for errors from close is optional?
What if we said that it's ok to ignore errors from close in general, and that applications only have to do it if they have a special need to. If they're handling sensitive data, or if they're talking to an unreliable filesystem, then they should check, but otherwise they don't have to?
But the problem is, most programs don't know either of these things. Most programs don't know the context in which their users are using them. They don't know what importance their users assign to their data. And, while it's possible to check for NFS, it's not possible to check for "is this an unreliable filesystem" in general.
Most applications don't have any way to be aware of such things.
Who should be responsible?
When I started this journey, I believed that applications were responsible. That's what major platforms say in their documentation, so it's right by definition. Several years ago, I was the one who made LLVM tools close stdout originally, and defended that code for years as waves of reports of problems with it came in.
But eventually I stopped defending that code, and the problem reports kept coming in, and the code was removed.
Partly as a result of this experience, I came to believe that the platform documentation was misguided. It's too easy. Platform developers can just hide behind the simple stance that `close` is just another syscall that can fail, and pointedly document that "careful" programmers will check for errors. If users have difficulty doing so, blame the users.
Instead, I came to believe that we needed to renegotiate this relationship. The responsibility should be with some combination of the people who chose to use NFS or FUSE, and the causes of the I/O errors. Users working with NFS should monitor their free space. Administrators should be monitoring their networks and their hardware. And everyone should watch out for goats. Because so many applications don't handle these errors gracefully, users and administrators should be doing this anyway. Users using FUSE modules should be wary of modules that ignore the FUSE API documentation's recommendations. And if they're doing those things anyway, from there, it's not a huge leap to say that it's the users' responsibility to do them.
And besides, network filesystems like NFS are less popular today than they once were. Computer labs have been disappearing in favor of bring-your-own-device. Data is increasingly stored in the cloud rather than on departmental servers. NFS is still out there, but it's far less prevalent than it once was.
And besides besides, programming languages which make async programming more convenient are becoming more popular. If more programs were written in a way that calling `write` didn't block them from doing other work, then this whole situation might be avoided entirely. So these days, it's difficult to justify taking a strong stance that programmers everywhere need to take on all the burdens that come with calling `close` on stdout.
So where are we now?
Platforms still blame applications for not checking errors from `close`. And these days, platforms have decades of inertia to justify not making any major changes. A great many applications don't check for errors from `close`. And they have decades of inertia too.
Neither side can be easily fixed.
So mostly, this issue just persists through time. When problems in practice do happen, there are usually other things around that can be blamed. I'm sorry your data got silently corrupted. It's those goats. Eating our network cables. You know how goats are 😉.
What I can say at this point is that I personally am not going to embark on a quest to get application programmers to check for errors from `close`.
And so, my conclusion here is that, no, our hello world program does not have a bug. It's fine. It's just fine.