sunfishcode's blog

Is Everything A File?


This post is the first in a series about "Everything Is A File".

"Everything Is A File" is one of the most fundamental and widely influential pillars of Unix design philosophy.

It's one of the major sources of simplicity that propelled Unix beyond its predecessor Multics, and helped it achieve widespread popularity. It enabled a small set of simple tools to operate on data from a wide variety of sources.

At the same time, it's also become a major source of complexity in modern computing. And it's been pervasive for so many decades that we often don't recognize it as such.

"Everything Is A File" itself has at least three different meanings:

Strictly speaking, Unix itself doesn't perfectly conform to any of these, but it follows them closely enough, in enough places, that they're recognizable and influential. And then there's Plan 9, which took some of these ideas further.

So there's a lot to explore here. This is the first of a series of blog posts exploring different aspects of Everything Is A File, with an overall focus on system interface design.

To get things started, here's a quick exploration of one small but illustrative quirk of Unix.

A tale of two file descriptors

Unix's dup function returns a new file descriptor referring to the same resource as an existing file descriptor. The new file descriptor is independent of the old one; we can close either one and the other remains open.
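Here's a minimal sketch of that independence, using Python's os module as a thin wrapper over the POSIX calls. It assumes a Unix-like system; the scratch file and variable names are just for illustration.

```python
import os
import tempfile

# Create a scratch file with known contents (illustrative data only).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello, world")
    path = tmp.name

fd = os.open(path, os.O_RDONLY)
dup_fd = os.dup(fd)            # a second descriptor for the same open file

os.close(fd)                   # closing the original descriptor...
data = os.read(dup_fd, 5)      # ...leaves the duplicate fully usable
print(data)                    # b'hello'

os.close(dup_fd)
os.unlink(path)
```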

When file descriptors refer to files, they keep track of a "current position" that tells read and write where in the file to read and write. Since the file descriptors are independent of each other, one might think that each file descriptor would have its own file position, like this:

[Figure: Two file descriptors pointing to one file, each with their own file position]

But what actually happens in Unix is that the file position is shared between the two file descriptors, like this:

[Figure: Two file descriptors pointing to one file, sharing a file position]

If we do an lseek to change the position of one, it changes the position on the other one at the same time. Users of each have to be aware that the way they use one file descriptor might affect the other. And if they have this level of coordination, why did they need to call dup in the first place?
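The shared position is easy to observe. This is a small sketch, again in Python on a Unix-like system, with a made-up ten-byte file: we seek through one descriptor and then look at the position through the other.

```python
import os
import tempfile

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"0123456789")
    path = tmp.name

fd_a = os.open(path, os.O_RDONLY)
fd_b = os.dup(fd_a)

os.lseek(fd_a, 7, os.SEEK_SET)            # move the position using fd_a only
pos_b = os.lseek(fd_b, 0, os.SEEK_CUR)    # ask fd_b where it currently is
chunk = os.read(fd_b, 3)                  # read through fd_b

print(pos_b, chunk)                       # 7 b'789'

os.close(fd_a)
os.close(fd_b)
os.unlink(path)
```

Neither descriptor "owns" the position; whichever one moved it last wins.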

Everything Is A File

Pipes aren't files, but Everything Is A File wants to treat everything in a uniform way, and it would be expensive to make pipes work like files. Files are random-access, and pipes aren't.

It's not that expensive to make files work like pipes though. All it takes is adding this "file position" to open file descriptors. In this way, Unix's read can read from a file just as it can read from a pipe.
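To make that concrete, here's a sketch of one helper that drains a descriptor using nothing but read, applied unchanged to a file and to a pipe. The helper name read_all and the tiny chunk size are assumptions for illustration, not anything Unix defines.

```python
import os
import tempfile

def read_all(fd, chunk_size=4):
    """Drain a descriptor with plain read() calls until end-of-file."""
    parts = []
    while True:
        chunk = os.read(fd, chunk_size)
        if not chunk:          # an empty read means end-of-file
            break
        parts.append(chunk)
    return b"".join(parts)

# Source 1: a regular file. The kernel's file position advances on each read.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"same bytes")
    path = tmp.name
file_fd = os.open(path, os.O_RDONLY)
from_file = read_all(file_fd)
os.close(file_fd)
os.unlink(path)

# Source 2: a pipe. There is no position, just bytes arriving in order.
read_end, write_end = os.pipe()
os.write(write_end, b"same bytes")
os.close(write_end)            # closing the write end lets the reader see EOF
from_pipe = read_all(read_end)
os.close(read_end)

print(from_file == from_pipe)  # True
```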

But then, we might ask, how should dup work? If we do dup in the natural way for files, each file descriptor would have its own current position. But the equivalent of that for pipes would be expensive to implement; it'd require storing a copy of all data sent through the pipe. So Unix instead says that dup'd file descriptors act like they do on pipes, which means they share a current position.
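For contrast, the "natural for files" behavior does exist in Unix, just not through dup: opening the same path twice yields two descriptors with separate positions. A small sketch, assuming a Unix-like system and an illustrative scratch file:

```python
import os
import tempfile

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"0123456789")
    path = tmp.name

# Two separate open() calls, rather than one open() plus dup().
fd_a = os.open(path, os.O_RDONLY)
fd_b = os.open(path, os.O_RDONLY)

os.lseek(fd_a, 7, os.SEEK_SET)            # move fd_a only
pos_a = os.lseek(fd_a, 0, os.SEEK_CUR)
pos_b = os.lseek(fd_b, 0, os.SEEK_CUR)    # fd_b has not moved

print(pos_a, pos_b)                       # 7 0

os.close(fd_a)
os.close(fd_b)
os.unlink(path)
```

That only works when there's a path to reopen, which is exactly what a pipe doesn't have.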

It makes sense from that perspective. But if we go back and look at it from the perspective of lseek, it doesn't make sense again. lseek doesn't even work on pipes, so why are pipes the thing that determines how lseek has to work?
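The claim that lseek doesn't work on pipes can be checked directly. In this sketch (Python on a Unix-like system), seeking on a pipe fails with the ESPIPE error code:

```python
import errno
import os

read_end, write_end = os.pipe()
try:
    os.lseek(read_end, 0, os.SEEK_SET)    # try to seek on a pipe
    seek_errno = None                     # not expected on a POSIX system
except OSError as exc:
    seek_errno = exc.errno
finally:
    os.close(read_end)
    os.close(write_end)

print(seek_errno == errno.ESPIPE)         # True
```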

So, what if...

Underneath this is an "Is A" relationship. Everything Is A file, and an open file Is A stream.

What if we split out a streaming view as a separate entity?

An open file would continue to support all the essential file operations, like fsync, pread, pwrite, and so on, but not read or write. A streaming view would support read and/or write.

We'd then add a new function, which takes an open file and an offset, and returns a streaming view of the file at that offset.

[Figure: Two file descriptors pointing to one file; one is a stream with a position, and the other is a plain open file]

If we had a system like that, the lseek function wouldn't be needed, and we'd avoid this whole question of how lseek and dup interact in surprising ways.
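We can approximate the shape of this in user space today. The following is only a sketch under stated assumptions: StreamView and stream_at are hypothetical names invented here, not an existing API, and the view lives inside one process rather than being a real descriptor that could be handed to another program. It builds each stream on pread, so the stream owns its position and the open file has none that matters.

```python
import os
import tempfile

class StreamView:
    """Hypothetical streaming view: a private read cursor over an open file."""

    def __init__(self, fd, offset):
        self._fd = fd
        self._offset = offset

    def read(self, n):
        # pread reads at an explicit offset and ignores the descriptor's
        # own position, so each view advances independently.
        data = os.pread(self._fd, n, self._offset)
        self._offset += len(data)
        return data

def stream_at(fd, offset):
    """Hypothetical 'new function': a streaming view of fd starting at offset."""
    return StreamView(fd, offset)

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"0123456789")
    path = tmp.name

fd = os.open(path, os.O_RDONLY)
first = stream_at(fd, 0)
second = stream_at(fd, 5)

a = first.read(3)     # b'012'
b = second.read(3)    # b'567'
c = first.read(3)     # b'345', unaffected by the reads on `second`

# Only to show that the views never touched the descriptor's own position.
kernel_pos = os.lseek(fd, 0, os.SEEK_CUR)
print(a, b, c, kernel_pos)   # b'012' b'567' b'345' 0

os.close(fd)
os.unlink(path)
```

Two streams over one open file never interfere, and nothing here needs lseek to do its job.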

There's more we could say here about ergonomics and efficiency, but for now, this gives a glimpse of a simpler and more orthogonal system, where files do one thing and do it well (hold arrays of data), and streams do one thing and do it well (stream data).

Wrap up

We're just getting started here.

Everything Is A File conveys some underlying truths. Being able to write programs that can automatically read from files, pipes, sockets, and more is really powerful. But Everything Is A File also distracts us by pointing us toward the "file" concept as the vehicle for achieving this.

There's lots more to explore. Follow along for future explorations!