Is Everything A File?
This post is the first in a series about "Everything Is A File":
- Is Everything A File? (this post)
- Measuring System Interface Complexity
- What does Everything Is A File do?
- The Filesystem Namespace
"Everything Is A File" is one of the most fundamental and widely influential pillars of Unix design philosophy.
It's one of the major sources of simplicity that propelled Unix beyond its predecessor Multics, and helped it achieve widespread popularity. It enabled a small set of simple tools to operate on data from a wide variety of sources.
At the same time, it's also become a major source of complexity in modern computing. And it has been so pervasive for so many decades that we often don't recognize it as such.
"Everything Is A File" itself has at least three different meanings:
- Everything has a name in a hierarchical namespace
- Everything is a file descriptor
- Everything is byte sequences
Strictly speaking, Unix itself doesn't perfectly conform to any of these, but it follows them closely enough, in enough places, that they're recognizable and influential. And then there's Plan 9, which took some of these ideas further.
So there's a lot to explore here. This is the first of a series of blog posts exploring different aspects of Everything Is A File, with an overall focus on system interface design.
To get things started, here's a quick exploration of one small but illustrative quirk of Unix.
A tale of two file descriptors
The dup function returns a new file descriptor referring to the same resource as an existing file descriptor. The new file descriptor is independent of the old one; we can close either one and the other remains open.
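To make this concrete, here is a small C sketch, assuming a POSIX system. It uses a pipe so no file is needed; the helper name dup_survives_close is hypothetical, chosen only for this illustration. It closes the original write descriptor and checks that the dup'd copy still works.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if a dup'd descriptor keeps working
   after the original descriptor is closed. */
static int dup_survives_close(void) {
    int fds[2];
    int rc = pipe(fds);               /* fds[0]: read end, fds[1]: write end */
    assert(rc == 0);

    int copy = dup(fds[1]);           /* a second descriptor for the write end */
    assert(copy >= 0 && copy != fds[1]);

    close(fds[1]);                    /* close the original... */

    ssize_t n = write(copy, "x", 1);  /* ...and the copy is still open */
    char c = 0;
    ssize_t m = read(fds[0], &c, 1);

    close(copy);
    close(fds[0]);
    return n == 1 && m == 1 && c == 'x';
}
```

On a POSIX system, calling dup_survives_close() should return 1.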
When file descriptors refer to files, they keep track of a "current position", which tells read and write where in the file to read and write. Since the file descriptors are independent of each other, one might think that each file descriptor would have its own file position, so that moving one would leave the other untouched.
But what actually happens in Unix is that the file position is shared between the two file descriptors.
If we do an lseek to change the position of one, it changes the position on the other one at the same time. Users of each have to be aware that the way they use one file descriptor might affect the other. And if they have this level of coordination, why did they need to call dup in the first place?
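A minimal C sketch of the shared position, assuming a POSIX system with a writable /tmp (the path template and the helper name offset_seen_by_dup are hypothetical). It seeks one descriptor and then asks the dup'd copy where it is.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: seeks the original descriptor, then reports
   the dup'd copy's position. */
static off_t offset_seen_by_dup(void) {
    char path[] = "/tmp/dup-demo-XXXXXX";
    int fd = mkstemp(path);           /* opened read/write */
    assert(fd >= 0);
    unlink(path);                     /* the open descriptor keeps the file alive */

    ssize_t n = write(fd, "hello, world", 12);  /* position is now 12 */
    assert(n == 12);

    int copy = dup(fd);
    assert(copy >= 0);

    off_t r = lseek(fd, 5, SEEK_SET); /* move only the original... */
    assert(r == 5);
    off_t seen = lseek(copy, 0, SEEK_CUR);  /* ...then ask the copy */

    close(copy);
    close(fd);
    return seen;                      /* the position is shared */
}
```

If each descriptor had its own position, the copy would still report 12 (the end of what was written); on Unix it reports 5.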
Everything Is A File
Pipes aren't files, but Everything Is A File wants to treat everything in a uniform way, and it would be expensive to make pipes work like files. Files are random-access, and pipes aren't.
It's not that expensive to make files work like pipes though. All it takes is adding this "file position" to open file descriptors. In this way, Unix's read can read from a file just as it can read from a pipe.
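The payoff is that one read loop serves both cases. A sketch in C, assuming a POSIX system; count_bytes, demo_file, demo_pipe, and the /tmp path are hypothetical names for this illustration. count_bytes neither knows nor cares whether its descriptor is a file or a pipe.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: reads fd to the end and counts the bytes.
   The same loop works on files and pipes. */
static long count_bytes(int fd) {
    char buf[4096];
    long total = 0;
    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n < 0) return -1;         /* error (EINTR is not retried here) */
        if (n == 0) return total;     /* end of file, or pipe writer closed */
        total += n;
    }
}

static long demo_file(void) {
    char path[] = "/tmp/read-demo-XXXXXX";
    int fd = mkstemp(path);
    assert(fd >= 0);
    unlink(path);
    ssize_t n = write(fd, "hello, world", 12);
    assert(n == 12);
    off_t r = lseek(fd, 0, SEEK_SET); /* rewind the current position */
    assert(r == 0);
    long total = count_bytes(fd);
    close(fd);
    return total;
}

static long demo_pipe(void) {
    int fds[2];
    int rc = pipe(fds);
    assert(rc == 0);
    ssize_t n = write(fds[1], "hello, world", 12);  /* fits in the pipe buffer */
    assert(n == 12);
    close(fds[1]);                    /* no writers left: reader will see EOF */
    long total = count_bytes(fds[0]);
    close(fds[0]);
    return total;
}
```

For the file, the sketch has to rewind the current position with lseek before reading, because writing advanced it; the pipe has no such position to manage.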
But then, we might ask, how should dup work? If we implemented dup in the natural way for files, each file descriptor would have its own current position. But the equivalent of that for pipes would be expensive to implement; it'd require storing a copy of all data sent through the pipe. So Unix instead says that dup'd file descriptors act like they do on pipes, which means they share a file position.
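For contrast, opening the same file twice with open does give each descriptor its own position. In POSIX terms, each open call creates a new "open file description" that holds the offset, while dup makes a new descriptor referring to the existing one. A C sketch, assuming a POSIX system; the helper name and /tmp path are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: opens one file twice, seeks the first descriptor,
   and reports the second descriptor's position. */
static off_t offset_seen_by_second_open(void) {
    char path[] = "/tmp/open-demo-XXXXXX";
    int a = mkstemp(path);
    assert(a >= 0);
    int b = open(path, O_RDONLY);     /* separate open: separate offset */
    assert(b >= 0);
    unlink(path);

    off_t r = lseek(a, 5, SEEK_SET);  /* seeking past EOF is allowed */
    assert(r == 5);
    off_t seen = lseek(b, 0, SEEK_CUR);

    close(a);
    close(b);
    return seen;
}
```

Here the second descriptor still reports 0 after the first one seeks to 5, which is the "natural way for files" that dup does not follow.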
It makes sense from that perspective. But if we go back and look at it from the perspective of lseek, it doesn't make sense again. lseek doesn't even work on pipes, so why are pipes the thing that determines how lseek behaves on dup'd file descriptors?
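A quick check that lseek really has no meaning on a pipe: POSIX specifies that it fails with ESPIPE there. A C sketch, assuming a POSIX system; the helper name is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if lseek on a pipe fails with ESPIPE. */
static int lseek_rejects_pipe(void) {
    int fds[2];
    int rc = pipe(fds);
    assert(rc == 0);

    errno = 0;
    off_t r = lseek(fds[0], 0, SEEK_CUR);  /* pipes have no position to query */
    int err = errno;

    close(fds[0]);
    close(fds[1]);
    return r == (off_t)-1 && err == ESPIPE;
}
```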
So, what if...
Underneath this is an "Is A" relationship. Everything Is A file, and an open file Is A stream.
What if we split out a streaming view as a separate entity?
An open file would continue to support all the essential file operations, like pread, pwrite, and so on, but not read or write. A streaming view would support read and write.
We'd then add a new function, which takes an open file and an offset, and returns a streaming view of the file at that offset.
If we had a system like that, the lseek function wouldn't be needed, and we'd avoid this whole question of how lseek and dup interact in surprising ways.
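A rough emulation of this idea on top of today's POSIX calls, to show the shape rather than propose an API. Everything named stream_view here is hypothetical: the struct, stream_view_open (standing in for the new function that takes an open file and an offset), and its read and write functions. Each view keeps its own position and touches the underlying file only through pread and pwrite, so nothing calls lseek. The /tmp path is also an assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical streaming view of an open file; not a real POSIX type. */
struct stream_view {
    int fd;     /* the open file, used only via pread/pwrite */
    off_t pos;  /* this view's own position */
};

/* Hypothetical: create a streaming view of fd starting at offset. */
static struct stream_view stream_view_open(int fd, off_t offset) {
    struct stream_view v = { fd, offset };
    return v;
}

/* Sequential read through the view; advances only this view. */
static ssize_t stream_view_read(struct stream_view *v, void *buf, size_t len) {
    ssize_t n = pread(v->fd, buf, len, v->pos);
    if (n > 0) v->pos += n;
    return n;
}

/* Sequential write through the view; advances only this view. */
static ssize_t stream_view_write(struct stream_view *v, const void *buf, size_t len) {
    ssize_t n = pwrite(v->fd, buf, len, v->pos);
    if (n > 0) v->pos += n;
    return n;
}

/* Demo: views of one file advance independently, with no lseek. */
static int views_are_independent(void) {
    char path[] = "/tmp/view-demo-XXXXXX";
    int fd = mkstemp(path);
    assert(fd >= 0);
    unlink(path);

    struct stream_view w = stream_view_open(fd, 0);
    ssize_t nw = stream_view_write(&w, "hello, world", 12);
    assert(nw == 12);

    struct stream_view a = stream_view_open(fd, 0);
    struct stream_view b = stream_view_open(fd, 7);
    char x[6] = {0}, y[6] = {0};
    ssize_t n1 = stream_view_read(&a, x, 5);  /* reads "hello" */
    ssize_t n2 = stream_view_read(&b, y, 5);  /* reads "world" */

    close(fd);
    return n1 == 5 && n2 == 5 && strcmp(x, "hello") == 0 &&
           strcmp(y, "world") == 0 && w.pos == 12 && a.pos == 5 && b.pos == 12;
}
```

Because a view here is just a value, copying it gives a second, independent position, which is the behavior one might naively have expected from dup on a file. A real design would still have to settle ergonomics and efficiency questions, for example what a streaming view of a pipe or socket would look like.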
There's more we could say here about ergonomics and efficiency, but for now, this gives a glimpse of the shape of a simpler and more orthogonal system, where files do one thing and do it well (hold arrays of data), and streams do one thing and do it well (stream data).
We're just getting started here.
Everything Is A File conveys some underlying truths. Being able to write programs that can automatically read from files, pipes, sockets, and more is really powerful. But Everything Is A File also distracts us by pointing us toward the "file" concept as the vehicle for achieving this.
There's lots more to explore. Follow along for future explorations!