Aapeli Vuorinen

Information through the lens of topology

I often like to think about how information is organised by drawing parallels to point-set topology. You can think of topology as roughly concerned with a set of objects (points) and with defining which objects are “close” to each other by being “neighbours”. (In continuous spaces, it’s more about the way those neighbourhoods are generated, and hence the structure of the space, than about which two objects are concretely in the same neighbourhood. Neighbourhoods allow you to do things like define limits without explicitly discussing distances and so forth. Spiffy stuff.)

There are two completely useless topologies, mathematically speaking. The first is the trivial topology, in which everything is just in one set (think of it as a “basket”); the other is the discrete topology, in which each object is in its own set. Those are both kind of useless: if you want to find the thing you care about, you either have to sift through a tonne of objects, or a tonne of baskets with one object each. Thinking of it another way, they don’t tell you anything about the structure of what you’re working in: everything’s exactly the same. In between, there’s a spectrum of topologies where you divide objects into more meaningful hierarchies, and those are normally the topologies you’d care about, as they have some form of interesting structure or relationships.

This is important in information retrieval: basically, either of these two useless topologies gives you a linear-time search at best. In between, you can imagine some kind of topology where at each level (or basket), you divide the objects into two equally large sets. That would give you some kind of binary search, assuming you always know which of the two halves the thing you want to find lives in. The number of steps then grows only logarithmically with the number of objects, which is maybe the best you can do for finding things quickly.
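To make the difference concrete, here is a minimal sketch that counts how many objects you have to look at in each case. It assumes the objects are sortable integers, and the function names `linear_probes` and `halving_probes` are made up for this illustration.

```python
def linear_probes(items, target):
    """Count the objects inspected when everything sits in one big basket."""
    for probes, item in enumerate(items, start=1):
        if item == target:
            return probes
    return len(items)  # target absent: every object was inspected


def halving_probes(sorted_items, target):
    """Count the objects inspected when each step discards half of them."""
    lo, hi = 0, len(sorted_items)
    probes = 0
    while lo < hi:
        mid = (lo + hi) // 2
        probes += 1
        if sorted_items[mid] == target:
            return probes
        if sorted_items[mid] < target:
            lo = mid + 1  # the target can only be in the upper half
        else:
            hi = mid      # the target can only be in the lower half
    return probes


items = list(range(1_000_000))  # already sorted
target = 999_999                # worst case for the linear scan

linear = linear_probes(items, target)
halving = halving_probes(items, target)
print(f"linear scan: {linear} probes, halving: {halving} probes")
```

With a million objects, the linear scan looks at every one of them in the worst case, while the halving search needs at most about twenty looks. The catch is the assumption from above: at every step you must be able to tell which half the target lives in.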

That’s all well and good for computers, which don’t like doing linear search. But I think people work differently. When you try to recall something, I think your mind works more like a flat space than some kind of hierarchy. Sure, you can accommodate a hierarchy, but you don’t really have any problem recalling flat things either. Or at least your mind does the organising automatically for you, and enforcing some fixed hierarchy makes you slower at it. Try to remember the last time you ate pizza and the person in whose presence you did so: you’ll probably enumerate some groups of people or events that will then lead you to that instance and the person with you. If you instead had to first imagine the people you share food with, then try to recall in which of those cases you ate pizza, you’d be much slower. This is why I think we, as developers, should let our brains create the hierarchies and not impose our own laws about what belongs in which bucket.

Computers as state machines

This brings us back to computers. One way to think about a computer is as some kind of state machine that gets mutated by the operations you issue. Let’s for a moment ignore the other things that cause computers to do stuff, and just think about how you use one when you want to perform a task: you move it from one state to another using operations. Those operations might be queries for data (find a file), or they might be commands to change the state (set text to bold).
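As a rough sketch of this view, the snippet below models a tiny text editor as a state plus two kinds of operations: commands that produce a new state, and queries that only read it. Everything here (`EditorState`, `append_text`, `set_bold`, `word_count`) is hypothetical and exists only for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class EditorState:
    """Hypothetical state of a very small text editor."""
    text: str = ""
    bold: bool = False


# Commands: take a state and return the next state.
def append_text(state: EditorState, extra: str) -> EditorState:
    return replace(state, text=state.text + extra)


def set_bold(state: EditorState, value: bool) -> EditorState:
    return replace(state, bold=value)


# Query: reads the state without changing it.
def word_count(state: EditorState) -> int:
    return len(state.text.split())


state = EditorState()
state = append_text(state, "hello world")  # command
state = set_bold(state, True)              # command
print(state, word_count(state))            # query
```

The state is immutable here purely to keep the transitions explicit; a real program would more likely mutate its state in place.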


If you think about the command line interface, it’s about as close to a flat topology as you can get. It’s just there and you can launch any command you can remember by typing a few words. Sure, the commands themselves might have flags or switches or subcommands, but those also reside in your brain, and bar command completion, you’re mostly on your own.

Contrast this with a graphical user interface (GUI), like Microsoft Office, Adobe Photoshop, or other commonplace desktop apps. These tend to be full of menus and dialogs, hidden behind even more menus and dialogs. These are very appropriate for certain operations and certain user bases. For example, if you use the program rarely and aren’t so fast at typing, a well-thought-out menu system is probably much more appropriate than a command line tool. Some other programs, such as image editors, are obviously hard to use without the visual point-and-clickiness of a GUI.
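The contrast between the two kinds of interface can be sketched as two data structures. This is only an illustration: the command names and menu labels below are invented and don’t correspond to any real program.

```python
# Hypothetical flat command table: every operation is one lookup away,
# provided you remember its name.
flat_commands = {
    "bold": "set text to bold",
    "find-file": "find a file",
    "export-pdf": "export as PDF",
}

# Hypothetical nested menus: the same operations behind a fixed hierarchy.
menus = {
    "Format": {"Font": {"Bold": "set text to bold"}},
    "File": {
        "Open": {"Find file": "find a file"},
        "Export": {"PDF": "export as PDF"},
    },
}


def menu_path(tree, operation, path=()):
    """Return the sequence of menu labels leading to an operation, or None."""
    for label, child in tree.items():
        if child == operation:
            return path + (label,)
        if isinstance(child, dict):
            found = menu_path(child, operation, path + (label,))
            if found is not None:
                return found
    return None


print(flat_commands["bold"])
print(menu_path(menus, "set text to bold"))
```

The flat table costs you nothing to navigate but everything to remember; the menu tree is the reverse, since it can be explored without prior knowledge but makes you walk the hierarchy somebody else chose.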

I’m strongly of the opinion that if you use a program frequently and it doesn’t inherently require a GUI, then a command line/textual interface is far superior to a graphical one. This is why I prefer using Git through the terminal instead of some menu-driven graphical app, and why I love writing Markdown or LaTeX instead of using Word.

I also think that seeing information and data through the lens of topology is a cute idea and can give one a fair bit of insight into information retrieval.