Two Kinds of Programs: Closed Worlds and Open Worlds

Over the years I’ve found that I write two distinct kinds of software:

Programs that solve equations, and
Programs that move information around

The first category includes my work in astrophysics and plasma physics, along with the probabilistic models we build at Sturdy Statistics. The second category includes the infrastructure around those models: APIs, web services, data ingestion pipelines, and analysis platforms.

Though it’s obvious with hindsight, it took me a while to realize that these two kinds of systems live under very different constraints, and they reward entirely different programming styles.

Equation solvers are closed worlds

When solving physical equations (extended magnetohydrodynamics in my astrophysics research) or statistical ones (probabilistic graphical models at Sturdy Statistics), the program is essentially an executable version of a mathematical theory. In such systems:

the equations are known,
the variables are precisely defined, and
the relationships between quantities are fixed.

In other words, the model’s structure is predetermined. The program exists to faithfully implement a set of mathematical relationships. As a result, formal correctness is everything.

Mistakes in this sort of code are subtle and very difficult to find. A wrong index or an incorrect normalization constant won’t trigger compiler warnings or static checks. Moreover, the calculations are often too complex, interconnected, and (formally) chaotic to allow for meaningful unit tests. Since these are research topics, your code is sui generis: there is no other way to solve the equation than the one you’re building. If your code does not work, you have nothing to compare it to. And to top it off, incorrect results frequently appear perfectly plausible.

If you’re lucky, a bug produces NaNs or unstable convergence. More often it simply produces numbers that look reasonable but are wrong. Debugging these systems can be painstaking. It often means tracing numerical behavior through hundreds or thousands of iterations and checking results with pen and paper. It is not for the faint of heart.

In one case I’d prefer to forget, an error that subtly broke detailed balance took months to track down. The mistake was so subtle that I’m still amazed I caught it. Even so, it rendered the entire model useless.

This is where closed-world reasoning shines. When the structure of the system is fixed, you want to enforce as many invariants as possible and constrain the code tightly around the mathematical model. In my experience, the most important thing in these programs is discipline: clear structures, explicit invariants, and assertions everywhere. Once the implementation matches the mathematics, the program becomes a faithful machine for executing the theory.
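As a small illustration of "explicit invariants, and assertions everywhere," here is a minimal sketch in Python. It is not the model described above: it builds a toy Metropolis transition matrix for a discrete target distribution and asserts that every row is a valid probability distribution and that detailed balance holds. The function names, the uniform proposal, and the tolerance are all assumptions chosen for illustration.

```python
import numpy as np


def check_invariants(P: np.ndarray, pi: np.ndarray, tol: float = 1e-12) -> None:
    """Assert the properties the mathematics says must hold for P and pi."""
    assert np.all(P >= -tol), "negative transition probability"
    assert np.allclose(P.sum(axis=1), 1.0, atol=tol), "rows must sum to one"
    # Detailed balance: pi_i * P_ij == pi_j * P_ji for every pair (i, j).
    flow = pi[:, None] * P
    assert np.allclose(flow, flow.T, atol=tol), "detailed balance violated"


def metropolis_transition_matrix(pi: np.ndarray) -> np.ndarray:
    """Metropolis transition matrix for a discrete target `pi`,
    assuming a uniform proposal over the other states."""
    pi = np.asarray(pi, dtype=float)
    assert pi.ndim == 1 and pi.size >= 2, "need at least two states"
    assert np.all(pi > 0), "target probabilities must be positive"
    assert np.isclose(pi.sum(), 1.0), "target must be normalized"

    n = pi.size
    P = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j:
                # Propose j with probability 1/(n-1),
                # accept with probability min(1, pi_j / pi_i).
                P[i, j] = min(1.0, pi[j] / pi[i]) / (n - 1)
        # Rejected proposals keep the chain in state i.
        P[i, i] = 1.0 - P[i].sum()

    check_invariants(P, pi)
    return P
```

A check like this costs almost nothing on a toy problem, and it fails loudly on exactly the kind of slip (a transposed index, a missing normalization) that would otherwise produce plausible-looking numbers.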

Web systems are open worlds

Web services live in a completely different universe.

They deal with HTTP requests and responses, JSON payloads, database rows, user input, logs, and third-party APIs. Rather than implementing a fixed theory, these systems are managing information in a constantly evolving environment.

In these systems:

schemas change
fields appear and disappear
integrations evolve
user behavior is unpredictable

The program must accommodate data whose structure is only partially known, and which will inevitably change over time. In this environment, flexibility is paramount. Data structures that can grow and evolve naturally tend to work well here. The system must be able to ingest new information, adapt to changing schemas, and tolerate incomplete or messy inputs.

Fortunately, failures in these systems are usually much easier to diagnose. When a web service fails, the problem is rarely a deep conceptual mystery, because you can see it: you have something like a stack trace or a database inconsistency. Debugging involves inspecting data rather than tracing intricate numerical behavior.

Unlike a subtle “detailed balance” error, you aren’t left staring at plausible-but-wrong numbers, wondering what on earth is happening. Debugging web infrastructure isn’t solving a profound mystery; it’s a matter of diligently doing the work. By tracing a request through structured logs and inspecting the state at each boundary, you can almost always find exactly where the data was lost. The answer is in there, waiting to be found.
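To make "tracing a request through structured logs" concrete, here is a minimal sketch using only Python's standard library. The logger name, the stage names, and the `handle` function are hypothetical; the point is only that every log line is machine-readable and carries the same request identifier, so the lines for one request can be filtered and read in order.

```python
import json
import logging
import uuid

logger = logging.getLogger("ingest")  # hypothetical logger name


def log_event(request_id: str, stage: str, **fields) -> str:
    """Emit one JSON log line tagged with the request ID and pipeline stage."""
    record = {"request_id": request_id, "stage": stage, **fields}
    line = json.dumps(record, sort_keys=True, default=str)
    logger.info(line)
    return line


def handle(payload: dict) -> dict:
    """Toy handler: log the state at each boundary the payload crosses."""
    request_id = str(uuid.uuid4())
    log_event(request_id, "received", keys=sorted(payload))

    # Drop null fields, and record exactly which ones were dropped.
    parsed = {k: v for k, v in payload.items() if v is not None}
    dropped = sorted(set(payload) - set(parsed))
    log_event(request_id, "parsed", keys=sorted(parsed), dropped=dropped)
    return parsed
```

With `logging.basicConfig(level=logging.INFO)` enabled, searching the output for a single `request_id` shows which stage last saw a given field.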

The Boundary Between Worlds

Navigating the boundary where these two worlds meet is our core business at Sturdy Statistics, and it’s how we convert unstructured data into processable entities.

Our customers hand us raw information, and typically, neither we nor the customer knows exactly what shape that data will take. That is why they hire us. Our infrastructure has to live entirely in the open world: it must be endlessly flexible, capable of automatically parsing, coercing, and structuring messy, unknown inputs.

How do we cross that boundary from open-world, open-ended ingestion to closed-world, tightly structured equation solving? It’s closely related to how we automatically surface structure from unstructured data.

Bayesian models occupy an interesting middle ground between open- and closed-world paradigms. As software, Bayesian models are extremely rigid. You define a graph structure, specify probability distributions, and implement inference algorithms that are not so different from the equation solvers used in physics. (In fact, many of them originated in physics.)

But philosophically, the model is built around a very different idea: we do not know the true structure of the world. Bayesian models are unusual in that they explicitly represent uncertainty within a closed (mathematically certain) system. While the model represents a concrete theory about the data, the Bayesian framework acknowledges that we cannot know the truth about the data. Instead of producing a single deterministic answer, the model integrates over many possible explanations consistent with the data. It is perhaps surprising that this flexibility requires exceptionally rigid software.

This means that Bayesian modeling combines a closed software structure with an open view of reality. The program itself is rigid, but the mathematics explicitly accounts for uncertainty about the world it describes.
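A toy example may help make "integrates over many possible explanations" concrete. The sketch below is an assumption-laden illustration, not one of the models discussed here: it estimates a coin's bias on a fixed grid with a uniform prior, and predicts the next flip by averaging over every candidate bias, weighted by its posterior probability. The code is completely rigid (fixed grid, fixed likelihood), yet its output expresses uncertainty.

```python
import numpy as np


def posterior_on_grid(heads: int, tails: int, grid_size: int = 1001):
    """Posterior weights over candidate biases theta, assuming a uniform prior."""
    theta = np.linspace(0.0, 1.0, grid_size)
    # Binomial likelihood up to a constant factor; the constant cancels
    # when the weights are normalized.
    unnormalized = theta**heads * (1.0 - theta) ** tails
    weights = unnormalized / unnormalized.sum()
    return theta, weights


def predictive_prob_heads(heads: int, tails: int) -> float:
    """Probability that the next flip is heads, averaged over all candidate biases."""
    theta, weights = posterior_on_grid(heads, tails)
    return float(np.sum(theta * weights))


def point_estimate(heads: int, tails: int) -> float:
    """Single 'best' explanation: the maximum-likelihood bias."""
    return heads / (heads + tails)
```

After three heads and no tails, `point_estimate(3, 0)` commits to a bias of exactly 1.0, while `predictive_prob_heads(3, 0)` comes out near 0.8, because it still gives weight to the many biases below 1 that are consistent with three heads in a row.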

Embracing the Open World

When I first started building web infrastructure, I struggled because I was trying to apply the habits I had learned writing closed-world physics simulations. I learned a lot by trial and error, but the shift didn’t “click” for me until I encountered Rich Hickey’s design philosophy.

Hickey’s designs in Clojure and Datomic are remarkably insightful for open-world problems precisely because they acknowledge that the full structure of your data is fundamentally unknowable. When you receive a payload, treating it as an open map rather than trying to coerce it into a rigid object hierarchy allows your system to be resilient to change. You operate only on the keys you care about and either strip the rest or pass it along.
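The idea carries over to other languages. As a minimal sketch in Python rather than Clojure, the functions below treat a payload as an open map: one reads only the keys it needs and passes everything else along untouched, and the other keeps a chosen subset and strips the rest. The key names (`quantity`, `unit_price`) and function names are hypothetical.

```python
from typing import Any, Iterable, Mapping


def enrich_order(payload: Mapping[str, Any]) -> dict:
    """Read only the keys we care about; carry unknown keys through unchanged."""
    quantity = payload["quantity"]
    unit_price = payload["unit_price"]
    # Build a new map instead of mutating the input.
    return {**payload, "total": quantity * unit_price}


def pick(payload: Mapping[str, Any], keys: Iterable[str]) -> dict:
    """Keep only the named keys that are present; strip everything else."""
    return {k: payload[k] for k in keys if k in payload}
```

If an upstream service later adds a field that this code has never heard of, `enrich_order` neither breaks nor drops it. A rigid class with a fixed constructor signature would have to be changed first.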

Similarly, this mindset completely changes how you handle the hardest bugs in web infrastructure: concurrency and distributed state. In an open world, state isn’t a single variable to be locked and mutated. Tools like core.async acknowledge that you are often just coordinating independent, asynchronous flows of information. Using channels to manage these flows – rather than relying on shared, mutable memory – has proven amazingly useful in our real-world applications for orchestrating unrelated tasks.
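core.async is a Clojure library. As a rough analogue in Python, and only an analogue, the sketch below uses `asyncio.Queue` as a channel between two independent tasks. Neither task touches the other's state; they communicate only by putting values on the queue and taking them off. The `None` sentinel and the doubling step are arbitrary choices for illustration.

```python
import asyncio


async def producer(channel: asyncio.Queue, items) -> None:
    """Put each item on the channel, then a sentinel to signal completion."""
    for item in items:
        await channel.put(item)  # waits if the bounded queue is full
    await channel.put(None)


async def consumer(channel: asyncio.Queue) -> list:
    """Take items off the channel until the sentinel arrives."""
    results = []
    while True:
        item = await channel.get()
        if item is None:
            break
        results.append(item * 2)
    return results


async def main(items) -> list:
    # A small bound gives back-pressure: a fast producer cannot run
    # arbitrarily far ahead of a slow consumer.
    channel = asyncio.Queue(maxsize=2)
    _, results = await asyncio.gather(
        producer(channel, items), consumer(channel)
    )
    return results
```

Running `asyncio.run(main([1, 2, 3]))` returns the doubled items in order. There are no locks, because there is no shared mutable variable to protect.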

This philosophy extends directly to another major theme of our work: web security. In an open world, we do not control our inputs. We have to handle data that is messy, incomplete, malformed, and frequently adversarial. You might assume that securing against these threats requires rigid, closed-world code across the board. In reality, while specific security behaviors – like tenancy boundaries and authentication checks – must have very strict, absolute definitions, the code that handles the data must remain flexible.

Paradoxically, flexible data handling makes those strict security boundaries much easier to enforce. Many common exploits arise from rigid-but-wrong boundaries: forcing a malicious payload into a complex, inflexible object hierarchy often leads to deserialization flaws, parser differentials, or edge-case bypasses. When you treat input simply as open data, you can rigorously validate the exact fields required for authorization and safely discard the rest, without your system implicitly trusting or coercing a malicious structure.
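Here is a minimal sketch of that pattern in Python, under stated assumptions: the field names (`tenant_id`, `user_id`), the `InvalidRequest` exception, and the tenant allow-list are all hypothetical. The payload stays a plain mapping. The security check is strict about the two fields it needs and never looks at, copies, or instantiates anything else.

```python
from typing import Any, Mapping

# Hypothetical fields needed for an authorization decision, with exact types.
REQUIRED_FIELDS = {"tenant_id": str, "user_id": str}


class InvalidRequest(ValueError):
    """Raised when a required field is missing, empty, or of the wrong type."""


def extract_auth_fields(payload: Mapping[str, Any]) -> dict:
    """Strictly validate the required fields; ignore everything else."""
    if not isinstance(payload, Mapping):
        raise InvalidRequest("payload must be a mapping")
    clean = {}
    for key, expected_type in REQUIRED_FIELDS.items():
        value = payload.get(key)
        # An exact type check rejects lists, nested dicts, None, and subclasses.
        if type(value) is not expected_type or not value:
            raise InvalidRequest(f"missing or invalid field: {key}")
        clean[key] = value
    return clean


def authorize(payload: Mapping[str, Any], allowed_tenants: set) -> dict:
    """Return the validated fields if the tenant is allowed, else raise."""
    fields = extract_auth_fields(payload)
    if fields["tenant_id"] not in allowed_tenants:
        raise PermissionError("tenant not permitted")
    return fields
```

Extra keys in the payload, however strange, never reach the authorization logic: the returned dict contains only the validated fields.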

The Takeaway

At the outset of a project, the most valuable question you can ask yourself is simply: Am I building a closed world or an open one? If you are implementing a closed world – a mathematical model, a specific algorithm, a bounded physical simulation – the structure is fixed. Your greatest danger is subtle, silent error. The tools you need are constraints, explicit invariants, and the discipline to tightly couple your code to your theory.

If you are building an open world – a web service, an ingestion pipeline, a distributed system – your greatest dangers are complexity and brittleness. You exist to manage information that is incomplete and constantly evolving. The tools you need are flexibility, immutable data structures, and clear boundaries that isolate state.

The two types of software reward very different instincts: one requires the rigor of a mathematician, the other the adaptability of a systems engineer. By recognizing which world you’re operating in at any given moment, you stop fighting the nature of the problem, and your code naturally becomes both simpler and much more reliable.