What’s the Point of Theory in Biology?

The letters below were written by Noah Olsman, a scientist in the Paulsson laboratory at Harvard. They were not intended to be published, and many details could be sharpened. I’m publishing them here with Noah’s permission, in the hope that they’ll be a starting point for more discussion. Please email noah.olsman@gmail.com with feedback.

Letter #1

What role does theory play in the sciences? I think there’s a lot more nuance to this question than most people realize, and the answer has changed over the last century.

There is a widespread, unstated assumption — often taught in textbooks — that theory is the end goal of science. Experiments are used to build up a theory, and a theory passes its test when it correctly predicts experimental results. Experimentation, in this frame, plays a subservient role to theory. This is useful pedagogically, but it is worth taking theory off that pedestal for a moment and treating it as just another tool for scientific reasoning.

From this perspective, theory’s role in science falls into three buckets: explanation, interpolation, and extrapolation. This is a simplification, and it is worth noting up front that theory doesn’t really operate on data directly; it operates on models. One major failing of science curricula is the complete absence of serious discussion of what goes into creating mathematical models. In every course I’ve ever taken, models are just handed to you. You see how they fit the data, but they are a set of equations or formalisms pulled from the aether. We often conflate modeling and theory, but theory is something done to a model, not the model itself. As an aside, I think this is a partial answer to your question about theory in biology: if your models aren’t good, theory doesn’t have much raw material to work with.

Explanation means that theory can articulate why a model behaves as it does by deriving its general properties (e.g., when is this system stable? are there conserved quantities?). Interpolation is the marriage of theory and modeling: finding parsimonious frameworks that connect many data points in a unified way (e.g., Einstein unifying inertial and gravitational mass, Noether’s theorem linking conservation laws to symmetry). And extrapolation is the big payoff: theory making predictions about things that have not yet happened and do not exist in current data.

Extrapolation has long been used to justify theory, and theoretical ideas have produced massive conceptual breakthroughs. But today, a lot of theory exists in a bubble, with only marginal impact on the broader science and engineering ecosystem. In control theory, there is an oft-quoted statistic that more than 95% of all controllers in industry are PID controllers, even though PID is over 100 years old and far more sophisticated techniques now exist. This was always brought up as a humorous, self-deprecating anecdote, but it highlights the gap between the idealized version of theory and how theorists’ work actually plays out in the real world. Theory stays afloat because it produces enough useful practical output to keep it going. I suspect this shift began in the mid-20th century, when computers started competing with theory at extrapolation: the atomic bomb was built from theory, but the hydrogen bomb was too complex and relied heavily on simulation. Von Neumann and his wife did much of the programming for these calculations, and this weapons-simulation work is where the term “Monte Carlo simulation” comes from. Before that, theory was self-justifying: it was the only tool available for making predictions, and because data were limited and expensive, models had to be simple enough for theoretical analysis anyway. As computers took over, people began to ask whether theoretical abstractions were intrinsically worthwhile when simulation could be more precise.

If we fast-forward to today, I think machine learning changes the paradigm again. If we can collect massive amounts of data, and our best models of that data are giant, uninterpretable statistical models, where does theory fit in? You could argue that theory will still offer deeper insight into reality, but this is where I get to our world of biology, and I think the history of systems biology is a good case study.

When systems biology began as a field, the pitch was that we could finally put together data-driven models of biological processes, and that within these models we would uncover simple and universal design principles for biological systems. See Uri Alon’s work from the early 2000s. But look at what has happened over the last two decades! This vision has mostly been put out to pasture. Many big names in the field from the 2000s and 2010s have pivoted to method development: the more pragmatic among them realized that the data in the field were simply too coarse-grained to support first-principles models, and shaped their research around solving that problem.

Up until recently, there was sort of an “if you build it, they will come” approach to data generation. If we generated enough data, then we could go back to the simple theoretical models and do what we imagined in the early days of the field. But now, the game has changed and the drive is towards statistical models. Even many of the most prominent theorists from the early days of systems biology have voted with their feet. Many top quantitative biology departments have moved away from theory in favor of methods-oriented research.

It’s time to re-evaluate the role of theory in science and engineering. Within biology, we need to do a lot of soul-searching about what theory is actually contributing to the field. I say this as someone who cares a lot about theory, and who structured his work around it! I don’t think theory is dead in the water, but theorists need to think hard about what they are actually contributing to their respective fields. One positive example is control theorists starting the Learning for Dynamics and Control conference, whose sole purpose is to get control theorists in the same room as ML people to figure out how the two fields can interact productively.

Response from Niko

Noah, do you reckon there are things in biology we can only get from classical theory, and not from large statistical models?

One useful framework for thinking about this question might be the virtual cell. There are basically two approaches to building one: top-down or bottom-up.

The bottom-up or “mechanistic” approach, perhaps best embodied by Markus Covert at Stanford, aims to build a whole-cell model from equations and first principles. It feels more elegant than top-down efforts because it builds up an interpretable “knowledge base.” Basically, the model surfaces predictions, experiments reveal discrepancies, and each mismatch tells you where your understanding is wrong and which experiment might be useful to fill the gap. This is one way to grow knowledge.

It also seems like there are things we can uniquely learn from mechanistic models, and not from purely statistical ones. Markus Covert once told me the story about Neptune’s discovery, which came via discrepancies in a data set. Astronomers had a Newtonian model of the known planets, noticed that Uranus’s orbit didn’t behave as predicted, and figured out the perturbations were caused by gravity exerted by a then-unknown planet — Neptune. Covert’s point was that you could take the same data today and train a statistical model to predict planetary motion, but that model would be unlikely to take a conceptual leap of this nature and infer the existence of a missing planet. There are some discoveries that seem to require human reasoning, operating on a mechanistic framework — at least for now. Alvin Djajadikerta has written on similar ideas for Asimov Press. See his essay on “AI for Disruptive Science.”

Every time I sit down to write about theory in biology, and try to formulate my criticisms of statistical models, I worry readers will simply say, “Well, we can build sparse autoencoders to interpret the predictions of otherwise opaque models.” Adam Green at Markov.bio has written a lot about this. See “Through a Glass Darkly: Mechanistic Interpretability as the Bridge to End-to-End Biology.” And maybe they’re right. But then I think about Markus and the Neptune example, and I’m not sure interpretability tools actually solve the deeper problem.

My worry with purely data-driven models is that we lose something special that is inherent in science itself. If we’re only probing a model in the context of making useful predictions or finding a cure for some disease, will we actually be asking the right questions to understand the fundamental nature of a cell? Will we even know how to wield these sparse autoencoders, or which questions to ask of them, if we only ever approach this problem from the top-down?

Letter #2

I agree that one of the great things theory provides is a way to formalize beliefs about a system, make predictions, and then see when data does or doesn’t match those predictions. But in my mind, the problem is that we can only confidently identify those discrepancies in two cases.

First, we can use theory to try to prove impossibility results. For example, we could argue that any system containing XYZ interacting components can never achieve some behavior, such as oscillations or noise below a certain level. The advantage of this kind of theory is that you don’t need a detailed model of the whole system, just knowledge about one particular subsystem.

For example, one paper derives a fourth-root scaling law for how much feedback systems can suppress noise: to reduce noise by a factor of 10, you need 10,000 times the rate of signaling. The limitation of this kind of theory is that it is never really constructive. At best, it tells you when a given system pushes up against a fundamental limit. If your measurements appear to exceed the limit, then (assuming the theoretical results are correct) your assumptions about the constrained part of the system must be wrong.
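To get a feel for how steep a fourth-root law is, here is a minimal Python sketch. It assumes only the scaling stated above, that noise falls as (signaling rate)^(-1/4). The function names are illustrative and do not come from the paper.

```python
def signaling_increase_needed(noise_reduction: float) -> float:
    """Fold-increase in signaling rate needed to cut noise by `noise_reduction`,
    assuming noise scales as (signaling rate) ** (-1/4)."""
    return noise_reduction ** 4

def noise_reduction_achieved(signaling_increase: float) -> float:
    """Inverse relation: fold-reduction in noise bought by a fold-increase in signaling."""
    return signaling_increase ** 0.25

# Each extra factor of 10 in noise suppression costs a factor of 10,000 in signaling
for factor in (2, 10, 100):
    print(f"{factor}x less noise needs {signaling_increase_needed(factor):,}x more signaling")
```

Run in reverse, the same relation says a 10,000-fold increase in signaling buys only a 10-fold reduction in noise. That is why a limit of this kind can be so informative even without a detailed model.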

Second, we can try to explicitly model the system. This, of course, gives far more precise results, but their validity is tightly tied to the correctness of both the model’s structure and its parameters. I think this is where we get into dangerous waters with bottom-up cell models. To explicitly model the whole cell, you would need to be extraordinarily confident in all of the hundreds or thousands of equations that go into the model before you could rigorously compare its predictions to data.

This is why I am, frankly, skeptical of the whole endeavor. Sure, you can do it, and maybe you can show that your whole-cell model performs well in some mean-square-error sense. But if you want to do anything else with the model, you either need to be really certain that the model is correct, or have a strong foundational understanding of which parameters most strongly affect a given prediction. This gets even sketchier when the output you are observing is an indirect proxy for what you actually care about.

The output we use most commonly today is RNA-seq. I think we (the quantitative biology community) have invested a lot in the idea that, because RNA-seq is fast and cost-effective, it also gives us reliable information about cell state. This is probably broadly true, but consider a thought experiment: imagine a future in which we have perfected single-cell proteomics and can get a readout of every protein’s state in the cell. How much would we trust RNA-seq as an assay then? My guess is that transcriptomics would seem coarse-grained and primitive in comparison. Right now, RNA-seq is the best we have, so we sort of have to trust it. But when the next thing comes along, we will wonder why we ever put so much stock in it (as happened with microarrays, qPCR, and many other methods).

If we want to actually start modeling the cell, I think we need to begin with simple processes and do the difficult work of nailing down how accurately we can measure parameters, predict new data, perturb the system, and check whether our predictions still hold. One counterintuitive consequence is that such an effort will push us away from purely mechanistic models and towards more phenomenological ones.

I say this because even when we know the mechanisms of a system, working through what our reporters can actually tell us often shows that certain parameter combinations are degenerate and can’t be uniquely identified from any experiment. As a simple example, if all you have is steady-state gene expression, you cannot infer production and degradation rates separately. In the simplest model, dx/dt = k_p − k_d·x, the steady state is x* = k_p/k_d, which depends only on the ratio. Rather than throwing all of our biological knowledge into the model, we need to be thoughtful about simplifying it so that its parameters are identifiable from a given set of experiments. This sort of modeling, which sits between mechanism and whole-system simulation, sometimes gets a bad rap. That is because physicists often propose simplified models and derive their theoretical properties without taking the next step of mapping them to data. The middle ground worth pursuing (and this is my bias, based on my work) is models that are both simple enough to be amenable to theory and rigorously justified by data.
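The degeneracy can be made concrete with a minimal Python sketch. It assumes the simplest constitutive-expression model, dx/dt = k_p − k_d·x, starting from x(0) = 0, and uses hypothetical parameter values chosen to share the same ratio k_p/k_d.

```python
import math

def steady_state(k_p: float, k_d: float) -> float:
    """Steady state of dx/dt = k_p - k_d * x, i.e. x* = k_p / k_d."""
    return k_p / k_d

def trajectory(k_p: float, k_d: float, t: float) -> float:
    """Analytic solution x(t) = (k_p / k_d) * (1 - exp(-k_d * t)), assuming x(0) = 0."""
    return (k_p / k_d) * (1.0 - math.exp(-k_d * t))

# Hypothetical parameter sets with the same ratio k_p / k_d = 2
slow = (1.0, 0.5)  # slow production, slow turnover
fast = (4.0, 2.0)  # four-fold faster production and turnover

print(steady_state(*slow), steady_state(*fast))  # 2.0 2.0: indistinguishable
for t in (0.5, 1.0, 2.0):
    # Early time points separate the two parameter sets
    print(t, round(trajectory(*slow, t), 3), round(trajectory(*fast, t), 3))
```

The two parameter sets produce identical steady states but different time courses. This shows how the choice of experiment (steady-state snapshot versus time course) determines which parameter combinations are identifiable.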

Here’s a thought experiment I like. Imagine someone invited you on a brand-new aircraft, and when you asked about safety, they told you they had placed sensors on every component of the aircraft, fed the readings into a giant model, and the model said it was safe to fly. Would you get in? Probably not.

Now say they fed a mechanistic model of the whole plane into a supercomputer. Would you get in? My answer is still probably no, just because trusting that simulation requires an enormous amount of faith in the modeling assumptions. The reality is that we build up complex systems by validating models of various subsystems, integrating them, testing those integrations, etc. It isn’t some grand unified framework, but it is the pathway to making predictions that are reliable.

Maybe the state-of-the-art cell models have done this, but my sense is that the literature is still a lot more of a Frankensteinian assemblage of data and assumptions. It produces something, but not an airplane you’d board willingly. I think it is telling that, for all the success of scaling laws and big general models, the first safety-critical ML systems (self-driving cars) were developed incrementally. Maybe that will change, but I think we’re still gonna need old-school engineering to understand complex systems well enough to engineer them, be they cars or CAR-T cells.

Letter #3

I had a few follow-up thoughts that I figured I’d send along before they diffuse away. I was listening to a (very niche) podcast by a control theorist who covers the history of one of the central results in the field, the Nyquist stability criterion. Without going into too much detail, this result is taught in every intro control theory class. It was one of the first practical methods that allowed engineers to predict when a feedback system would be stable or unstable. There were many 19th-century results that let you prove the stability of a given set of differential equations, but they all relied on having a parameterized model of your system a priori. The beauty of Nyquist is that, while it can be used to prove the stability of a system given a model, it can also do so purely from experimental data.

The basic idea is that you run a standardized experiment on an open-loop input/output system (say, a vacuum tube amplifier), feeding in sinusoids with a constant amplitude and increasing frequency. If the system is linear and stable, the steady-state output will also be a sinusoid of the same frequency, but not necessarily with the same amplitude and phase. From this data, you can plot the gain (output amplitude relative to input) and the phase shift against frequency. This pair of plots is called a Bode plot. The question is: can you predict the stability of the closed-loop system purely from the open-loop characterization? Nyquist realized that you could derive a comprehensive answer from fundamental mathematical properties of these systems (specifically, the argument principle from complex analysis). The result is the Nyquist stability criterion. At a basic level, it says you can take those magnitude and phase plots, combine them into a trajectory in the complex plane, and count how many times that trajectory encircles the point −1. That count maps directly onto whether or not the closed-loop system is stable. For an open-loop system that is itself stable, which it must be for the sinusoid experiment to work, the closed loop is stable exactly when the trajectory does not encircle −1.
If this discussion was too abstract, there is a little interactive tutorial built with Claude.
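To make the encirclement count concrete, here is a minimal numerical sketch in Python with NumPy. It assumes a hypothetical loop transfer function L(s) = K/(s + 1)^3 under unity negative feedback. This example is illustrative and does not come from the podcast or the tutorial. The sketch approximates the full Nyquist contour with a finite symmetric frequency sweep and counts clockwise encirclements of −1. It then cross-checks the count against the roots of the closed-loop characteristic polynomial (s + 1)^3 + K.

```python
import numpy as np

def loop_gain(omega: np.ndarray, K: float) -> np.ndarray:
    """Hypothetical open-loop frequency response L(jw) = K / (jw + 1)**3 (stable open loop)."""
    s = 1j * omega
    return K / (s + 1) ** 3

def clockwise_encirclements_of_minus_one(L: np.ndarray) -> int:
    """Count clockwise encirclements of -1 by the sampled Nyquist curve L(jw)."""
    z = 1 + L  # encircling -1 with L is the same as encircling 0 with 1 + L
    dtheta = np.angle(z[1:] / z[:-1])  # small phase increments, each in (-pi, pi]
    ccw_turns = dtheta.sum() / (2 * np.pi)  # net counterclockwise turns
    return int(round(-ccw_turns))

# Symmetric sweep over negative and positive frequencies; |L| -> 0 at both ends
omega = np.linspace(-1e3, 1e3, 400_001)

for K in (4.0, 12.0):
    N = clockwise_encirclements_of_minus_one(loop_gain(omega, K))
    # Cross-check: closed-loop poles are the roots of (s + 1)**3 + K = 0
    poles = np.roots([1.0, 3.0, 3.0, 1.0 + K])
    unstable = int(np.sum(poles.real > 0))
    print(f"K={K}: encirclements N={N}, unstable closed-loop poles={unstable}")
```

For nonnegative K, the Routh-Hurwitz condition for this loop gives closed-loop stability exactly when K < 8. The sweep should therefore report no encirclements at K = 4, and two clockwise encirclements at K = 12, matching two unstable closed-loop poles. The encirclement count uses only L(jω), the kind of quantity a frequency-sweep experiment measures. The pole cross-check is possible only because this toy example has a known model.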

I bring this up because it’s a nice historical example of how theory actually enters a discipline. A field discovers some new phenomenon that is surprisingly useful (in this case, feedback control of an electrical system), builds increasingly complex technology with it, and coasts on intuition until it runs into problems that can’t be fixed by trial and error. This is when, if we are lucky, theory can jump in and play a big role. In Nyquist’s case, the telephone company was discovering instabilities in its long-distance networks caused by its feedback amplifiers, but had no systematic way to fix them. Nyquist figured out how to use the existing experimental data to solve the core problem, and his work laid the foundation for control theory as a rigorous discipline. You could tell a structurally identical story about Shannon and information theory, or Maxwell and electromagnetism. If you haven’t read The Making of the Atomic Bomb, it is phenomenal, and the first half of the book is all about this kind of scientific history.

I’ll try not to go on too long, but I think there are three fields worth juxtaposing today, idiosyncratic as the comparison is: economics, machine learning, and biology. This triad forms a sort of Goldilocks story of theory.

On one end, we have economics, where theory outpaced data for decades. The theory was beautiful and mathematically precise, but as data collection has improved, it turns out to be a pretty poor predictor, and the field has shifted to be more empirical.

At the other end of the spectrum, biology has remained staunchly empirical for decades, and resisted comprehensive theoretical treatment. This has left us with an explosion of experimental techniques, but general skepticism around the role of theory. It’s almost as if getting the theory right is so hard that we have largely decided it is easier to just hammer away on the experiments.

And then we have machine learning, where theory and experiments have entered this wildly virtuous flywheel, such that every marginal conceptual advancement immediately seems to produce practical improvements and drive new engineering work. Good researchers are worth astronomical sums of money just to keep the flywheel spinning. It’s probably at least somewhat a bubble, but the labor market tells us something about how productive researchers can be in the ideal circumstances.

Maybe biology will never hit that inflection point, but I suspect that if it does, it will have to go through the same sort of growing pains as these other success stories. By that, I mean we should acknowledge that the field will have to follow its own trajectory. There probably isn’t a shortcut, but if we are aware of the template, then we can try to accelerate progress. Maybe this is well-trod ground in metascience, but I think there is something practical to learn from this kind of historical dissection of other fields’ success, particularly with an eye toward what we can map onto fields that have not yet hit that point.

There needs to be some kind of serious intellectual investigation of what we are trying to accomplish in biology now. While there’s huge and well-earned excitement about the role AI will play in the future of biology, I worry we underestimate how much foundational work is still needed before those tools can deliver on their promise. It has been a bit sad seeing so many faculty staple AI onto their own work to appeal to funders, without much serious engagement with the bigger picture. I’m reminded of the opening line of Kurt Vonnegut’s God Bless You, Mr. Rosewater, which reads, “A sum of money is a leading character in this tale about people, just as a sum of honey might properly be a leading character in a tale about bees.”