Tag: biotech

  • We Need Biotech Data

    In 2011, while working in Brazil, Max Roser began formulating the idea for Our World in Data. He initially planned to publish “data and research on global change,” possibly as a book. Before long, that modest blueprint morphed into something far more ambitious.

    Our World in Data went live in May 2014 and, according to Roser, attracted an average of 20,000 visitors per month in its first six months. Today, the website has a worldwide audience. It’s difficult to get exact metrics, but they have more than 300,000 followers on Twitter alone. I’d argue that their true value, though, is not in their “audience reach,” but rather in their global impact.

    By publishing numbers and charts about global change on the internet, Our World in Data plays a key role in finding aspects of global development — malaria cases over time, for example — that are particularly stubborn and, therefore, ripe for philanthropic or government interventions. In essence, they have shown how numbers, displayed in accessible forms, can illuminate which issues deserve urgent attention and where efforts can accelerate progress.

    We should build a similar initiative for biotechnology. The Schmidt Foundation has forecasted that the bioeconomy (encompassing everything from medicines to microbe-made materials) “could be a $30 trillion global industry.” If we intend to realize that potential, we first need to benchmark where biotechnology has been, assess where it stands now, and identify the most pressing challenges ahead.

    “If I’m just watching the news, I’m going to find it very difficult to get an all-things-considered sense of how humanity is doing,” researcher Fin Moorhouse has written. “I’d love to be able to visit a single site which shows me — in as close as possible to a single glance — some key overall indicators of how the world’s holding up.” Biotechnology deserves precisely this kind of concentrated, data-driven resource.

    More specifically, I’m imagining a website that aggregates information on everything from the computational costs of protein design to the efficiency of gene-editing tools across cell lines. Such a resource would help researchers, investors, and policymakers figure out which areas demand attention and which breakthroughs are worth scaling, all while helping prevent misuse.

    Pieces of this puzzle already exist, but seemingly only in scattered or ad-hoc formats. Rob Carlson, managing director of Planetary Technologies, has famously published data on DNA sequencing and synthesis costs. His charts became so popular that people eventually dubbed them “Carlson Curves.” Meanwhile, Epoch AI, a research institute that monitors the computational demands and scaling of AI models, is building the benchmarks and datasets needed to track the AI field’s progress. They could serve as a model for this biotechnology effort.

    A dedicated nonprofit research institute for “Biotech Data” could systematically track metrics such as:

    • Cloning times over the last several decades. How long does it take to synthesize DNA, stitch it together, and make sure everything works as intended? Bottlenecks in cloning slow science as a whole, because the speed of experiments is a key driver of the speed of discovery.
    • CRISPR off-target scores over time. How frequently do gene-editing tools make unintended cuts in the genome, and how can we standardize measurements across studies? Standardized benchmarks will need to be built.
    • Resolution and speed of cryo-EM. How rapidly have improvements in cryo-electron microscopy accelerated, both in terms of resolution and throughput?
    • Antibody manufacturing titers over time. Using a single antibody as reference, what titers are companies achieving in CHO (in g/L) or other cell types over time?
    • Bioscience PhDs awarded per year. How many new doctorates emerge from academia, and where do they end up across industry, startups, and research labs?

    Note that these datasets span both technical and societal issues. This is deliberate; to scale biotechnology, we have to understand both scientific breakthroughs and the workforce dynamics behind them. Tools are useless without a workforce to wield them. Many of these numbers already exist on the internet, but are buried in unwieldy government PDFs or tucked away in a patchwork of scientific articles. Others may require painstaking curation by combing through decades of research articles.

    Starting this nonprofit wouldn’t be too difficult. You could begin by collecting one dataset, transforming it into a chart, and posting it online. People on Twitter and LinkedIn seem to really love data visualizations, so you could probably grow an audience quickly. Over time, you might build automated scraping tools for government websites, create reusable templates to make charts quickly, and even publish short blog posts about various charts (like why, exactly, cryo-EM resolution got so good, and what the key innovations were).
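    To make that concrete, here is a minimal sketch of the “one dataset, one chart” loop in Python. The numbers are placeholders I invented for illustration; you would swap in a real, curated dataset (sequencing costs, cloning times, whatever you collect first) before posting anything.

    ```python
    # A minimal "one dataset -> one chart" sketch. The values below are
    # PLACEHOLDERS for illustration only; replace them with a curated dataset.
    import matplotlib.pyplot as plt

    years = [2001, 2005, 2009, 2013, 2017, 2021]           # hypothetical years
    cost_per_mb = [5000.0, 1000.0, 1.0, 0.1, 0.02, 0.01]   # placeholder $/megabase

    fig, ax = plt.subplots(figsize=(7, 4))
    ax.plot(years, cost_per_mb, marker="o")
    ax.set_yscale("log")  # biotech cost curves usually span orders of magnitude
    ax.set_xlabel("Year")
    ax.set_ylabel("Cost per megabase ($, log scale)")
    ax.set_title("DNA sequencing cost over time (placeholder data)")
    fig.tight_layout()
    fig.savefig("sequencing_cost.png", dpi=200)  # a chart, ready to post online
    ```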

    If this vision appeals to you, send me an email (niko@asimov.com), and I’ll help you get started. We briefly considered launching this venture at Asimov Press, but we only have two full-time employees and so don’t have the bandwidth. We might, however, be keen to fund this project.

  • The Case for Bridge Editors

    Arc Institute researchers recently published a preprint showing that their gene-editing technology, called Bridge recombinases, works in human cells. Many people applauded the paper on social media, while others asked, “Wait, how does this tool even work? And why does it matter?”

    Fair questions. The preprint is not easy to understand, and the reasons for inventing a new type of gene-editing tool in 2025 are even less obvious. After all, there are already dozens of CRISPR gene-editing tools to swap ‘letters’ in the genome, delete stretches of DNA, or replace one sequence with another. What makes these recombinases any better?

    A few things. But if you don’t care to read on, and just want to hear my quick argument in about 65 words, then here it is:

    Bridge recombinases can make large-scale changes to the human genome that other gene-editing tools cannot. Therefore, scientists can use them to answer basic research questions that they couldn’t before, like how certain chromosomal abnormalities cause cancer. Also, Bridge recombinases are able to make those big genome changes without relying on cellular repair mechanisms, which could make them more predictable than other gene-editing tools.

    So let me explain what a Bridge recombinase is. At its core, it’s a genome-editing tool made from two parts: a protein (called the recombinase) that cuts and rejoins strands of DNA, and an RNA molecule (called the ‘Bridge’) that guides the recombinase to a specific site in a cell’s genome.

    Bridge recombinases were discovered in nature, not made in a laboratory. They are a type of transposase, an enzyme encoded by mobile genetic elements that naturally “cut-and-paste” themselves into new places in the genome. Transposases are found all over the place: in plants, bacteria, and animals. Almost half of the human genome is thought to have originated from transposable elements, which get duplicated and move around over millions of years.

    Most transposase proteins recognize a specific stretch of DNA and always insert their transposable elements at that particular sequence. This means that it is very difficult to “reprogram” most transposase proteins.

    But last June, Arc Institute researchers described a family of naturally occurring transposases, called IS110, that rearrange DNA using an unusual mechanism. Unlike most transposases, which recognize DNA through protein binding, IS110 transposases use small RNA molecules (called “Bridge RNAs”) instead.

    In nature, these Bridge RNAs attach to two DNA sequences at the same time: one at the target location where the insertion occurs, and one within the transposon itself (the “donor DNA”). By bridging these two sequences, the Bridge RNA instructs the recombinase protein to cut and paste the transposon at the desired location. The Arc scientists showed that they could modify these Bridge RNAs to instruct the recombinase to edit other locations in the genome, too—not just the transposon’s original site. These scientists found two recombinases in this IS110 family (called IS621 and IS622) that can be used to edit large chunks of DNA in bacterial or human cells, respectively.
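    To make the “reprogramming” idea concrete, here is a toy sketch in Python. It leans on one big simplification that I am assuming for illustration: that each of the Bridge RNA’s two loops simply base-pairs with one strand of its DNA target. Real Bridge RNAs have more structure, and real designs have rules beyond naive complementarity, so treat this as a cartoon.

    ```python
    # A cartoon of Bridge RNA "reprogramming," assuming (simplistically) that
    # each RNA loop is just the reverse complement of its DNA target, in RNA form.
    COMPLEMENT = {"A": "U", "T": "A", "G": "C", "C": "G"}

    def rna_loop_for(dna_site: str) -> str:
        """Return an RNA sequence that base-pairs with the given DNA strand."""
        return "".join(COMPLEMENT[base] for base in reversed(dna_site.upper()))

    target_site = "ATCGGATT"  # hypothetical genomic site to edit
    donor_site = "GGTACCAA"   # hypothetical sequence inside the donor DNA

    print("target-binding loop:", rna_loop_for(target_site))  # AAUCCGAU
    print("donor-binding loop: ", rna_loop_for(donor_site))   # UUGGUACC
    ```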

    Now, at this point you may be wondering: “OK, so that’s it? Bridge recombinases can edit genomes, but why not just use CRISPR-Cas tools to do that instead?” And the answer is this: The special thing about Bridge recombinases is that they edit the genome without relying on cellular repair mechanisms, unlike CRISPR-based tools.

    Since 2012, scientists have discovered all kinds of Cas proteins with various numbered names, like Cas9 and Cas12 and Cas13 and even (my favorite) Cas7-11. Researchers have also invented lots of CRISPR “spin-offs,” such as base editors and prime editors. Of all the CRISPR gene-editing tools available, prime editors are perhaps the most versatile. Prime editors can make lots of different types of edits to a genome, such as swapping one nucleotide for another (say, A → C), inserting short sequences, or deleting segments of DNA. These edits are usually short, though—typically 40–80 nucleotides, and rarely longer than 100 nucleotides.

    All CRISPR gene-editing tools share a flaw, though: they rely on cell repair pathways to make their edits. If a researcher wants to permanently “shut down” a gene, for example, they might use CRISPR-Cas9. The Cas9 protein goes into the genome and makes a cut at the position indicated by its guide RNA. But the Cas9 protein itself doesn’t then fix the DNA it has broken. The cell has to fix that damage a different way.

    Cells have two main ways to fix DNA breaks. Non-homologous end joining quickly slaps the two broken strands together, often adding or deleting random bits of DNA in the process. It is messy, but fast. The second option, homology-directed repair, uses a matching DNA template to fix the break. Basically, scientists can introduce a DNA “donor template” into cells alongside CRISPR-Cas9. The cell sees this template as the correct version of the sequence and copies from it to fix the break. But homology-directed repair only works reliably during specific phases of the cell cycle and happens less frequently than non-homologous end joining.

    Because CRISPR relies on these cellular repair pathways, its edits are inherently unpredictable. Cellular repair systems are non-deterministic (sometimes the cell uses one option, and sometimes it uses the other), and different cells therefore produce different results. Bridge recombinases bypass these cellular pathways, which could make their edits more predictable.

    Which brings me, finally, to the last question: “OK, so Bridge recombinases are perhaps more reliable than CRISPR tools, and they can make larger edits to the genome. But how can we actually use these things in the real world?”

    In a few different ways. For their recent preprint, researchers used a Bridge recombinase to precisely invert a 930,000 base-pair sequence in human cells, and also to chop out 130,000 bases in a single go. They also used Bridge recombinases to edit a gene linked to a disease called Friedreich’s ataxia. Whereas healthy people have several repeats of a sequence—GAA—in a gene called FXN, people with Friedreich’s ataxia have hundreds or thousands of the repeats. This causes the gene to make a defective protein that, in turn, slowly causes nerve damage. In a cell culture model, the researchers used a Bridge recombinase to cut out more than 80 percent of the repeating sequences (and it worked about 40 percent of the time).

    Now let’s zoom out and think about bigger applications for these Bridge recombinases. I can think, almost immediately, of two uses. The first is to study cancer in the laboratory, and the second is to quickly make transgenic mouse models for preclinical trials.

    There are many types of cancers caused by large-scale genome rearrangements. Chronic myeloid leukemia, for example, happens when chunks of chromosomes 9 and 22 swap places. Using Bridge recombinases, researchers could recreate this rearrangement in healthy cells to study how it causes disease and how to reverse it. Ditto for Ewing’s sarcoma, a bone cancer caused by another type of chromosome fusion.

    Bridge recombinases could also make it much simpler to make transgenic mice. Millions of mice are used in biology research each year. But historically, researchers would spend many months or years making just one transgenic mouse because the main technology to do so—called Cre-loxP—is such an obnoxious pain to work with.

    Cre-loxP is a genetic tool that uses an enzyme, called Cre recombinase, to cut and rearrange DNA sequences located between specific DNA markers, called loxP sites. (In other words, Cre recombinase cuts DNA, but only at places in the genome containing these little DNA markers. Cre recombinases are therefore not programmable in the same way as a Bridge recombinase.)

    So to make a mouse model, scientists engineer one mouse line to have loxP sites flanking the gene of interest. Then, they engineer another mouse line to make the Cre recombinase. After breeding these two mouse lines together, the offspring inherit both the Cre protein and the loxP-flanked DNA. Only at that point can Cre recombinase finally rearrange or remove the DNA between the loxP sites. Each genetic change requires a new set of loxP insertions and many additional breeding cycles.

    With Bridge recombinases, much of this tediousness goes out the window. Instead of spending months making custom mouse lines with loxP sites, researchers can just design a single Bridge RNA and inject that RNA and the Bridge recombinase protein, together, into embryos. The Bridge RNA pairs with the DNA targets, and the recombinase rearranges the genome right there, in one step. No separate mouse lines, no extended breeding cycles, no pre-installed DNA sequences.

    There are other uses for Bridge recombinases, too. Scientists can use them to make just about any type of large-scale edits, which means that “genome design” is now an actual possibility. And so maybe the questions we started with—”How does this work? Why is it better?”—aren’t even the right ones.

    For decades, biologists have mainly been observers: cataloging genes, making little mutations, mapping chromosomes, knocking stuff out and trying to put piecemeal observations back together. But now, for the first time, there is a tool good enough to rewrite large stretches of the human genome. So the questions worth asking have changed. Instead of wondering what these tools can do, we should start thinking about what we want them to do.

    Thanks to Nicholas Perry and Matt Durrant for reading drafts of this.

    1 There is most likely a 2 bp flap that remains after recombination using the Bridge system. This can probably be mitigated depending on how scientists design the target and donor sequences. But in most cases it would require cellular processes to repair that 2 bp flap, according to an author of the preprint, Matt Durrant.

    2 There are more advanced forms of prime editing, but they are out of scope for this article. David Liu’s group has used prime editors to insert recombinase “landing sites” in a genome and then used recombinases to make gene-sized inserts. But in general, prime editors alone are only able to modify ~100 bases of DNA at once.

    3 Prime editors get around these two major repair pathways, but still rely on a cell’s machinery (the mismatch repair pathway) to fix the damage and make the edit.

    4 Another benefit is that the DNA sequence Bridge recombinases insert is fully programmable, enabling nearly scarless genome edits. In contrast, prime editing paired with recombinases or CRISPR-associated transposases (CAST systems) relies on large recombinase recognition sequences—typically 30–50 nucleotides—that cannot be customized. These large, fixed recognition sequences make it difficult to do precise insertions, especially within things like exons.

  • Think of the Eggs

    When people think of “biotech” — myself included — they tend to picture GLP-1s and gene therapies. But biotech is much broader than just medicine; it’s also pushing forward a renaissance in the egg industry.

    Eggs aren’t usually top of mind for me. I toss a carton in my grocery cart now and then, but rarely think about how those eggs landed on the shelf in the first place. Perhaps I should. Every year, the global egg industry kills around six billion male chicks shortly after they hatch. Why? Because male birds, bred from “layer” lines, don’t make eggs and don’t pack on enough meat to be profitable. Hence, they’re thrown into a blender.

    Fortunately, scientists have figured out how to determine a chicken’s sex before it hatches. These technologies are called in ovo sexing. Using hyperspectral cameras or PCR, hatcheries can figure out which eggs will hatch male versus female. With widespread adoption, in ovo sexing could spare billions of chicks from the blender. Alas, these technologies weren’t available at all in the U.S. … until last month. Hardly anyone in the mainstream biotech community seems to know what’s going on in this sector but, in my view, it’s among the most underrated and important stories of today.

    In ovo sexing has been available in Europe for years. Germany banned chick culling in 2022. In response, hatcheries were initially forced to keep male chicks alive and raise them for meat — “a practice that was costly and unsustainable,” according to Innovate Animal Ag. (Again, so-called “layer” chickens just don’t produce much meat. Broiler chickens, on the other hand, are specially bred to grow quickly; they “can grow to be over four times the weight of a natural chicken in only 6-7 weeks,” according to an article in Asimov Press.)

    Sensing an opportunity, companies launched in ovo sexing technologies in Europe so hatcheries could screen out male eggs before they hatched. If eggs are destroyed by day 12 of development, the embryo feels no pain. Thanks to this shift, about 78.4 million of Europe’s 389 million hens — or about 20 percent — came from in ovo sexed eggs last year, according to data from Innovate Animal Ag.

    But only two in ovo sexing methods have reached commercial scale so far. As Robert Yaman, CEO of Innovate Animal Ag, previously wrote for Asimov Press:

    The first of these approaches utilizes imaging technologies like MRI or hyperspectral imaging to look “through” the shell of the egg to determine the sex of the embryo inside. The second approach involves taking a small fluid sample from inside the egg, and then running PCR to identify the sex chromosomes, or using mass spectrometry to locate a sex-specific hormone…

    …Other approaches are in development and have not yet been commercially deployed. Some technologies can “smell” a chick’s sex by analyzing volatile compounds excreted through the eggshell. Another approach uses gene editing so that male eggs have a genetic marker that allows their development to be halted by a simple trigger, such as a blue light. Unlike humans, the sex of a chicken is determined by the chromosomal contribution of its mother. By only modifying the sex chromosome of the female parent line that yields male chicks, the female chicks end up without the gene edit. This means that the eggs they lay do not need to be labeled as “gene-edited” for consumers.

    As Europe rolls out these technologies, most American consumers still have no idea that chick culling is even a thing. In one poll, only 11 percent of Americans knew about chick culling; once informed, a majority opposed it. Fortunately, in ovo sexing technologies have finally arrived in the U.S.

    Three U.S. egg companies — Egg Innovations, Kipster, and NestFresh — have announced plans to adopt in ovo sexing technology. In late 2024, Agri-Advanced Technologies also rolled out a machine called “Cheggy” to hatcheries in Iowa and Texas. Cheggy can scan 25,000 eggs per hour and figure out the sex of embryos inside using hyperspectral imaging. The machine is able to “see” the color of down feathers forming beneath the shell. (Brown-egg chicken breeds typically have differently colored feathers for males and females, but this doesn’t work on white eggs.) Hyperspectral imaging is great because it’s non-invasive; the eggs don’t need to be cracked or poked at all. If the machine detects a female embryo, it sends it back to the incubator. Male eggs are destroyed and turned into protein for pet food.

    Also, in December, Respeggt announced that by February 2025, it will roll out its own in ovo sexing tech at a massive Nebraska hatchery, with a capacity to serve 10 percent of the entire U.S. layer market. Respeggt’s technology relies on PCR, so it works for both white and brown eggs.

     *Respeggt’s technology uses a laser to puncture eggs and retrieve a small amount of liquid to run PCR.*

    In Europe, in-ovo-sexed eggs cost only about one to three euro cents more each. That’s a tiny bump, and I’d gladly pay extra just for the peace of mind of knowing that farmers didn’t have to kill any male chicks to produce them. But I am not most consumers; eggs are one of the most price-sensitive grocery items. When people talk about inflation, they usually talk about the price of bread, milk, and eggs!

    Fortunately, a Nielsen survey found that 71 percent of American egg buyers say they’d pay more for in-ovo-sexed eggs. We’ll see what happens, though, as these eggs get rolled out to grocery stores (likely by mid-2025). Consumer reactions will be super important here because the U.S. government doesn’t mandate whether or not hatcheries kill baby chicks. The survival of these technologies will literally be determined by whether or not people buy the eggs.

    Finally, I just want to say that few (if any) people have been pushing for this harder than Innovate Animal Ag. They didn’t pay me to say that, either; they don’t even know I’m writing this article! But they’re the ones dropping all these reports and data about chick culling, commissioning surveys to figure out price points, and pushing for new certifications to coax consumer buy-in.

    So yeah, we often celebrate biotech’s potential — gene editing, advanced vaccines, cultivated meat — but in ovo sexing is already improving the egg industry at scale. It flies under the radar, but at least now you know the story.

  • Microbial Lenses

    There’s a new paper out in PNAS that hints at some intriguing synthetic biology applications. Researchers at the University of Rochester introduced a sea sponge gene into Escherichia coli, giving the bacteria a translucent, silica-based coating. This biosilica shell transforms the cells into tiny microlenses that focus beams of light.

    Here’s an excerpt from the paper (paywalled):

    Remarkably, the polysilicate-encapsulated bacteria focus light into intense nanojets that shine nearly an order of magnitude brighter than unmodified bacteria. Polysilicate-encapsulated bacteria remain metabolically active for up to four months, potentially enabling them to sense and respond to stimuli over time. Our data show that synthetic biology can produce inexpensive and durable photonic components with unique optical properties.

    Typically, microlenses are just tiny spheres, a few micrometers across, fabricated in cleanrooms with harsh chemicals. They appear in photodetectors and camera sensor arrays. Engineered microbes can’t match the precision of these fabricated microlenses, but they offer a major advantage: you can make them at room temperature and neutral pH in a flask of liquid. (And the cells reproduce themselves “for free”!)

    Notably, lifeforms evolved primitive microlenses long before this paper. Cyanobacteria focus incoming light on their cell membranes to locate the sun’s position; they’re probably the world’s smallest and oldest camera eyes. Other cells, like yeast and red blood cells, also naturally behave as microlenses.

    What’s new about this paper is that the silica coating majorly improves the cells’ ability to focus light. More importantly, the work shows that we can tune a living organism’s optical properties through genetic engineering.

    The researchers took silicatein, an enzyme from sea sponges, and fused it to OmpA, an outer-membrane protein that allows molecules to flow in and out of the cell. Silicatein grabs silicon-containing molecules from the environment and stitches them into silica polymers; sea sponges use it to build “bioglass” structures. When fused, OmpA embeds into the cell membrane and holds silicatein outward, like a fishing hook.

    By flooding the engineered cells with orthosilicate (a silicon-containing molecule), the silicatein “hooks” grab it and stitch together a silica shell around the entire cell. The researchers confirmed this with confocal imaging and a dye that binds specifically to silica. The engineered cells ended up surrounded by dye, while normal cells remained unstained.

     *Rho123, a dye, stains silica. Cells were engineered to express silicatein enzyme from two different microbes (hence column A and B), and were compared to wildtype. From Sidor et al.*

    This silica shell significantly changes the cells’ optical properties. To visualize this, the researchers built a custom microscope that can shine light on cells from any imaginable angle relative to the vertical axis. Uncoated cells scattered some light but didn’t create a distinct focal spot beyond their surface. In contrast, silica-encapsulated microbes produced light beams that stretched for several microns, with peak intensities nearly an order of magnitude higher than wildtype cells.

    I would have guessed this treatment might kill the cells — either because the silica shell blocks nutrients or because photons would roast them — but it doesn’t. Engineered cells continued scattering and focusing light even months after switching on the fusion protein. The only downside is that the cells grow more slowly, if at all.

    What could we do with these living lenses?

    My first step would be to engineer cells of different shapes and dimensions. A typical E. coli measures about two microns long and one micron wide. What if we engineered more spherical cells? Or longer cells? We could create a series of living microlenses, each with unique optical properties, by tuning the silicatein protein and adjusting the cells’ physical dimensions.

    (In the video below, researchers are blasting a stationary cell with light at angles ranging from -90° to 90°. There are some orientations where a nanojet appears, but it happens quickly.)

    From there, the applications depend on our imaginations. We might wire living bacteria into optical devices that don’t need batteries and last for months without a power supply. Or we could build medical devices. Instead of swallowing a pill camera powered by toxic batteries, perhaps we could engineer E. coli into a camera. I’m not sure. At this stage, it’s speculation.

    Practical limitations exist with current microlenses. As pixel sizes in camera sensor arrays shrink below two micrometers, placing microlenses becomes difficult. However, cells can “swim” to a specific destination and arrange themselves autonomously. In other words, arrays of bacteria could line up over a sensor — maybe using microfluidic channels — to focus and direct light into tiny pixels.

    Will any of these ideas actually happen? Probably not soon. Still, when a paper broadens our “design space” in biological engineering, it’s worth paying attention. One of my first questions, upon reading something like this, is usually: “Where else could this be applied, especially in unexpected ways?”

    Consider optogenetics: Ed Boyden and Karl Deisseroth learned of channelrhodopsins—light-responsive proteins discovered in algae—and imagined splicing them into neurons to control action potentials. That mental leap doesn’t seem so large in hindsight.

    Engineered gas vesicles, similarly, are being used to improve ultrasound resolution within the body, enabling scientists to image individual cells moving through the bloodstream. I’ve written about these structures before for Asimov Press. Mikhail Shapiro got the idea for engineering gas vesicles after reading “two short paragraphs” about photosynthetic algae!

    In other words, pay attention when a paper like this appears. It might plant the seeds for something exciting, even if we don’t recognize it immediately.

  • How to Minimize Cell Burden

    I. Molecular Burden

    Biochemistry textbooks often depict cells as spacious places, where molecules float in secluded harmony. But cells are dense and crowded; a bit like molecular burritos, according to Michael Elowitz, a biologist at Caltech.

    Roughly three to four million proteins jostle around inside a single E. coli bacterium, which has an internal volume 50 billion times smaller than a drop of water. A typical enzyme within this crowded cell collides with its substrate 500,000 times each second. When bioengineers manipulate life, they must consider how their modifications will impact everything else within the cell—for everything in the cell is connected to everything else.

    In 2000, Elowitz published one of the first synthetic gene circuits—called the “repressilator”—with his mentor, Stanislas Leibler. A gene circuit is made from RNA or proteins that interact with one another, enabling cells to perform logical functions. The repressilator was crafted from just three genes, each of which encoded a protein that repressed another protein to form an inhibitory loop. One of these proteins was fused to a green fluorescent protein so that, as the protein levels rose and fell, the cells flashed green—on and off—in 150-minute intervals.
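    For the curious, here is a back-of-the-envelope simulation of a repressilator-style ring in Python. The equations follow the standard textbook form of the model, but the parameter values are arbitrary choices that happen to oscillate, not the published ones.

    ```python
    # A repressilator-style ring of three mutually repressing genes.
    # Parameters are illustrative, not the values from the original paper.
    import numpy as np
    from scipy.integrate import solve_ivp

    alpha, alpha0, beta, n = 200.0, 0.2, 5.0, 2.0

    def repressilator(t, y):
        m, p = y[:3], y[3:]          # mRNA and protein levels for genes 0, 1, 2
        dm, dp = np.empty(3), np.empty(3)
        for i in range(3):
            j = (i - 1) % 3          # gene i is repressed by the previous protein
            dm[i] = -m[i] + alpha / (1 + p[j] ** n) + alpha0
            dp[i] = -beta * (p[i] - m[i])
        return np.concatenate([dm, dp])

    y0 = [1, 0, 0, 2, 1, 1]          # asymmetric start so oscillations kick in
    sol = solve_ivp(repressilator, (0, 100), y0, t_eval=np.linspace(0, 100, 2000))
    # sol.y[3] traces one protein rising and falling: the "flashing" reporter.
    ```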

    As synthetic biology advanced, and its tools grew sharper, synthetic gene circuits swelled in size. In 2016, a paper in Science reported an engineered circuit made from 55 different sequences assembled into 11 genes, among the largest gene circuits yet assembled in a single cell. Building significantly larger synthetic gene circuits will require careful consideration of the finite resources available to cells.

    After all, cells are not empty vessels that have evolved to do our bidding. When we engineer an organism, coaxing it to make new proteins or molecules, we are imposing a molecular burden upon it. Typically, burdensome genes are defined as those that “impose a high enough energetic burden to be opposed by selection if they do not confer sufficient added benefits.” Any genes added to a cell must compete for cellular resources—energy, ribosomes, and RNA polymerases—that may diminish the cell’s ability to carry out other functions; to grow, metabolize, and divide.

    A recent study in Nature Communications measured the molecular burden imposed by 301 different plasmids; fewer than 20 percent of them caused E. coli cells to grow more slowly. But surprisingly, some of the most burdensome plasmids were also the simplest: a plasmid encoding red fluorescent protein, and nothing more, caused a 44% reduction in growth rate.

    The study is intriguing, in part, because its dataset could provide insights into why some genes, once expressed, cause cells to grow more slowly. More importantly, though, this study reveals that there is still so much we don’t understand about biology, or toxicity, or how to ease molecular loads as we strive to engineer life in increasingly sophisticated ways.

    II. Competition

    Cells have finite resources. Insert a synthetic gene into a cell, and several things quickly happen.

    First, the gene is transcribed into RNA by an enzyme called RNA polymerase. Then, the RNA molecules are translated into protein via ribosomes, large protein-RNA complexes made from dozens of interlocking components. A typical E. coli cell contains about 3,000 RNA polymerase molecules and 30,000 ribosomes. Exogenous genes pull some of these enzymes away from other parts of the cell. And, for reasons that are not fully understood, cells burdened with recombinant DNA do not upregulate their production of RNA polymerase or ribosomes to compensate for the increased load, according to a 2020 study.

    Although the term “burden” typically refers to resource limitations—be they metabolic, transcriptional, or translational—it is often experimentally difficult to untangle from toxicity. A thorough investigation is often needed to tell whether a cell is growing slowly due to burden or toxicity, because the outcome—slow growth—is the same.

    Some proteins that are normally non-toxic also become toxic when expressed above a certain threshold. For a 2018 study, researchers expressed 29 different enzymes in yeast. All of the enzymes have well-known mechanisms and are non-toxic at normal levels. Some of the enzymes became toxic in the yeast, however, because they “aggregated together, they overloaded a transport system that [took] them to a specific cell compartment, or [they] produced too much catalytic activity.”

    A cell faced with excess burden or toxicity really only has one way out: to mutate and break the troublesome genes. A single milliliter of liquid culture holds as many as one billion E. coli cells. If just one of those cells mutates the burdensome genes and breaks their function, then that cell will grow more quickly than its neighbors. The mutated cell’s progeny will eventually take over the entire population. The more burdensome a genetic sequence, the more likely a mutant will appear and take over.

    Remember that Nature Communications study that I mentioned earlier? Well, the authors built a simple mathematical model to predict the correlation between different levels of burden and “population takeovers” when cells are grown in different-sized containers. A plasmid causing more than a 30% reduction in growth rate, for example, is likely to result in a “mutant takeover” when the cells are grown in even a small container, such as a flask.
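    Here is a toy version of that takeover dynamic in Python. To be clear, this is my own simplification, not the authors’ published model: a single escape mutant growing at full speed amid burdened neighbors.

    ```python
    # A toy "mutant takeover" model (my simplification, not the paper's):
    # one unburdened escape mutant arises in a culture of burdened cells.
    def generations_to_takeover(n_cells: float, burden: float,
                                threshold: float = 0.5) -> int:
        """Doublings until one mutant exceeds `threshold` of the population."""
        mutants, burdened = 1.0, n_cells - 1.0
        gens = 0
        while mutants / (mutants + burdened) < threshold:
            mutants *= 2.0                     # mutant doubles every generation
            burdened *= 2.0 ** (1.0 - burden)  # burdened cells lag behind
            gens += 1
        return gens

    # One escape mutant in a billion cells (roughly 1 mL of culture), 30% burden:
    print(generations_to_takeover(1e9, burden=0.30))  # ~100 generations
    ```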

    Collecting the data to build this model was straightforward. The authors placed each of the 301 different plasmids into E. coli cells, and then measured how much each plasmid slowed down their growth rates. A plate reader machine measured the cloudiness of each population over time, a proxy for cell growth. The authors also measured growth rates for E. coli carrying one of five different plasmids that imposed known levels of burden. These controls were used to normalize growth rates between experiments.
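    For reference, here is roughly how a growth rate falls out of those cloudiness (optical density) readings. The numbers are invented placeholders; a real analysis would also restrict the fit to exponential phase and normalize against the control plasmids just described.

    ```python
    # Estimating a growth rate from plate-reader optical density (OD) readings.
    # The readings below are PLACEHOLDERS, not data from the study.
    import numpy as np

    hours = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
    od600 = np.array([0.05, 0.09, 0.17, 0.33, 0.60])

    # In exponential phase, ln(OD) is linear in time; the slope is the growth rate.
    slope, _ = np.polyfit(hours, np.log(od600), 1)
    print(f"growth rate: {slope:.2f}/h, doubling time: {np.log(2) / slope:.2f} h")
    ```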

    Of the 301 plasmids tested, just six caused cells to grow more than 30% slower than unaltered cells. A further 19 plasmids caused cells to grow more than 20% slower. In total, the authors found 59 plasmids that caused measurable changes to bacterial growth rates.

    Genes expressed from constitutive promoters (meaning they are always “on”) were 2.9 times more likely to be in the burdensome set of 59 plasmids. And plasmids containing a strong ribosome binding site (the part of an mRNA strand where ribosomes bind and kickstart translation) were 2.1 times as likely to slow E. coli growth, compared to plasmids that include weaker RBS variants.

    III. Build Bigger

    If this study’s results were distilled into a single sentence, I think it would be this:

    Genetic sequences inserted into a cell do not usually cause excess burden, but when they do, it is often for reasons we don’t fully understand.

    Why, for example, is a plasmid encoding red fluorescent protein so burdensome? Plasmids encoding YFP and GFP also caused 29.5% and 27.1% reductions in growth rate, respectively. A plasmid encoding a chloramphenicol antibiotic resistance gene—and nothing more—caused cells to grow 33.4% slower. Molecular mechanisms explaining these growth defects are often unclear, or completely absent.

    At Asimov, one of our primary applications involves engineering Chinese Hamster Ovary (CHO) cells to make therapeutic proteins, like monoclonal antibodies. This particular type of cell, originally derived from animals smuggled out of China in 1948, is used to make nearly 90% of all therapeutic proteins.

    In our hands, most therapeutic antibodies can be expressed well by optimizing the genetic design or bioreactor process. In many cases, we’ve engineered CHO cells to make more than 10 grams per liter of antibodies without causing any noticeable growth defects on the cells. But other times—and for reasons we don’t fully understand—engineering CHO cells to make certain therapeutic antibodies imposes huge burdens or toxicity. Debugging these cases is an interesting exercise on its own. The root cause is often mysterious; in other cases, we can detect hallmarks of endoplasmic reticulum (ER) stress, which suggests protein misfolding or aggregation in the cell.

    Fortunately, there are steps we can take to reduce molecular burden or toxicity.

    Codon optimization is one option. This is when scientists convert the DNA sequence from one organism into codons “preferred” by another organism, without altering the order of amino acids in the final protein. In the lab, we have tested various codon configurations to find those that slow down the ribosome’s movement, thus giving proteins more time to fold and reducing toxicity.
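    As a sketch of the basic idea, here is what the naive version of codon optimization looks like in code. The usage table is a made-up fragment for illustration (real tables cover all 61 sense codons), and, as noted above, in practice we sometimes want slower codons at particular positions rather than the fastest ones everywhere.

    ```python
    # Naive codon optimization: back-translate a protein using each amino
    # acid's most frequent codon in the host. The tiny usage table below is
    # a HYPOTHETICAL fragment; real tables cover all 61 sense codons.
    HOST_CODON_USAGE = {
        "M": {"ATG": 1.00},
        "K": {"AAA": 0.74, "AAG": 0.26},
        "L": {"CTG": 0.47, "TTA": 0.14, "CTC": 0.10},
    }

    BEST = {aa: max(codons, key=codons.get) for aa, codons in HOST_CODON_USAGE.items()}

    def codon_optimize(protein: str) -> str:
        """Back-translate a protein into the host's preferred codons."""
        return "".join(BEST[aa] for aa in protein)

    print(codon_optimize("MKL"))  # -> ATGAAACTG
    ```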

    Another way we solve this problem is by balancing the expression of genes. Antibodies are made from two proteins—called the heavy chain and the light chain—that come together to make a Y-shaped molecule. If one of these chains is expressed at a much lower level than the other, it can become rate-limiting in the formation of the antibodies. At the same time, if the “excess” chain is the antibody heavy chain, it can float around the cell and cause toxicity. Another way to reduce burden is to integrate genes directly into the host genome, rather than using multi-copy plasmids, such that only a single copy of each gene exists and the genes don’t consume too many cellular resources.

    A more complicated approach is to engineer cells with incoherent feedforward loops, or IFFLs, to mitigate burden caused by gene expression. Such gene circuits are designed to dampen mRNA levels when a gene’s expression diminishes the cell’s ability to carry out other functions.
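    Here is a minimal sketch of that intuition in Python, under my own simplified assumptions rather than any published circuit design: the plasmid expresses both the payload mRNA and a repressor of it, so as copy number climbs, repression climbs with it and payload expression rises far less than proportionally.

    ```python
    # A toy incoherent feedforward loop (IFFL) for dosage compensation.
    # My own simplified model, not a published design: the plasmid makes both
    # the payload mRNA and a repressor that degrades it.
    from scipy.integrate import solve_ivp

    def iffl(t, y, copies):
        m, r = y                                   # payload mRNA, repressor
        dm = copies * 1.0 - 0.5 * m - 2.0 * m * r  # made, diluted, repressed
        dr = copies * 0.1 - 0.5 * r                # repressor scales with copies
        return [dm, dr]

    for copies in (1, 5, 20):
        sol = solve_ivp(iffl, (0, 50), [0.0, 0.0], args=(copies,))
        print(copies, round(sol.y[0][-1], 2))
    # 20x more plasmid copies yields only ~2x more payload at steady state.
    ```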

    A balance must be struck, however. It is good to reduce burden, but not at the cost of antibody production. Molecular burden, toxicity, and economics are all valid things to consider.

    Most of these strategies are also akin to using Tylenol to treat a cold—we may get the outcome we’re after (less burden), but only because we don’t understand how to solve the problem at its core. It is only by peering deeper into living cells, and untangling their intricate complexities, that we begin to understand what goes wrong when we manipulate them.

    In this case, as in many others, greater basic science research will enable more sophisticated engineering.


    By Niko McCarty

    Thanks to Rachel Kelemen, Alec Nielsen, Ben Gordon, Kate Dray, Kevin Smith, Chris Voigt, and Arturo Casini for help with this essay.

  • What I Learned in 2024

    Asimov Press, the publishing company that I run with Xander Balwit, has only been around for about a year. I really love the job because it lets me work with all kinds of writers and scientists who have compelling ideas. I get to hang out with them, help shape their ideas from “nebulous thoughts” into more crystalline prose, and then share those ideas with the world. It’s a dream job.

    I was reflecting on some of the things I’ve learned over the last year, and decided to wrap everything together into a list (as one does during the holidays). The format is admittedly overdone and derivative, but I’ve done my best to choose interesting things that I haven’t seen covered elsewhere. Maybe you’ll get a kick out of this. Note that these are not necessarily things that happened this year; they are just things that I learned this year.

    Micropipette Origins

    In 1957, a 32-year-old German postdoctoral researcher named Heinrich Schnitger invented the micropipette in just three days. He did it, apparently, because he was using mouth pipettes to work with toxic molecules and hated his day-to-day work. Eppendorf, a German company, licensed his invention almost immediately and commercialized it in the 1960s. Schnitger drowned in a Bavarian mountain lake in 1964.

    Schnitger was not the first person to make a micropipette, though; his was simply the first model to catch on. (“…his design had ‘all the essential features of the modern pipette,’ according to a close witness of the invention, including a spring-loaded piston, a second spring to shoot out residual liquid, and a plastic tip.”) Two Americans, James W. Brown and Robert L. Weintraub, filed a patent for an adjustable pipette with a removable tip in 1953. Their device dispensed tiny drops of liquid via the spinning of a wheel on one end, and it could be used with one hand — but modern micropipettes do not use wheels to dispense liquids! (Source)

    Patents on Living Things

    General Electric was performing research on engineered Pseudomonas microbes to clean up oil spills in the 1970s. They filed the first patent on a genetically modified organism, and the case ultimately went all the way to the Supreme Court, which decided the case in 1980. Before then, patent clerks rejected any applications for living organisms because of a doctrine dating back to 1889, called Ex parte Latimer, which dealt with “the patentability of a fiber extracted from needles of a pine tree.” As Rhea Purohit writes for Asimov Press:

    …the U.S. Patent Office rejected [General Electric’s] application because the subject matter was a ‘natural product.’ Benton J. Hall, a lawyer-politician from Iowa and Commissioner of Patents at the time, opined that the composition of trees was ‘not a patentable invention, recognized by statute, any more than to find a new gem or jewel in the earth would entitle the discoverer to patent all gems which should be subsequently found.’

    After publishing this story, Rich Pell — founder of the Center for PostNatural History — told me that Louis Pasteur was actually the first person to patent a living organism. In 1873, Pasteur petitioned for a patent called “Improvement in the Manufacture of Beer and Yeast,” outlining methods to kill bacteria by heating beer and also disclosing his Brewer’s yeast strain. (Source)

    Single Cells Anticipate Seasons

    The best paper I read all year is “Bacteria can anticipate the seasons: Photoperiodism in cyanobacteria,” published in Science in September. It is a masterpiece. Or, as I wrote in an article, “In just six wonderfully lucid pages, researchers from Vanderbilt University in Nashville show that cyanobacteria can ‘sense’ shortening days and change the molecular compositions of their cell membranes to prepare for cold weather.”

    I continued to explain the experiment, writing:

    Cyanobacterial cells were divided into three groups. Each group grew at the same temperature — a steady 30°C. But each group was exposed to a different amount of light each day. One group was exposed to 16 hours of lightness and 8 hours of darkness each day; another to 12 hours of light and 12 hours of darkness, or ‘equinox’; and the third to 8 hours of light and 16 hours of dark.

    After eight days, Jabbur dunked each group of cells into ice-cold water and measured how many lived through the ordeal. Cells exposed to less light (8 hours light, 16 hours dark) were two to three times more likely to survive compared to the other two groups. The effect was also linearly correlated. Cells exposed to 20 hours of darkness per day were more likely to survive the cold water compared to cells exposed to 18 hours, and so on.

    I’m still blown away by the sheer simplicity and elegance of this paper. You should read it! (Source)

    Plague Deaths

    Nobody knows how many people died during the Black Death. This “simple” statement was something I hadn’t really considered prior to this year. But, as Saloni Dattani has written:

    Direct records of mortality are sparse and mostly relate to deaths among the nobility. Researchers have compiled information from tax and rent registers, parish records, court documents, guild records, and archaeological remains from many localities across Europe. However, even those who have carefully combed over this data have not reached a consensus about the overall death toll.

    For example, in 2005, statistician George Christakos and his colleagues compiled data from over a hundred European cities. Using their data, the economists Jedwab, Johnson, and Koyama estimated in 2019 that 38.75 percent of Western Europe’s population had died on average. In contrast, the historians John Aberth (2021) and Ole Benedictow (2021) have estimated that 51–58 percent or upwards of 60 percent of Europe’s population died, respectively.

    Even today, many countries do not have formal institutions to tabulate deaths. In many cases, we simply don’t know how many people die from various diseases. Dattani continues:

    Since cause-of-death registries have been limited or dysfunctional in many countries in Africa and South Asia, some researchers have conducted national ‘verbal autopsies’ to fill the gap. In these studies, millions of families were interviewed about recently deceased relatives and their diseases and symptoms before death. Doctors then used their answers to estimate their cause of death.

    The results suggest that we had greatly underestimated the death toll of diseases such as tuberculosis and venomous snakebites. Revised international estimates suggest that they kill over 1 million and 100,000 people, respectively, each year. (Source)

    Mendel Mouse Hoax

    Gregor Mendel, the Augustinian friar who founded genetics, worked with garden peas. He meticulously crossed his peas and tabulated the “phenotypes” that appeared to unravel the laws of inheritance. But nobody knows, even today, why exactly he decided to do these experiments. What was his inspiration?

    In the absence of historical certainty, many writers and scholars have felt free to speculate. For example, Robin Henig, author of the book The Monk in the Garden, wrote that:

    [Mendel] kept [mice] in cages in his two-room flat, where they gave off a distinctive stench of cedar chips, fur, and rodent droppings. He was trying to breed wild-type mice with albinos to see what color coats the hybrids would have. [The bishop] seemed to find it inappropriate, and perhaps titillating, for a priest who had taken vows of chastity and celibacy to be encouraging — and watching — rodent sex.

    After the bishop banned mice from the monastery, Henig claims, Mendel took to garden peas instead. A similar tale has appeared in many academic and news articles (including in Asimov Press), but it’s likely apocryphal.

    Daniel J. Fairbanks, a Mendel scholar in Utah, says in his own book that there is no evidence for it. Mendel did publish work with insect pests and even became a renowned beekeeper late in life; a ban on mice would also have been peculiar, given that the monastery’s abbot regularly bred sheep and other agricultural animals.

    Synthetic Biology’s Discouraging Start

    The field of synthetic biology began, “officially,” in the year 2000 when two papers — published back-to-back in the journal Nature — reported the first synthetic gene circuits; assemblies of DNA that “programmed” living cells to act in desired ways. These early synthetic gene circuits (called the repressilator and toggle switch) suggested that engineers could recreate some of the complex networks within living cells and then manipulate them to carry out entirely new functions. In other words, they could “program biology.”

    The repressilator was made by Michael Elowitz and Stanislas Leibler, two physicists at Princeton University. I interviewed Elowitz earlier this year, and was surprised when he told me about some of his early doubts surrounding the project:

    I definitely had no idea whether it was going to work. When I asked people what they thought of the project, which I did incessantly, I got very different answers. A few well-known biologists would say, ‘No, it’ll never work that way. It just won’t work.’

    And I’d ask them, ‘Why won’t it work?’ And they’d say, ‘Biology just doesn’t really work that way. You can’t predict what’s going to happen.’ Other people thought it sounded fun. So it was a mix of positive and negative feedback. It’s funny to think about that in hindsight. At the time, I was really excited about the project. I told lots of people about it, but then I’d swear them to secrecy. It was all very silly. (Source)

    No More Dead Chicks

    In ovo sexing is one of the most exciting technologies that I had never heard of. The gist is that we can now figure out the sex of a baby chick while it is still inside the egg; before it hatches. This enables farmers to discard male eggs before the chicks hatch and, thus, before they can feel pain. That’s a huge deal because something like 6 to 7 billion one-day-old male chicks are killed each year. Egg farms kill male chicks because, well, they don’t lay eggs. So instead, they put them on a conveyor belt and drop them into a macerator that rips into their flesh. It’s absolutely brutal. You can find videos online, if so inclined.

    But this is a cause area that biotechnology can make a huge impact on. And there is good news. The number of male chicks killed in European egg farms has fallen by about 20 percent in recent years. In ovo sexing is now used in about 20 percent of the European market. And this technology is — for the first time — making its way to the United States. A few weeks ago, “a US hatchery shared that it has installed the nation’s first in-ovo sexing system.” (Source)

    Making Eggs Without Ovaries

    In just a few years’ time, scientists may figure out how to make viable eggs (or even sperm) directly from stem cells. The technology is called in vitro oogenesis, and Metacelsus published a deep explainer on it earlier this year:

    Such an approach would take cells from an adult — such as skin or blood — and reprogram them into induced pluripotent stem cells, or iPSCs. Much like embryonic stem cells, iPSCs have the ability to form any cell in the adult body; eggs included. Although generating human iPSCs is now routine, coaxing iPSCs to form eggs in a process known as in vitro oogenesis has only been successful on cells taken from mice.

    If this technology pans out, it will likely cost (initially) between $150,000 and $250,000 just to make the actual eggs (so not including implanting those eggs and so on). It will:

    …expand the kinds of people who are able to have biological children. First, growing eggs from ovarian biopsy samples will allow women to obtain eggs even when their ovarian reserve is diminished. This could extend the age of fertility into the mid-40s. Furthermore, this technology would allow younger women to grow large numbers of eggs from tissue samples. By enabling women to freeze more eggs, they would have a better chance of having babies later. (Source)

  • Why Engineer Biology?

    This essay originally appeared on the Asimov blog.

    Many complex problems are caused by molecular imbalances. Type I diabetes is caused by a lack of insulin; obesity, in part, by a nutrient excess. Climate change is caused by an overabundance of certain gases in the atmosphere. There is too much plastic in landfills, and the molecules break down too slowly.

    Some of the world’s most pressing problems—declining fertility, a scarcity of food and medicines in poor regions of the world, warming climates, stagnant health spans in the West—play out in the world of atoms. Many of these problems can be solved, or at least progress can be made toward solving them, by engineering biology.

    Why Cells

    Cells have two features that make them well-suited to atom-level problem-solving. First, they are a form of advanced nanotechnology that can be exploited using tools from molecular biology.

    Lifeforms harvest atoms from their local environments and rearrange them into astonishingly complex nanomotors and materials. A tree strips carbon dioxide from the air, breaks the molecules apart, and creates sugar. From water and waste, plant cells make wood, grains, pigments, and medicines. A single seed contains all the “instructions” necessary to grow into a towering redwood tree, simply by collecting atoms from the dirt and air to build roots, branches, and leaves. Life resembles alchemy, but its mechanisms are rooted in chemistry and physics.

    Second, cells divide. One cell becomes two, then four, then eight, and so on as long as there is ready access to carbon, nitrogen, hydrogen, phosphorus, and a handful of other atoms. A single E. coli bacterium, dividing every 30 minutes, will form a colony of more than 1 million cells in about ten hours (20 doublings, and 2^20 is just over a million).

    This is different from man-made machines, of course. A mechanical engineer who builds a robot must invest similar effort (or slightly less, once a prototype is available) to make a second robot. But not in biology; when a scientist engineers a cell, her manipulations will propagate, divide, and spread without any prodding or instructions. An engineered plant, designed to capture more carbon dioxide from the atmosphere, need only be made once, for its seeds can be planted to grow an entire forest.

    Track Record

    Polymath Benoît Mandelbrot described the Lindy Effect in 1982, explaining that there is a statistical tendency for things with long pasts to persist longer into the future. A book that has been in print for 300 years is more likely to be around in another 300 years, compared to a book that has only been in print for three years.

    The Lindy effect also applies to biotechnology, which has a long track record of solving difficult problems in food, medicine, and climate.

    In 1944, Mexican farmers rarely grew wheat because much of their crop was devastated by a disease called stem rust. In 1945, an American agronomist named Norman Borlaug moved to Mexico and, with a small team, crossed thousands of wheat strains to find a variant that could resist the disease. His efforts boosted Mexico’s wheat yields six-fold between 1944 and 1963. Mexico became a net exporter of wheat. One seed, propagated many times over, fed an entire country.

    Borlaug and his team achieved this feat using tools that would be considered primitive by today’s standards. Their wheat crosses were done in the absence of DNA sequences, and the scientists had little understanding of the molecular mechanisms linking wheat genotypes to phenotypes. In the last 60 years, new tools to engineer plants have been used to roughly triple global crop yields. It’s now possible to feed 10 billion people on existing farmland.

    As molecular biology tools grow in precision, they are being applied to ever more difficult problems in human health. Since Genentech’s rise in the late 1970s, scientists have invented capable tools to quickly synthesize DNA and insert the molecules into cells, coaxing them to make medicines or other useful molecules.

    The first approved malaria vaccine, called RTS,S, is made in precisely this way. A genetic sequence encoding a part of the malaria parasite’s circumsporozoite protein is inserted into living cells, which divide and then produce the molecule. These newly approved malaria vaccines are 75% effective at preventing infections in children, drastically reducing deaths caused by a disease that has killed billions of humans over centuries.

    Increasingly, engineered biology is being used to not only make molecules in cells grown outside the body but to directly modify cells that go inside the body. The F.D.A. recently approved the first clinical therapy that uses CRISPR/Cas9 gene-editing, called Casgevy. It’s a treatment for sickle cell disease, a painful condition caused by a genetic change in the beta-globin gene that encodes part of the hemoglobin blood protein.

    Casgevy works like this: Stem cells are collected from the patient, edited using CRISPR-Cas9 to coax them into making a fetal form of hemoglobin, and reinfused back into the body. The edited stem cells settle in the bone marrow and make a healthy form of hemoglobin.

    Most modern achievements in biotechnology, such as the new malaria vaccines and Casgevy, work on individuals. In the future, though, engineered biology will increasingly be used to solve problems at a grander scale.

    An Illinois-based company called LanzaTech, for example, is already using engineered Clostridium microbes to transform steel factory emissions into ethanol. In 2019, their microbes made 9 million gallons of ethanol from steel waste gas emitted from a single Chinese factory. The company also has a pilot-scale, carbon-negative process to make acetone and isopropanol from factory emissions.

    In the state of Georgia, a company called Living Carbon has planted hundreds of engineered poplar trees that “can capture 27% more carbon dioxide due to a faster growth rate and accumulation of 53% more biomass.” Although these trees are still being tested in early field trials, it’s clear that our ability to engineer multicellular organisms is increasing.

    Soon, living cells will solve planetary problems.

    Why Now

    In the last five years, the F.D.A. granted full approval to an mRNA product for the first time, and the W.H.O. recommended the life-saving RTS,S malaria vaccine for children. In the last few months, a large clinical trial run by the drug company Gilead showed that a twice-yearly antiviral drug is 96 to 100 percent effective at preventing HIV infection, and a genome-editing technology was used, for the first time, to insert more than 11,000 bases of DNA into precise locations in plant genomes. These achievements will improve the lives of many people. They also suggest, at least anecdotally, that now is a good time to work in biotechnology.

    For one, this is really the first generation in which direct molecular observation and manipulation of living cells is possible. Commercial DNA sequencers and synthesizers, as well as most practical gene-editing tools (from zinc-finger nucleases to TALENs and CRISPR), were invented within the last 25 years. The cost to sequence a nucleotide of DNA fell from about $20 in 1990 to fractions of a penny today. The cost to “write” (synthesize) a base of DNA fell by four orders of magnitude between 2000 and 2017. It’s now relatively cheap to sequence and make strands of DNA that can, in turn, be used to engineer cells.
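    To put those declines on a common scale, here is a rough sketch; the 1990 figure comes from the text, while today’s per-base cost is an illustrative stand-in for “fractions of a penny,” not a measured value:

    ```python
    import math

    # Orders-of-magnitude check on the sequencing-cost decline.
    seq_cost_1990 = 20.0     # dollars per base, ~1990 (per the text)
    seq_cost_today = 1e-6    # dollars per base (assumed, for illustration)
    fold = seq_cost_1990 / seq_cost_today
    print(f"~{fold:.0e}-fold, or {math.log10(fold):.0f} orders of magnitude")
    ```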

    Bioengineering tools are also being democratized at an accelerating pace. A method to make short strands of DNA, invented in 1955, was not commercialized until 1980, a span of 25 years. Zinc-finger nucleases and TALENs, developed in 2001 and 2010, respectively, were commercialized within a year. These tools are appearing faster than our understanding of how life actually works; we are often tinkering with life without holding a blueprint.

    *Biotechnological tools are being democratized at an accelerating pace. Protein purification, invented in 1937, did not become easy to do (low skill) and cheap (low finance) for several decades. CRISPR gene-editing, by contrast, was being taught in university lab courses just three years after its invention. Adapted from Jackson S.S. et al., Nature Biotechnology (2019).*

    There is also plenty of low-hanging fruit in biology, broadly speaking. Unlike physics and electrical engineering, where core theoretical principles were solidified in the 20th century and seminal advances often cost billions of dollars,¹ important research in biology can still be carried out for a few thousand dollars.

    Much of basic biology is still unknown, even in areas that scientists have explored for decades. E. coli is the most widely studied organism of all time, yet one-third of its genes still lack an experimentally determined function.

    Estimates suggest there are between 1 and 6 billion species of life on Earth (nobody knows for sure), yet only 0.01 percent of them have ever been studied. (CRISPR was initially discovered in a halophilic archaeon, Haloferax mediterranei, that thrives in salty environments.)

    If you work on biology, in other words, there’s a good chance you’ll find something useful.

    Challenges

    Working with atoms is more difficult than working with bits.

    In computer science, a $200 laptop can be used to work on a nearly infinite number of software problems. There is no such device in biology. Even simple experiments require DNA, plasmids, cells, a clean workspace, enzymes, and specialized machines. Biotechnology sits somewhere between physics and computer science in terms of “difficulty” and access.

    Education is also a major bottleneck. There are not nearly enough resources to learn about molecular bioengineering.

    MIT, Stanford, and North Carolina State University have excellent undergraduate programs, as do many schools in China, the Netherlands, Denmark, and elsewhere. But hands-on training in gene-editing and the minutiae of cellular engineering can often only be obtained by first completing an undergraduate degree in biochemistry or another “classic” field, and then entering graduate school to specialize more deeply.

    Fortunately, alternative education initiatives are emerging. Courses such as “How to Grow (Almost) Anything” at MIT allow students anywhere in the world to program lab robots and engineer cells in Cambridge; teaching assistants send data back to the remote students. Community colleges, such as Laney College in Oakland, offer excellent hands-on biotechnology training programs, and other small colleges are poised to follow. I’ll have more to say about education in a future essay.

    There are many paths into biotechnology, and entering through unconventional channels may actually be advantageous.

    On a recent visit to MBC BioLabs, an incubator for small biotech companies in San Francisco, I met a venture-backed founder who does not have a Ph.D. This founder trained as a physicist and mathematician and then read textbooks and talked to people until they felt ready to launch the company. Their background in mathematics and physics was critical, because the company is operating at the interface of biology and many different quantitative disciplines.

    I trained as a biologist but don’t feel equipped to make important advances in physics or mathematics. But the converse is not true. Physicists and mathematicians can—and already have—made many seminal impacts on biology.

    That’s because biology research is extremely broad in scope, and therefore open to all. Many great molecular biologists trained first in physics or math: Louis Pasteur, Max Delbrück, and Francis Crick among them. Synthetic biology, a field formed at the turn of the 21st century, was likewise started by physicists, including Michael Elowitz at Princeton and James Collins at Boston University.

    It is these outsiders who often propose risky experiments and have the audacity—or perhaps a useful naïveté—to see them through. Elowitz and Collins not only declared that cells could be “programmed” with synthetic DNA, but actually built logic-performing gene circuits to pull it off. These physicists drew upon their experience with atoms and forces, and turned it loose upon the biological world.

    So regardless of your past, biotechnology can be your future. This is an exciting place to be, for biology is fast and slow, small and large. Chemical reactions in a cell happen in millionths of a second, even as organisms adapt and evolve over billions of years. There are organisms that measure one micron across, and “superorganisms” composed of trillions of interconnected cells spread over hundreds of acres.

    Molecules within cells are governed by the laws of physics, much like anything else. If you understand those molecules, and learn to manipulate them, you too can correct imbalances and solve important problems.

    ***

    Niko McCarty is a founder of Asimov Press and a former curriculum specialist in genetic engineering at MIT.

    Thanks to Xander Balwit, Ben James, Rob Tracinski, Alec Nielsen and Ben Gordon for reading drafts of this essay. Elements of this essay are inspired by research and writing by Tony Kulesa, Michael Elowitz, Elliot Hershberg, Drew Endy, and Rob Carlson.

    1 As one example, the Higgs boson was discovered at CERN’s Large Hadron Collider, which cost more than $5 billion to build.

  • This OpenAI Wet-Lab Blog is Pretty Good

    There’s a recent blog post from OpenAI in which they used GPT-5 to optimize a common biology experiment called Gibson Assembly. I’ve seen criticisms online from people who say things like, “Who cares? A human totally could have done that” or whatever. And that’s true. But I still think the post is nice for a couple of reasons.

    First, faster iteration is one of the best ways to accelerate biotechnology progress more broadly. Experiments take much too long, and are often much too unreliable, for scientists to move quickly. We should therefore invest more resources in optimizing and improving common methods that seem “mundane.”

    Second, this is a simple experimental system in which to test AI; indeed, that’s the whole point! Gibson Assembly has been around for nearly two decades, is widely used, and requires only three enzymes. It is therefore a natural fit for AI companies looking to benchmark their models on biological questions. (The parameter space is not too large!)

    To understand what OpenAI actually did, I first need to tell you about Gibson Assembly, a common method biologists use to stitch DNA molecules together. First described in 2009, it’s popular because it’s dead simple: everything happens at a single temperature (50°C), and it requires only three enzymes. The DNA molecules to be joined are designed so that each carries 15–40 nucleotides at its ends that overlap with the neighboring molecule. All the DNA is added to a tube, and an exonuclease “chews back” several dozen nucleotides from the 5’ end of each molecule, leaving behind long single-stranded “arms.” These arms float around in the liquid, collide with a matching arm on another DNA molecule, and hug each other tightly. A second enzyme, DNA polymerase, runs along the annealed strands and fills in whatever is still single-stranded. Finally, DNA ligase seals the remaining “nicks,” forming a newly assembled, double-stranded piece of DNA.
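    To make the logic concrete, here is a toy sketch of the sequence-level outcome (my illustration, not OpenAI’s or NEB’s code; the sequences and the helper function are made up). It mimics what the enzymes accomplish, not their chemistry: two fragments sharing terminal homology become one molecule.

    ```python
    # Toy model of Gibson Assembly's outcome: join two DNA fragments
    # that share 15-40 nucleotides of end homology.

    def gibson_join(frag_a: str, frag_b: str,
                    min_overlap: int = 15, max_overlap: int = 40) -> str:
        """Join frag_a onto frag_b via shared end homology, longest match first."""
        for n in range(max_overlap, min_overlap - 1, -1):
            if len(frag_a) >= n and len(frag_b) >= n and frag_a[-n:] == frag_b[:n]:
                return frag_a + frag_b[n:]   # polymerase + ligase seal the joint
        raise ValueError("no overlap of sufficient length found")

    gene   = "ATGGTGAGCAAGGGCGAGGAGCTG" + "GGTTCTGGCGGTAGTTCT"  # insert + 18-nt arm
    vector = "GGTTCTGGCGGTAGTTCT" + "CCGGAATTCGAGCTCGGTACC"     # arm + backbone
    print(gibson_join(gene, vector))
    ```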

    OpenAI collaborated with a new biosecurity startup, Red Queen Bio (co-founded by Hannu Rajaniemi, an excellent science-fiction writer), to build the evaluation framework. The metric they settled on is called cloning efficiency, which just means this: for a fixed amount of input DNA (say, one picogram) transformed into cells, how many colonies grow and contain the correctly assembled DNA molecule? By the end of the post, the OpenAI team claims to have boosted this number 79x relative to a “baseline protocol” from New England Biolabs (NEB), a common purveyor of the Gibson enzymes.
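    Restated in code, the metric looks like this (the colony counts below are invented; only the 79x ratio mirrors the number reported in the post):

    ```python
    # Cloning efficiency: correct colonies per picogram of input DNA.
    def cloning_efficiency(colonies: int, input_dna_pg: float) -> float:
        return colonies / input_dna_pg

    baseline  = cloning_efficiency(colonies=12,  input_dna_pg=1.0)   # invented
    optimized = cloning_efficiency(colonies=948, input_dna_pg=1.0)   # invented
    print(f"{optimized / baseline:.0f}x over baseline")              # 79x
    ```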

    An important note: OpenAI says no humans were involved in optimizing the reaction. All the humans did was carry out the protocols generated by GPT-5 and upload the experimental results back into the model, repeating this loop several times and coaxing the model to iterate each round. The Gibson Assembly itself was remarkably simple, involving just two DNA molecules: a gene encoding a fluorescent protein and a plasmid to hold the gene.

    (The OpenAI team, intriguingly, also set up a robot to automate the Gibson Assembly and transformation, but couldn’t get it to work as well as a human. “We compared the robot’s work to human-performed experiments at each step. The robot successfully handled the transformation process…When compared directly with human-performed transformations, the robot generated similar quality data with equivalent improvements over baseline, showing early potential for automating and accelerating biological experiment optimization.” However, “while the fold-changes between the robot and human experiments were similar, absolute colony counts from the robot were approximately ten-fold lower than manual execution.”)

    After several rounds of iteration, the model made two notable proposals:

    First, it added two additional enzymes to the normal Gibson Assembly reaction. Specifically, it added “the recombinase RecA from E. coli, and phage T4 gene 32 single-stranded DNA–binding protein (gp32).” The blog continues: “Working in tandem, gp32 smooths and detangles the loose DNA ends, and RecA then guides each strand to its correct match.” This tweak improved the “cloning efficiency” metric by 14x over the standard NEB protocol.

    Second, it made a subtle change to how the assembled DNA molecules were inserted into living cells. Specifically, GPT-5 told the humans to spin down cells in a centrifuge, thus forming a pellet, prior to transforming them. This is typically not recommended because competent cells are “fragile,” but the OpenAI team writes that “the cells tolerated concentration well and the increased molecular collisions boosted transformation efficiency substantially (>30-fold on final validation).”

    Now, recall that at the start of this little blog I said I really liked this experiment! (Do not crucify me, ye AI optimists.) But no internet commentary is truly complete without some nitpicking, so here goes.

    One criticism is that the largest improvement made by the model was not related to Gibson Assembly at all! It was related to how the DNA gets delivered into cells. And, indeed, prior studies have shown something similar. (This research paper, for example, reports that one of the best ways to improve transformation is to concentrate cells beforehand. Fair play to the OpenAI team for linking to it in their post.) And if you are a human planning to spin down your competent cells before transformation, just be sure to aliquot everything into small tubes first; repeated spins will, over time, kill the cells.

    Another issue is that adding RecA and gp32 to a Gibson Assembly reaction complicates things quite a bit. A normal Gibson reaction comes as a single kit from NEB, enzymes included, and the whole experiment runs at one temperature: 50°C! Doing it the GPT-5 way would require buying purified RecA and gp32 and juggling incubation temperatures (RecA and gp32 work best at 37°C). That’s more expensive and more complicated, though perhaps worthwhile in some cases.

    And lastly, the chosen metric, namely how many colonies one gets from a given amount of DNA, doesn’t seem all that useful in most scenarios. A scientist stitching together two strands of DNA doesn’t care if they only get five colonies because, often, they need just ONE colony that works; they can then grow those cells in large flasks and extract a huge amount of the plasmid. A more useful metric might be the total number of unique DNA strands that can be joined in a single Gibson Assembly reaction without reducing overall quality.

    Still, I liked this blog post as a whole. I’m glad people are optimizing the “small” things, and I don’t blame OpenAI for not trying to solve cancer, in its overwhelming magnitude of manifestations, on their first attempt! Gibson Assembly is a much better starting point.

  • Enzymes from Random Molecules

    A new paper in Nature shows that enzymes can be made by mixing just four molecules together, none of which are amino acids. The four molecules randomly link together to form long polymer chains, some of which catalyze chemical reactions.

    Though this sounds impressive, the paper itself is quite strange. For one, it is extremely short (only about 2,700 words) and has no discussion section. The text is also absurdly dense, seemingly written for materials scientists or physicists rather than biologists. And lastly, I think the paper is most interesting for the things it leaves unwritten — the ideas left out rather than put in. Understanding why this paper matters, then, is mostly an exercise in speculation.

    For context, scientists have been trying to design new enzymes for decades. But this “design” has traditionally meant searching for amino acid sequences that fold into a 3D shape with some desired function. Computational biologists fixate on the sequence; they treat proteins as individuals rather than as populations of molecules.

    Enzyme design is also a really hard problem. An enzyme’s interior holds amino acids in a precise way, such that the amino acid(s) in the active site latch onto substrates and convert them into new molecules. This “active site” is surrounded by other amino acids that create a microenvironment suited to the reaction. If the substrate is negatively charged, for example, the microenvironment works to exclude positively charged molecules.

    Despite this complexity, biologists have designed viable enzymes computationally. Last year, David Baker’s group at the University of Washington designed a serine hydrolase that breaks down esters, chemicals made by joining an acid and an alcohol. This AI-designed enzyme has an active site made from three amino acids (a “catalytic triad”) that work together to catalyze the reaction. But it was quite slow, completing just one reaction per second, compared to the thousands of reactions per second typical of natural serine hydrolases. Enzyme design thus remains a mostly unsolved problem.

    This new Nature paper, though, takes a completely different approach. The key breakthrough, in my eyes, is its focus on populations of polymers rather than on one perfect polymer. The authors created enzymes using a statistical, or probabilistic, approach rather than a deterministic one.

    The researchers focused on metalloenzymes, which are arguably simpler than serine hydrolases because they have a single key amino acid in their active site rather than a “triad.” Metalloenzymes hold metal ions (often zinc, iron, or copper) in that active site; hence the name. The researchers made two types of metalloenzymes: terpene cyclases, which take a string of carbons as substrate and “loop” it into a ring, and peroxidases, which use the iron in heme, together with hydrogen peroxide, to oxidize substrates. I’ll focus on the terpene cyclase, as the approach was largely identical in both cases.

    In nature, terpene cyclases take a straight chain of ten carbon atoms — a molecule called citronellal — and fold it into a ring. If all goes well, the enzyme makes isopulegol, which is a carbon ring with one alcohol group. But if water gets into the active site, this reaction is disrupted and the enzyme instead makes menthoglycol, which is the same carbon ring but with two alcohol groups.

    Natural terpene cyclases have aspartate in their active site. The aspartate donates a proton to citronellal, thus making one of its carbon atoms positively charged. This triggers cyclization into a ring, as the “activated” carbon joins the carbon at the other side of the chain. The aspartate is surrounded by a hydrophobic shell, which keeps water out so that isopulegol gets made selectively instead of menthoglycol.

    Seeking to create random polymers which could mimic a terpene cyclase, the researchers first analyzed 1,300 metalloproteins, looking for commonalities between them. They found two things: First, metalloproteins tend to have one “key” amino acid in their interior — often histidine or aspartate — which latches onto the metal ion, locking it in place, so that it can perform the chemical reaction. Second, metalloenzymes tend to surround their active sites with hydrophobic amino acids, which exclude water molecules. To make a metalloenzyme, then, one basically just needs to situate a single amino acid, or electron donor, inside a hydrophobic shell.

    Next, the authors scoured chemical databases for molecules with these same properties, meaning they are hydrophobic or similar in shape and charge to histidine or aspartate. They ultimately settled on four molecules:

    • Methyl methacrylate (MMA), a hydrophobic molecule.
    • 2-ethylhexyl methacrylate (EHMA), an even more hydrophobic molecule.
    • Oligo(ethylene glycol) methyl ether methacrylate (OEGMA), a hydrophilic molecule.
    • 3-sulfopropyl methacrylate potassium salt (SPMA), which mimics aspartate as an electron donor; the active site surrogate.

    (Note: You need some hydrophilic molecules, even when trying to build a hydrophobic active site, because the polymers won’t dissolve in water without them. Instead, they will aggregate or precipitate out of the solution. Hence the inclusion of OEGMA.)

    Then, the researchers mixed these four molecules together, and each molecule randomly linked with others to create long and unique polymers. The hope was that some of these “pseudo-random” polymers would position a SPMA amid hydrophobic molecules, thus creating a terpene cyclase mimic. Initially, things did not go to plan.

    In their first trial, the researchers mixed 50% MMA, 20% EHMA, 25% OEGMA, and 5% SPMA and added the resulting polymers to citronellal. After 24 hours, the polymers cyclized citronellal, but poorly. About half of the citronellal molecules were converted, and only 55 percent of products were isopulegol. In other words, the polymers could slowly catalyze reactions, but not selectively.

    So the authors iterated. To optimize the reaction, they used a Monte Carlo algorithm to generate 100,000 simulated polymer sequences based on each monomer’s ratio and reactivity. By tinkering with the molecular ratios and re-running these simulations, they found they could improve the odds that SPMA would be surrounded by hydrophobic units — and thus act like a terpene cyclase — by increasing SPMA’s concentration (to 15%) while decreasing OEGMA’s (to 5%). (A toy version of this simulation is sketched below.)
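    Here is a minimal sketch of that idea (my reconstruction, not the authors’ code; their simulations also model each monomer’s reactivity, which I ignore). We sample random chains from the monomer feed ratios and score how often the SPMA “active site” lands between hydrophobic neighbors. The MMA/EHMA split in the second recipe is my assumption; the paper only specifies the SPMA and OEGMA fractions.

    ```python
    import random

    # Score how often SPMA is flanked by hydrophobic monomers in random chains.
    HYDROPHOBIC = {"MMA", "EHMA"}

    def sample_chain(ratios: dict[str, float], length: int = 200) -> list[str]:
        names, weights = zip(*ratios.items())
        return random.choices(names, weights=weights, k=length)

    def buried_spma_fraction(ratios: dict[str, float], n_chains: int = 10_000) -> float:
        buried, total = 0, 0
        for _ in range(n_chains):
            chain = sample_chain(ratios)
            for i in range(1, len(chain) - 1):
                if chain[i] == "SPMA":
                    total += 1
                    if chain[i - 1] in HYDROPHOBIC and chain[i + 1] in HYDROPHOBIC:
                        buried += 1
        return buried / total

    first_trial  = {"MMA": 0.50, "EHMA": 0.20, "OEGMA": 0.25, "SPMA": 0.05}
    second_trial = {"MMA": 0.55, "EHMA": 0.25, "OEGMA": 0.05, "SPMA": 0.15}  # split assumed

    print(f"first:  {buried_spma_fraction(first_trial):.2f}")   # ~0.49
    print(f"second: {buried_spma_fraction(second_trial):.2f}")  # ~0.64
    ```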

    This yielded much better results. In a second round, the polymers converted 91 percent of citronellal after 24 hours, with a selectivity for isopulegol of 76 percent.

    So why does any of this matter? Well, the paper doesn’t really say, outside of some vague or indirect commentary. So what follows is mostly speculation…

    I think one reason this paper is important is that it does away with the outdated notion that enzymes must be tuned at the sequence level. The study shows, rather, that enzymes can arise spontaneously from pseudo-random populations of molecules, much as they probably did in the earliest life on Earth. Early lifeforms didn’t need to evolve the perfect enzyme; they just needed concoctions of molecules that were “good enough” for a particular function.

    The study also suggests that the 20 amino acids used by cells are not particularly special, and their functions can be replaced with other molecules carrying the same properties — like “charged” or “hydrophobic” or “flexible” and so on.

    When I first discussed this paper with a friend, a protein biochemist, they urged me not to write about it. They said that metalloenzymes are not particularly difficult to make, and so this paper’s outcomes aren’t all that surprising. They pointed to another study demonstrating that it’s possible to make functional metalloenzymes simply by mixing purified phenylalanine with zinc ions.

    My retort, though, is that the authors have already used this same “random polymer” approach to make other types of proteins. In 2020, for example, they made protein channels that were exquisitely sensitive to protons; during our conversation, they hinted that they have also made other classes of enzymes, including hydrolases.

    But still, this paper leaves so much unsaid. I suspect many protein biochemists reading this blog still won’t find the work impressive or useful or surprising or whatever. It takes a long time to overturn dogma, after all, and it’ll be an uphill battle to change people’s perceptions of enzymes and how one can make them.

    The paper itself took seven years of work, according to a corresponding author, and involved many rounds of back-and-forth with a “hostile” reviewer. The manuscript was cut nearly in half (from almost 5,000 words to 2,700), losing much of its philosophical framing. “This was the hardest paper I’ve ever published,” the author told me. And after spending a week wrestling with whether to write about it, I understand why.