This OpenAI Wet-Lab Blog is Actually Pretty Good
There's a recent blog from OpenAI where they used GPT-5 to optimize a common biology experiment, called Gibson Assembly. I've seen criticisms online from people who say things like, "Who cares? A human totally could have done that" or whatever. And that's true. But I still think this blog is nice for a couple reasons.
First, faster iteration is one of the best ways to accelerate progress in biotechnology more broadly. Experiments take much too long, and are often much too unreliable, for scientists to move quickly. Therefore, we should invest more resources in optimizing and improving common methods that seem "mundane".
Second, this is a simple experimental system in which to test AI; indeed, that's the whole point! Gibson Assembly has been around for nearly two decades, is widely used, and requires only three enzymes. It is therefore a natural fit for AI companies to benchmark their models on biological questions. (The parameter space is not too large!)
To understand what OpenAI actually did, I first need to tell you about Gibson Assembly, a common method biologists use to stitch DNA molecules together. Originally developed in 2009, Gibson is popular because it's dead simple: everything works at one temperature (50°C) and it requires only three enzymes. The DNA molecules to be joined are designed so that each has 15-40 nucleotides at its ends that overlap with the end of the neighboring DNA molecule. All the DNA is then added to a tube, and an enzyme, an exonuclease, "chews back" several dozen nucleotides from the 5' ends of each molecule, leaving behind long single-stranded "arms." These arms float around in the liquid, collide with a matching arm on another DNA molecule, and hug each other tightly. A second enzyme, DNA polymerase, runs along these paired strands and fills in any stretches that are still single-stranded. Finally, DNA ligase seals the remaining "nicks" and heals the strands, thus forming a newly assembled, double-stranded piece of DNA.
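For readers who think better in code, here is a minimal Python sketch of the bookkeeping behind that description: two fragments are joined wherever the end of one matches the start of the other over 15-40 nucleotides. The function name `gibson_join` and the example sequences are made up for illustration, and the sketch ignores everything the enzymes actually do (chew-back, annealing kinetics, fill-in, ligation). It only shows why the overlap design matters.

```python
def gibson_join(left: str, right: str,
                min_overlap: int = 15, max_overlap: int = 40) -> str:
    """Join two top-strand sequences (written 5'->3') that share an end overlap.

    Hypothetical helper: it models only the sequence logic of the overlap,
    not the exonuclease, polymerase, or ligase steps.
    """
    left, right = left.upper(), right.upper()
    longest = min(max_overlap, len(left), len(right))
    # Try the longest allowed overlap first, then work down to the minimum.
    for k in range(longest, min_overlap - 1, -1):
        if left[-k:] == right[:k]:
            # Keep a single copy of the shared overlap in the product.
            return left + right[k:]
    raise ValueError("no terminal overlap in the allowed size range")


# Made-up example: a 20-nt overlap shared by the two fragments.
overlap = "ATGGTGAGCAAGGGCGAGGA"
vector_end = "GGATCCTTAACC" + overlap
insert_start = overlap + "CTGTTCACCGGG"
product = gibson_join(vector_end, insert_start)
```

If the two fragments share no overlap of at least 15 nucleotides, the function raises an error, which is the in-silico version of an assembly that yields no colonies.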
OpenAI collaborated with a new biosecurity startup, Red Queen Bio (co-founded by Hannu Rajaniemi, an excellent science fiction writer), to build the evaluation framework. The metric they settled on is called cloning efficiency, which just means this: For a fixed amount of input DNA (like one picogram) transformed into cells, how many colonies successfully grow and contain the correctly assembled DNA molecule? By the end of their blog post, the OpenAI team claims that they were able to boost this number 79x relative to a "baseline protocol" from New England Biolabs, or NEB, a common purveyor of the Gibson enzymes.
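To make the metric concrete, here is a small Python sketch of how cloning efficiency and a fold-change over baseline could be computed. The function names, the `fraction_correct` parameter, and every number below are assumptions for illustration. The colony counts are invented so that the ratio comes out to 79 and are not taken from OpenAI's post.

```python
def cloning_efficiency(colonies: int, fraction_correct: float,
                       input_dna_pg: float) -> float:
    """Correctly assembled colonies per picogram of input DNA (hypothetical helper)."""
    if input_dna_pg <= 0:
        raise ValueError("input DNA amount must be positive")
    if not 0.0 <= fraction_correct <= 1.0:
        raise ValueError("fraction_correct must be between 0 and 1")
    return colonies * fraction_correct / input_dna_pg


def fold_change(optimized: float, baseline: float) -> float:
    """How many times better the optimized protocol is than the baseline."""
    return optimized / baseline


# Invented placeholder numbers, NOT data from the blog post.
baseline = cloning_efficiency(colonies=10, fraction_correct=1.0, input_dna_pg=1.0)
optimized = cloning_efficiency(colonies=790, fraction_correct=1.0, input_dna_pg=1.0)
improvement = fold_change(optimized, baseline)
```

Because the result is a ratio, it says nothing about absolute colony counts: two setups can report the same fold-change while one of them produces far fewer colonies overall.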
An important note is that OpenAI says no humans were involved in optimizing the reaction; all the humans did was carry out protocols generated by GPT-5, and also upload experimental results back into the model. They repeated this several times, coaxing the model to iterate each time. Their Gibson Assembly was remarkably simple, involving just two DNA molecules: a gene encoding a fluorescent protein and a plasmid to hold the gene.
(The OpenAI team, intriguingly, also set up a robot to automate the Gibson Assembly and transformation, but couldn't get it to work as well as a human. "We compared the robot's work to human-performed experiments at each step. The robot successfully handled the transformation process…When compared directly with human-performed transformations, the robot generated similar quality data with equivalent improvements over baseline, showing early potential for automating and accelerating biological experiment optimization." However, "while the fold-changes between the robot and human experiments were similar, absolute colony counts from the robot were approximately ten-fold lower than manual execution.")
After several rounds of iteration, the model made two notable proposals:
First, it added two additional proteins to the normal Gibson Assembly reaction. Specifically, it added "the recombinase RecA from E. coli, and phage T4 gene 32 single-stranded DNA–binding protein (gp32)." The blog continues: "Working in tandem, gp32 smooths and detangles the loose DNA ends, and RecA then guides each strand to its correct match." This tweak improved the "cloning efficiency" metric by 14x over the standard NEB protocol.
Second, it made a subtle change to how the assembled DNA molecules were inserted into living cells. Specifically, GPT-5 told the humans to spin down cells in a centrifuge, thus forming a pellet, prior to transforming them. This is typically not recommended because competent cells are "fragile," but the OpenAI team writes that "the cells tolerated concentration well and the increased molecular collisions boosted transformation efficiency substantially (>30-fold on final validation)."
Now, recall that at the start of this little blog I said I really liked this experiment! (Do not crucify me, ye AI optimists.) But no internet commentary is truly complete without some nitpicking, so here goes.
One criticism is that the largest improvement made by the model was not related to Gibson Assembly at all! It was related to how the DNA gets delivered into cells. And, indeed, prior studies have shown something similar. (This research paper, for example, says that one of the best ways to improve transformation is to concentrate cells beforehand. Fair play to the OpenAI team for linking to this in their blog post.) And if you are a human reading this blog, and you are planning to spin down your competent cells before transformation, just be sure to aliquot everything into small tubes first; repeated spins will, over time, kill everything.
Another issue is that adding RecA and gp32 to a Gibson Assembly reaction complicates things quite a bit. For a normal Gibson reaction, everything comes in a single kit from NEB with the enzymes, and the whole experiment is done at one temperature: 50°C! But doing a Gibson Assembly this way would require one to buy purified RecA and gp32, and also change incubation temperatures to get everything working (RecA and gp32 work best at 37°C). This is more expensive and more complicated, but maybe worthwhile in some cases.
And lastly, the selected metric (namely, how many colonies one gets from a given amount of DNA) doesn't seem all that useful in most scenarios. A scientist stitching together two pieces of DNA doesn't care much if they only get five colonies because, often, they only need ONE colony that works; they can then grow up those cells in large flasks and extract a huge amount of the plasmid. A more useful metric might be the total number of unique DNA fragments that can be joined together in a single Gibson Assembly reaction without reducing overall quality.
Still, I liked this blog post as a whole. I'm glad people are optimizing the "small" things, and I don't blame OpenAI for not trying to solve cancer, in its overwhelming magnitude of manifestations, on their first attempt! Gibson Assembly is a much better starting point.