Asimov Blog

The Biotech Digest No. 6

Written by Niko McCarty | Jul 19, 2024 1:30:53 AM

This weekly digest highlights recent papers and news in biotechnology. Please send feedback to niko@asimov.com.

Benchmarking Language Models for Biology

There are hundreds of different AI benchmarks, or technical tests that determine the capabilities of models on selected tasks. There are benchmarks for vision, language, speech, video, reasoning, and more. Benchmarks are really important because they give researchers a “lay of the land,” so to speak, enabling them to compare models and then push them to get better on tasks that we care about. Benchmarks are a way to drive progress forward.

But until recently, there weren’t many benchmarks for AI as used in the research laboratory. (Okay, there are a handful: A 2022 paper proposed very broad benchmarks for AI in science, there are benchmarks for models that predict a cell's response when treated with drugs, and NASA has published benchmarks for AI and space biology.) But as writers tout visions of a future of "automated science," wherein AI agents design experiments and program robots to execute them, how will we know when we're actually succeeding? How will we quantify the progress we're making?

On Tuesday, Future House released a set of benchmarks that go part of the way toward answering those questions. Here’s the full paper on arXiv. 

Future House is a nonprofit laboratory in San Francisco. With funding mostly from Eric Schmidt, the ex-Google CEO, their goal is to build an AI scientist. But before you can build a full-fledged scientist, you first have to build AI tools that accelerate scientific discoveries and engineering projects more broadly. Hence the benchmarks.

In the paper, Future House notes that their benchmarks include 2,457 questions spread across eight categories with relevance to wet-lab biologists. The questions test a language model’s ability to answer questions about images in scientific papers, query databases, evaluate and answer questions about tables and protocols, modify sequences of amino acids or nucleic acids, and even answer “Cloning Scenarios” that a trained molecular biologist should be able to answer in about 10 minutes. 

My take is that these are really good and comprehensive benchmarks for the types of things molecular biologists care about in the near term: chatting with LLMs about papers, understanding the experiments and data in those papers, and then designing DNA and cloning experiments to push new work forward.

All of the questions are multiple choice, however, “because for most of the categories we do not believe that models are reliable for automatic evaluation today,” according to the Future House press release. Initial tests suggest that Claude 3.5 Sonnet performs best out of all the publicly available LLMs, but no model matches humans across the eight categories. The only case in which Claude 3.5 Sonnet outperformed humans was on questions about tables in papers, and even then it barely exceeded human performance.
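Part of the appeal of multiple choice is that grading reduces to an exact comparison between the option a model picked and the answer key. Here is a minimal sketch of per-category scoring, assuming a hypothetical record format of (category, chosen option, correct option). The records below are made-up placeholders, not questions or results from the benchmark, and this is not Future House's evaluation code.

```python
from collections import defaultdict

# Hypothetical toy records: (category, model's choice, correct choice).
# Made-up placeholders for illustration only, not benchmark data.
records = [
    ("tables", "B", "B"),
    ("tables", "A", "C"),
    ("cloning", "D", "D"),
    ("cloning", "A", "A"),
    ("cloning", "C", "B"),
]


def accuracy_by_category(rows):
    """Return {category: fraction of questions answered correctly}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for category, chosen, answer in rows:
        total[category] += 1
        if chosen == answer:
            correct[category] += 1
    return {category: correct[category] / total[category] for category in total}


scores = accuracy_by_category(records)
for category, score in scores.items():
    print(f"{category}: {score:.2f}")
```

Comparing a model against humans then amounts to running the same function on human answers and looking at the two scores side by side, category by category.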

These benchmarks are timely because automated science is ramping up quickly. In December, chemical engineers at Carnegie Mellon University reported Coscientist, a tool that merges OpenAI’s GPT-4 with a simple liquid-handling robot. GPT-4 (with a browser) searched the web, read papers, and then wrote scripts to program the robot, which then ran the experiments. Coscientist successfully planned synthesis reactions for acetaminophen, aspirin, nitroaniline and phenolphthalein, without making any mistakes.

In December, Future House released a tool called WikiCrow “that can synthesize cited Wikipedia-style summaries for technical topics from the scientific literature.” They collected more than 15,000 human protein-coding genes and asked WikiCrow to write an article about every single one of them. Each article takes about 8 minutes to write, and the final WikiCrow articles describe everything from a gene's size and function to its known interaction partners and clinical significance.

All the work I've seen from Future House so far has been genuinely impressive. Keep an eye on them!

A Self-Biodegrading Plastic

Bioengineers have long speculated that engineered enzymes could be used to break down plastic. And they're definitely correct—dozens of enzymes are now known to break down PET and other plastics, some of which were discovered (shock!) in microbes living in plastic-polluted areas. But attempts to scale these enzymes and make a dent in the plastic pollution problem have been underwhelming, to say the least.

About 400 million tons of plastic was made in 2022, but only 11 percent of it will be recycled. The vast majority of plastic is thrown into a landfill for near-eternity or incinerated. A lot of PET, for example, isn’t recycled because it's expensive and energy-intensive to do so. Most PET is mechanically recycled, which means it’s shredded into tiny pieces which are then washed, separated, and melted to make PET resin. This resin can then be reformed into new bottles or films. If the plastic is contaminated, however, this doesn’t work.

A couple of years ago, I wrote about an engineered enzyme, called FAST PETase, that breaks down a pre-melted water bottle in about two weeks at 50°C. It was a super impressive study, but not necessarily scalable. The enzyme requires relatively high temperatures, which are not found in soil or home composts. At the time, I wrote:

More than 82 million metric tons of PET plastics are produced each year. That's 180,777,200,000 pounds of plastic (if you're American), or roughly equivalent in weight to 225 Empire State Buildings...

[In the paper,]...the enzyme broke down a single plastic bottle, weighing just 9 grams, in slightly less than two weeks. And that was only after they melted down the bottle into a "plastic puck." 

If you took a little tub of this enzyme, which they used to break down the one bottle, and applied it to the whole global plastic problem, it would take about 350 billion years to break down all the PET plastic produced in one year. Building factories and making this enzyme in great big vats will help scale the technology...but will that make a dent?

I understand that this is a bit melodramatic, and plastic recycling doesn't quite work like this. Nor is any single solution likely to solve the problem. But the point stands. If we want to make a huge dent in the plastic pollution problem, then we probably need to transition away from PET entirely and move to solutions that don't require "post-treatment" of plastic after it's been used. Ideally, all of our plastic would just completely biodegrade at ambient temperatures in a few weeks.
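For anyone who wants to check the 350-billion-year figure quoted above, here is a rough sketch of the arithmetic. It assumes the one tub of enzyme works through bottles one at a time, rounds "slightly less than two weeks" up to two weeks, and uses 52 weeks per year. Those simplifications are mine, not the paper's.

```python
# Back-of-the-envelope check of the "350 billion years" estimate.
PET_TONNES_PER_YEAR = 82e6  # metric tons of PET produced annually (from the text)
BOTTLE_GRAMS = 9            # mass of the single bottle in the study (from the text)
WEEKS_PER_BOTTLE = 2        # assumption: "slightly less than two weeks", rounded up
GRAMS_PER_TONNE = 1e6
WEEKS_PER_YEAR = 52         # assumption: ignores the extra ~1 day per year


def years_to_degrade(total_tonnes, bottle_grams, weeks_per_bottle):
    """Years for one tub of enzyme to process the PET one bottle at a time."""
    bottles = total_tonnes * GRAMS_PER_TONNE / bottle_grams
    return bottles * weeks_per_bottle / WEEKS_PER_YEAR


years = years_to_degrade(PET_TONNES_PER_YEAR, BOTTLE_GRAMS, WEEKS_PER_BOTTLE)
print(f"{years:.2e} years")
```

The result lands at roughly 3.5 × 10^11 years, which matches the figure in the quote. Running a million tubs in parallel would divide that by a million, and the answer would still be hundreds of thousands of years.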

There is another type of plastic, called PLA, that is made from corn and biodegrades more easily than PET. But it only degrades at high temperatures, which similarly complicates its recycling. Specifically, PLA degrades in a few days to a few months at temperatures above 60°C or so. And that means you can’t just bury PLA and make it go away.

Writing in Nature, French biotechnologists this week described an engineered enzyme that can be embedded into a PLA matrix and then fully biodegrade the material “under home-compost conditions [about 28 °C] within 20-24 weeks.” Importantly, this approach is compatible with existing industrial processes to make PLA. The enzymes were engineered such that they don’t break down when exposed to the high temperatures required to mold plastics, and they don't break down during long-term storage of said plastic.

This paper seems to have started with a discovery. The scientists isolated a new type of PLA depolymerase enzyme from a thermophilic bacterium. But they noted that the wildtype enzyme breaks down at 58°C, losing its activity entirely. That's problematic because plastic extrusion happens at temperatures much higher than that.

The scientists solved the PLA depolymerase's structure using X-ray crystallography, and then did some clever engineering to make it stable at high temperatures. And it worked. The final engineered enzyme, called ProteinTFLTIER, was 80 times more active than the original enzyme at degrading PLA. And, importantly, its melting temperature increased to 79.4°C, allowing it to survive the high temperatures used in plastic manufacturing.

Finally, the enzyme was incorporated into PLA films at a concentration of 0.02%. The PLA films were then placed in a home composter for 20-24 weeks. The films containing engineered enzymes degraded completely during that time, compared to zero degradation for regular PLA (see the image above).

This paper is great, I think, because it accounts for the real world. The scientists didn’t just make a better enzyme, clap their hands and say, “Right, good enough for a paper!” They actually took the enzyme and proved that it works in the world, in a real setting. And that’s beautiful. It remains to be seen whether this will scale, of course, but it’s a hopeful paper and one worth reading.

Cells Spinning on Wheels

Here's my favorite paper from the week. Japanese scientists made tiny devices, called “microtraps,” that are just large enough for cells to swim inside. These microtraps were placed onto a microscope slide containing Chlamydomonas reinhardtii, a type of green alga with two flagella that can swim up to 100 micrometers per second.

When the microbes get trapped in the devices, they swim forward and push the microtraps around. By fashioning devices in various shapes, the researchers were able to build different types of microbe-powered machines. I’m finding it difficult to do this paper justice in words, so here’s a video:

The device depicted in this video is called the "Rotator." It's basically just four tiny cages placed along the spokes of a wheel. As algae get caught in the cages, they swim forward and spin the wheel around. The Rotator spins with an average angular velocity of 1.43 radians per second. (And yes, to answer the question you're surely thinking, the Earth rotates at an angular velocity of 7.2921159 × 10⁻⁵ radians/second.)
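Radians per second aren't the most intuitive unit, so here is a small sketch that converts the two angular velocities above into a rotation period and revolutions per minute. The function names are my own, and the only inputs are the two numbers quoted in the text.

```python
import math

ROTATOR_RAD_PER_S = 1.43         # average angular velocity of the Rotator (from the text)
EARTH_RAD_PER_S = 7.2921159e-5   # Earth's angular velocity (from the text)


def period_seconds(omega):
    """Time for one full turn (2*pi radians) at angular velocity omega in rad/s."""
    return 2 * math.pi / omega


def revolutions_per_minute(omega):
    """Convert rad/s to revolutions per minute."""
    return omega * 60 / (2 * math.pi)


rotator_period = period_seconds(ROTATOR_RAD_PER_S)
earth_period = period_seconds(EARTH_RAD_PER_S)
speed_ratio = ROTATOR_RAD_PER_S / EARTH_RAD_PER_S

print(f"Rotator: {rotator_period:.1f} s per turn, "
      f"{revolutions_per_minute(ROTATOR_RAD_PER_S):.1f} rpm")
print(f"Earth: {earth_period / 3600:.2f} hours per turn")
print(f"The Rotator turns about {speed_ratio:,.0f} times faster than the Earth")
```

In other words, the algae push the wheel through a full turn roughly every 4.4 seconds, while the Earth takes a little under 24 hours (one sidereal day) to do the same.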

Although the authors repeatedly tout that their microdevices are a way to convert sugar into mechanical energy (which is true!), this paper really just seems more for fun than anything else. And I think that’s great. I’m all for having more fun papers in the world, and I don’t quite understand why so many scientists, journal editors, and grantmakers feel compelled to justify the “applications” or “impact” of their work. Let’s do science and build cool things for the sake of it.

 

Papers You Might Have Missed

(* = Recommended)

*Sequence-specific targeting of intrinsically disordered protein regions. bioRxiv

*Direct observation of prion-like propagation of protein misfolding. Nature Chemical Biology

Making the “last-line antibiotic” colistin in a microbe. Metabolic Engineering

Genetic manipulation of bacteriophage T4 with CRISPR-Cas13b. bioRxiv

Engineering a nanoscale liposome-in-liposome for in situ biochemical synthesis and multi-stage release. Nature Chemistry

Split intein-mediated protein trans-splicing to express large dystrophins. Nature

AlphaFold-guided redesign of a plant pectin methylesterase inhibitor for broad-spectrum disease resistance. Molecular Plant

A metabolic atlas of blood cells in young and aged mice identifies uridine as a metabolite to rejuvenate aged hematopoietic stem cells. Nature Aging

Allogeneic CD19-targeted CAR-T therapy in patients with severe myositis and systemic sclerosis. Cell

Biotechnological approaches for producing natural pigments in yeasts. Trends in Biotechnology

Synthetic antibiotics that overcome bacterial resistance. Nature Communications

Recording transcription in cells. Nature

*Licensed H5N1 vaccines generate cross-neutralizing antibodies against highly pathogenic H5N1 clade 2.3.4.4b influenza virus. Nature Medicine

Introducing carbon assimilation in yeasts using photosynthetic directed endosymbiosis. Nature Communications

Seamless site-directed mutagenesis in complex cloned DNA sequences using the RedEx method. Nature Protocols

Adenine base editing-mediated exon skipping restores dystrophin in humanized Duchenne mouse model. Nature Communications

Doubled haploid technology and synthetic apomixis: Recent advances and applications in future crop breeding. Molecular Plant (Review)

Are reviewer scores consistent with citations? Scientometrics 

Enrichment of rare codons at 5' ends of genes is a spandrel caused by evolutionary sequence turnover and does not improve translation. eLife

Unlocking opioid neuropeptide dynamics with genetically encoded biosensors. Nature Neuroscience

TracrRNA reprogramming enables direct PAM-independent detection of RNA with diverse DNA-targeting Cas12 nucleases. Nature Communications

*Template-based copying in chemically fuelled dynamic combinatorial libraries. Nature Chemistry

In Other News…

Genetic cloaking of healthy cells opens door to universal blood cancer therapy. Ars Technica

A method to scan spatial patterns of epigenetic marks in the brain. The Scientist

These microscopic animals fight off infection using genes ‘stolen’ from bacteria. Science

To Find Alien Life, We Might Have to Kill It. WIRED

More papers are being generated with ChatGPT etc. In the Pipeline

This kids’ brain cancer is incurable — but immune therapy holds promise. Nature

*Roche looks to have a competitive GLP-1 drug. STAT

Freezer holding world’s biggest ancient-ice archive to get ‘future-proofed’. Nature

Mice live longer when inflammation-boosting protein is blocked. Nature

Bacteria that fix nitrogen in the oceans are similar to those that do it on land. Quanta Magazine