A study by UC Berkeley researchers, published in Science, introduces Evo, a machine learning model designed to work with DNA, RNA, and proteins.
DNA is like a long string written with four different letters, the nucleotides A, T, C, and G, which together carry all the instructions needed for life. This DNA sequence can change, and these changes can help organisms adapt to new conditions, driving what we call evolution.
Evo stands out because it can predict how changes in DNA might affect cells and even design new DNA sequences to change how cells work. This could be very useful for creating new treatments for diseases.
Evo seems to be a promising example of generative artificial intelligence (AI) applied to DNA.
A related Perspective, also published in Science, notes that “The ability to predict the effects of mutations across all layers of regulation in the cell and to design DNA sequences to manipulate cell function would have tremendous diagnostic and therapeutic implications for disease.”
The researchers explain that Evo is not just another machine learning model; it’s a “large-scale genomic foundation model” with 7 billion parameters.
Evo can evaluate and design DNA changes
Evo was trained on a huge dataset of 2.7 million microbial genomes. A microbial genome is the complete set of DNA in a bacterium or another microorganism. This training makes Evo very good both at predicting what happens when DNA changes (for example, mutations in bacteria) and at generating new DNA sequences. For instance, it can predict how a change in DNA might affect a protein, or how it might affect when genes are turned on or off, a process known as gene regulation.
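To make "predicting what happens when DNA changes" more concrete, here is a minimal Python sketch of one common approach for generative sequence models: score a mutation by how much it changes the model's likelihood of the sequence. This is not Evo. Evo is a 7-billion-parameter neural network, while the sketch swaps in a tiny Markov model that only looks at the previous three bases and is trained on made-up data. The names `train_markov`, `log_likelihood`, and `mutation_score` are hypothetical, chosen for illustration.

```python
import math
from collections import defaultdict

BASES = "ACGT"
K = 3  # toy context length: the model looks at the previous 3 bases only

def train_markov(sequences, k=K):
    """Count which base follows each k-base context; every count starts at 1 (add-one smoothing)."""
    counts = defaultdict(lambda: {b: 1 for b in BASES})
    for seq in sequences:
        for i in range(k, len(seq)):
            counts[seq[i - k:i]][seq[i]] += 1
    return dict(counts)

def base_prob(counts, context, base):
    """P(base | context); contexts never seen in training fall back to a uniform 1/4."""
    c = counts.get(context)
    if c is None:
        return 1 / len(BASES)
    return c[base] / sum(c.values())

def log_likelihood(seq, counts, k=K):
    """Log-probability of seq, one base at a time (the first k bases are taken as given)."""
    return sum(math.log(base_prob(counts, seq[i - k:i], seq[i])) for i in range(k, len(seq)))

def mutation_score(seq, pos, new_base, counts, k=K):
    """Log-likelihood of the point mutant minus that of the original; negative = less typical."""
    mutant = seq[:pos] + new_base + seq[pos + 1:]
    return log_likelihood(mutant, counts, k) - log_likelihood(seq, counts, k)

# Usage with made-up toy data (not real genomes):
model = train_markov(["ACGT" * 50])
wild_type = "ACGTACGTACGT"
print(mutation_score(wild_type, 3, "A", model))  # negative: the change breaks the learned pattern
```

A negative score means the mutant looks less like the training data than the original does. A model trained on millions of real genomes can use the same kind of comparison to flag changes that are likely to disrupt function, although this sketch does not reproduce Evo's exact scoring procedure.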
One of Evo's most impressive abilities is generating very long DNA sequences, more than 1 million bases (a base is a single nucleotide: A, T, C, or G). This is much longer than previous models could produce, and it makes Evo capable of handling tasks at the scale of a whole genome.
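Generation can follow the same one-letter-at-a-time idea. The minimal sketch below again uses a hypothetical toy Markov model rather than Evo: it samples each new base from the model's prediction given the previous three bases, and the names `train_markov` and `generate` are illustrative assumptions.

```python
import random
from collections import defaultdict

BASES = "ACGT"

def train_markov(sequences, k=3):
    """Add-one smoothed counts of A, C, G, T following each k-base context."""
    counts = defaultdict(lambda: [1, 1, 1, 1])
    for seq in sequences:
        for i in range(k, len(seq)):
            counts[seq[i - k:i]][BASES.index(seq[i])] += 1
    return dict(counts)

def generate(counts, seed, length, k=3, rng=None):
    """Extend seed one base at a time, each sampled given only the previous k bases."""
    rng = rng or random.Random(0)  # fixed seed so runs are reproducible
    seq = list(seed)
    while len(seq) < length:
        weights = counts.get("".join(seq[-k:]), [1, 1, 1, 1])  # uniform if context unseen
        seq.append(rng.choices(BASES, weights=weights)[0])
    return "".join(seq)

# Usage with made-up toy data:
model = train_markov(["ACGT" * 50])
print(generate(model, "ACG", 60))
```

A toy model like this can be asked for a million letters, but it only ever looks three bases back, so distant parts of its output have no relationship to each other. Keeping a long sequence coherent across very large distances is the much harder problem that large models like Evo aim to address.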
The study suggests that future versions of models like Evo might be trained on even more diverse data, including human DNA, to learn how parts of the genome that lie far apart interact with each other. By capturing more of the full complexity of life's genetic blueprint, this could lead to even more advanced tools for biology and medicine.