The genome is an organism's complete set of DNA. DNA is a long sequence of four chemical letters (A, C, G and T), and small variants, such as a change to a single letter, can affect how an organism grows, looks, or responds to disease. Working out how these variants act is a major challenge in biology, because the genome is complex and most of its instructions are hard to decode.
Google DeepMind has introduced AlphaGenome, a new artificial intelligence (AI) tool designed to predict how single DNA variants affect the way genes are regulated. Genes are sections of DNA that carry instructions for making proteins, which do most of the work in cells. AlphaGenome takes in a DNA sequence of up to 1 million letters, called base pairs, and predicts how that sequence regulates genes. It can also compare its predictions for a normal DNA sequence with those for the same sequence carrying a variant, revealing how the change affects gene activity. This helps scientists understand how DNA variants might cause diseases or alter traits.
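The normal-versus-variant comparison described above can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not the AlphaGenome API: `toy_predict_track`, `apply_snv` and `score_variant` are invented names, and the toy "prediction" (the fraction of G and C letters around each position) merely stands in for the many experimental signals a real model predicts. What it shows is the core idea: a variant's effect is the difference between the predictions for the altered sequence and for the original one.

```python
from typing import Callable, List

# Hypothetical stand-in for a sequence-to-function model. A real model predicts
# many experimental signals; this toy "track" is just the fraction of G/C letters
# in a small window around each position, so the example runs on its own.
def toy_predict_track(seq: str, window: int = 5) -> List[float]:
    half = window // 2
    track = []
    for i in range(len(seq)):
        lo, hi = max(0, i - half), min(len(seq), i + half + 1)
        chunk = seq[lo:hi]
        track.append(sum(base in "GC" for base in chunk) / len(chunk))
    return track

def apply_snv(seq: str, pos: int, ref: str, alt: str) -> str:
    """Return a copy of seq with the single-letter variant ref>alt at 0-based pos."""
    if seq[pos] != ref:
        raise ValueError(f"reference mismatch at {pos}: expected {ref}, found {seq[pos]}")
    return seq[:pos] + alt + seq[pos + 1:]

def score_variant(
    seq: str,
    pos: int,
    ref: str,
    alt: str,
    predict: Callable[[str], List[float]] = toy_predict_track,
):
    """Predict on the normal and variant sequences and return (per-position diff, max |diff|)."""
    ref_track = predict(seq)
    alt_track = predict(apply_snv(seq, pos, ref, alt))
    diff = [a - r for a, r in zip(alt_track, ref_track)]
    return diff, max(abs(d) for d in diff)

if __name__ == "__main__":
    diff, score = score_variant("ATATATATAT", 4, "A", "G")
    print([round(d, 2) for d in diff], score)
```

For example, `score_variant("ATATATATAT", 4, "A", "G")` changes the toy signal only at positions 2 to 6, the windows that contain the altered letter. A real model would replace the toy predictor and compare much longer sequences across many output signals at once.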
How AlphaGenome advances research
AlphaGenome processes long sequences and makes detailed predictions about many biological processes, such as where genes start or how much RNA (a molecule that helps carry out DNA's instructions) is made. It also predicts how DNA is physically organized, which regions are active, and where proteins bind to it. The model was trained on large public datasets of experimental measurements of gene regulation in human and mouse cells.
The tool builds on earlier models such as Enformer and complements AlphaMissense, which focuses on variants in protein-coding DNA. Only about 2% of the genome codes for proteins; the other 98%, called non-coding DNA, contains many of the regions that control when and where genes are active. AlphaGenome is especially suited to studying this non-coding DNA, where many disease-linked variants are found. It predicts many properties at once, including how variants affect RNA splicing, the process in which sections of an RNA molecule are cut out and the remaining pieces joined before the RNA is used to make a protein. This makes it a versatile tool for researchers studying diseases, designing synthetic DNA, or exploring how the genome works.
AlphaGenome is available for non-commercial research through an API. For now, the model still has limitations and is intended for research, not clinical use. Future work aims to make it more precise and useful.