LLMs and machine learning for genomics research
Dec. 04, 2024.
AI methods, including large language models (LLMs) such as GPT-4 and other machine learning techniques, are helping to advance genomics research.
Researchers at UC San Diego have shown that LLMs like GPT-4 can speed up functional genomics.
Functional genomics studies what genes do and how they work together. A common method called gene set enrichment compares a new group of genes against databases of known gene sets to infer the group's function. However, this method cannot detect new biology that is absent from these databases.
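To make the idea of gene set enrichment concrete, here is a minimal sketch in Python. It uses a hypergeometric over-representation test, one common way to score overlap between a query gene group and a known gene set; the article does not say which statistic the researchers used. The gene identifiers, the set names, and the background size are made-up placeholders, not real annotations.

```python
from math import comb


def enrichment_p_value(query, annotated, background_size):
    """Probability of seeing at least the observed overlap by chance.

    Hypergeometric over-representation test: draw len(query) genes from a
    background of background_size genes, of which len(annotated) belong to
    the known gene set, and ask how likely an overlap this large would be.
    """
    query, annotated = set(query), set(annotated)
    overlap = len(query & annotated)
    n, k_set = len(query), len(annotated)
    total = comb(background_size, n)
    upper = min(n, k_set)
    tail = sum(
        comb(k_set, i) * comb(background_size - k_set, n - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical database of known gene sets (placeholder identifiers only).
database = {
    "set_alpha": {f"G{i}" for i in range(1, 11)},   # G1..G10
    "set_beta": {f"G{i}" for i in range(11, 21)},   # G11..G20
}

# Hypothetical query: four genes from set_alpha plus one unrelated gene.
query_genes = {"G1", "G2", "G3", "G4", "G50"}
background = 100  # assumed total number of genes in the toy background

p_alpha = enrichment_p_value(query_genes, database["set_alpha"], background)
p_beta = enrichment_p_value(query_genes, database["set_beta"], background)

for name, p in (("set_alpha", p_alpha), ("set_beta", p_beta)):
    print(f"{name}: p = {p:.3g}")
```

A small p-value for a set suggests the query genes share that set's function. A query whose genes appear in no database set scores poorly everywhere, which is the limitation described above.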
Using artificial intelligence (AI), specifically LLMs, could cut down the time researchers spend on this task.
The researchers describe the methods and results of this study in a paper published in Nature Methods.
The team tested five LLMs and found that GPT-4 performed best, with a 73% success rate in naming gene functions correctly. When given random genes, GPT-4 correctly refused to name a function 87% of the time, avoiding made-up answers, known as hallucinations. It also gave explanations for its choices.
The study suggests more work is needed, but LLMs could transform genomics by quickly creating new scientific ideas. The researchers made a website to help others use LLMs in their work.
Machine learning finds new patterns in the genome
In related news, researchers at the University of Toronto are using machine learning to study how human chromosomes are organized. Chromosome organization can affect health and diseases such as cancer.
The researchers developed a method called “Signature,” which uses machine learning to find new patterns in the genome, the complete set of genetic material in a human. A paper published in Nature Communications describes the development of Signature and some preliminary tests.
Signature combines imaging with chromosome conformation capture (Hi-C), a technique that yields billions of reads of genetic data, allowing researchers to study many interactions at once. The researchers analyzed 62 data sets, each with over 3.8 million possible chromosome interactions.
“In supervised learning, you know your target. In unsupervised, you let the data speak,” notes a researcher. The team used network clustering in the unsupervised approach to find patterns in the data.
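As a rough illustration of what clustering a network of interactions can look like, the Python sketch below groups nodes into connected components after discarding weak links. This is a deliberately simple stand-in and not the method used in Signature, whose clustering algorithm the article does not describe. The node names, interaction scores, and the 0.5 threshold are all invented for the example.

```python
from collections import defaultdict


def cluster_interactions(edges, min_weight):
    """Group nodes into connected components over edges >= min_weight.

    edges is an iterable of (node_a, node_b, weight) tuples. Uses a
    union-find structure; nodes with no strong edge form their own cluster.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a, b, weight in edges:
        root_a, root_b = find(a), find(b)
        if weight >= min_weight and root_a != root_b:
            parent[root_a] = root_b

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


# Hypothetical interaction scores between placeholder regions.
toy_edges = [
    ("region_1", "region_2", 0.9),
    ("region_2", "region_3", 0.8),
    ("region_3", "region_4", 0.1),  # weak link, ignored at threshold 0.5
    ("region_4", "region_5", 0.7),
]

clusters = cluster_interactions(toy_edges, min_weight=0.5)
for cluster in clusters:
    print(cluster)
```

No labels are supplied here: the groups emerge only from which interactions are strong, which is the sense of "letting the data speak" in the quote above.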