LLMs and machine learning for genomics research

Dec. 04, 2024.

AI methods, including large language models (LLMs) such as GPT-4 and other machine learning techniques, are helping to advance genomics research.

About the Writer

Giulio Prisco

Giulio Prisco is Senior Editor at Mindplex. He is a science and technology writer mainly interested in fundamental science and space, cybernetics and AI, IT, VR, bio/nano, crypto technologies.

Researchers at UC San Diego have shown that large language models (LLMs) like GPT-4 can speed up functional genomics.

Functional genomics studies what genes do and how they work together. A common method called gene set enrichment compares new groups of genes against databases of known gene sets to infer their function. However, this method misses new biology that is not yet recorded in those databases.
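To make the idea of gene set enrichment concrete, here is a minimal sketch of one common way to score the overlap between a new gene group and a known gene set: a one-sided hypergeometric test. This is an illustration only, not the method used in the UC San Diego study. The function name, the gene identifiers, and the background size are all hypothetical.

```python
from math import comb


def enrichment_p_value(query_genes, annotated_genes, background_size):
    """Probability of seeing at least the observed overlap by chance.

    Assumes a hypergeometric null model: the query genes are drawn at
    random, without replacement, from a background of `background_size`
    genes, of which `annotated_genes` belong to the known gene set.
    """
    query = set(query_genes)
    annotated = set(annotated_genes)
    n = len(query)               # size of the new gene group
    K = len(annotated)           # size of the known gene set
    k = len(query & annotated)   # observed overlap
    N = background_size          # all genes under consideration

    # Sum the upper tail P(overlap >= k) of the hypergeometric distribution.
    tail = sum(
        comb(K, i) * comb(N - K, n - i)
        for i in range(k, min(n, K) + 1)
    )
    return tail / comb(N, n)


# Hypothetical example: 3 query genes, 2 of which appear in a known set of 4,
# drawn from a background of 10 genes.
p = enrichment_p_value(
    ["GENE_A", "GENE_B", "GENE_C"],
    ["GENE_A", "GENE_B", "GENE_X", "GENE_Y"],
    background_size=10,
)
print(f"p-value: {p:.3f}")
```

A small p-value suggests the overlap is unlikely to be chance. A gene group whose function is not represented in any known set will never score well under this kind of test, which is the gap the article describes.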

Using artificial intelligence (AI), specifically LLMs, could cut down the time researchers spend on this task.

The researchers describe the methods and results of this study in a paper published in Nature Methods.

The team tested five LLMs and found that GPT-4 performed best, correctly naming gene functions 73% of the time. When given random genes, GPT-4 correctly refused to name a function 87% of the time, avoiding made-up answers, known as hallucinations. It also gave explanations for its choices.

The study suggests more work is needed, but LLMs could transform genomics by quickly creating new scientific ideas. The researchers made a website to help others use LLMs in their work.

Machine learning finds new patterns in the genome

In related news, researchers at the University of Toronto are using machine learning to study how human chromosomes are organized. Chromosome organization can affect health and diseases such as cancer.

The researchers developed a method called “Signature,” which uses machine learning to find new patterns in the genome, the complete set of genetic material in a human. A paper published in Nature Communications describes the development of Signature and some preliminary tests.

Signature combines imaging with chromosome conformation capture (Hi-C), a technique that yields billions of reads of genetic data, allowing researchers to study many interactions at once. The researchers analyzed 62 data sets, each with over 3.8 million possible chromosome interactions.

“In supervised learning, you know your target. In unsupervised, you let the data speak,” notes a researcher. The team used network clustering in the unsupervised approach to find patterns in the data.
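As a rough illustration of what network clustering over interaction data can look like, the sketch below treats genomic regions as nodes, keeps only interactions above a score threshold as edges, and groups regions into connected components. The article does not say which clustering algorithm the team used, so connected components is swapped in here as the simplest form of network clustering. The region labels, scores, threshold, and function name are all hypothetical.

```python
from collections import defaultdict, deque


def cluster_interactions(interactions, min_score):
    """Group regions into clusters of mutually reachable nodes.

    `interactions` is an iterable of (region_a, region_b, score) tuples.
    Edges scoring below `min_score` are ignored. This is a connected
    components sketch, not the algorithm used by Signature.
    """
    # Build an undirected graph from the interactions that pass the threshold.
    graph = defaultdict(set)
    for region_a, region_b, score in interactions:
        if score >= min_score:
            graph[region_a].add(region_b)
            graph[region_b].add(region_a)

    seen = set()
    clusters = []
    for start in sorted(graph):
        if start in seen:
            continue
        # Breadth-first search collects every region reachable from `start`.
        seen.add(start)
        queue = deque([start])
        component = []
        while queue:
            node = queue.popleft()
            component.append(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        clusters.append(sorted(component))
    return clusters


# Hypothetical interaction scores between labeled regions.
example = [
    ("chr1:a", "chr2:b", 0.9),
    ("chr2:b", "chr3:c", 0.8),
    ("chr4:d", "chr5:e", 0.7),
    ("chr1:a", "chr4:d", 0.1),  # too weak, dropped by the threshold
]
print(cluster_interactions(example, min_score=0.5))
```

No labels are supplied: the groups emerge from the interaction data alone, which is what "letting the data speak" means in the unsupervised setting.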
