Move over Transformers, the Titans are coming!

2025-01-16
A new AI method could overcome some important limitations of the transformer architecture and help develop LLMs with long-term memory.

A new Artificial Intelligence (AI) paper by Google researchers, titled "Titans: Learning to Memorize at Test Time" and published on arXiv, is attracting attention. Some observers describe it as "the successor to the Transformer architecture," hoping that the new approach could overcome some important limitations of the transformer architecture introduced in the seminal 2017 paper "Attention Is All You Need," which sparked the current wave of AI advances.

There isn't yet much material besides the arXiv paper itself, but more detailed explanations are likely to come. Meanwhile, AI commentators are dissecting the paper on social media. Matthew Berman argues on X that "this is huge for AI." Transformers, the backbone of most AI today, "struggle with long-term memory due to quadratic memory complexity," he says. "Titans aims to solve this with massive scalability."
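To make the quadratic-complexity point concrete, here is a minimal sketch (illustrative only, not code from the paper or Berman's post): the attention score matrix compares every token with every other token, so it has shape (n, n) and quadruples in size each time the context doubles.

```python
# Illustrative sketch: the attention score matrix has shape (n, n),
# so its memory footprint quadruples every time the context doubles.
import numpy as np

def attention_scores(Q, K):
    """Scaled dot-product attention scores; Q and K have shape (n, d)."""
    d = Q.shape[-1]
    return (Q @ K.T) / d ** 0.5   # (n, n) -- the quadratic term

for n in (1024, 2048, 4096):
    Q = K = np.random.randn(n, 64).astype(np.float32)
    S = attention_scores(Q, K)
    print(n, round(S.nbytes / 1e6, 1), "MB")  # ~4x growth per doubling
```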

The core concept of the "Titans" AI architecture is to integrate short-term and long-term memory capabilities within a neural network, effectively addressing the limitations of existing models like Transformers and Recurrent Neural Networks (RNNs).
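As a rough illustration of how the two memories might be combined at read time (a hypothetical PyTorch sketch loosely inspired by the paper's description; the linear layer standing in for the long-term memory is our simplification): tokens recalled from long-term memory are prepended to the current window, and ordinary attention then runs over both.

```python
import torch
import torch.nn.functional as F

d = 32
long_term_memory = torch.nn.Linear(d, d, bias=False)  # toy stand-in for the
                                                      # paper's neural memory

def attend(query, context):
    """Plain scaled dot-product attention: the short-term memory."""
    scores = query @ context.T / d ** 0.5
    return F.softmax(scores, dim=-1) @ context

def titans_style_block(segment):
    """segment: (n, d). Recall from long-term memory, then attend over the
    recalled tokens together with the current segment."""
    with torch.no_grad():
        recalled = long_term_memory(segment)          # long-term recall
    context = torch.cat([recalled, segment], dim=0)   # memory as context
    return attend(segment, context)                   # short-term attention

out = titans_style_block(torch.randn(16, d))          # -> (16, d)
```

The paper also describes learnable "persistent memory" tokens added to the context; they are omitted here for brevity.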

Longer-term memory

Researcher Ali Behrouz, the first author of the Titans paper, has posted a long X thread to explain "Titans: a new architecture with attention and a meta in-context memory that learns how to memorize at test time."

Behrouz argues that Titans are more effective than Transformers and modern linear RNNs, and can scale effectively to larger context windows, with better performance than current large language models (LLMs).

Behrouz tries to explain things intuitively. Attention, he says, "performs as a short-term memory, meaning that we need a neural memory module with the ability to memorize long past to act as a long-term, more persistent, memory."
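What "memorizing at test time" means, as we read the paper: the long-term memory is itself a small network whose weights keep being updated while the model processes input, via a gradient step on an associative-recall loss, with momentum (the paper's "surprise" signal) and weight decay (forgetting). A toy sketch, with illustrative constants:

```python
import torch

d = 32
memory = torch.nn.Linear(d, d, bias=False)   # M: the long-term memory
momentum = torch.zeros_like(memory.weight)   # accumulated "surprise"
eta, theta, alpha = 0.9, 0.1, 0.01           # illustrative constants

def memorize(k, v):
    """One test-time update of the memory on a (key, value) pair."""
    global momentum
    loss = ((memory(k) - v) ** 2).sum()      # associative-recall loss
    (grad,) = torch.autograd.grad(loss, memory.weight)
    momentum = eta * momentum - theta * grad # surprising tokens push harder
    with torch.no_grad():
        memory.weight.mul_(1 - alpha)        # decay: gradually forget
        memory.weight.add_(momentum)         # write the update

# Keys and values would come from learned projections of the input stream.
for _ in range(100):
    memorize(torch.randn(d), torch.randn(d))
```

In the paper, the learning-rate, momentum, and decay terms are themselves data-dependent and learned; the fixed constants here are just for illustration.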

Titans use a memory mechanism that can retain and access information over very long sequences. Because the long-term memory is a fixed-size module updated as tokens arrive, rather than attention computed over the entire history, Titans scale more cheaply than Transformers as inputs grow longer. This could improve document summarization, long-term narrative understanding, and the ability to maintain context in dialogues over extended periods.
