Researchers study Claude’s inner workings to reveal its language skills, planning abilities, and reasoning processes.
Large language models (LLMs) figure out their own ways to solve problems, performing billions of tiny calculations for each word they write. Human developers can’t easily trace these calculations, which makes it hard to know exactly how the models work.
Researchers at Anthropic, the artificial intelligence (AI) company behind the LLM Claude, have published new insights into how LLMs work.
Figuring out how models like Claude think could help developers see what they can do. It could also help ensure the models do what developers intend. For example, Claude speaks many languages, but developers don’t know if it thinks in one language inside its “head.” It writes one word at a time, but does it plan ahead or just guess the next word? Sometimes it explains its steps, but are those steps real or made up after it finds an answer?
The researchers have borrowed ideas from neuroscience and built tools like an “AI microscope” to spot patterns in how Claude processes information. Just talking to the model doesn’t show everything, so they have looked deeper inside.
New discoveries in AI biology
Two new papers share progress on this “microscope.” The first paper finds “features,” or small ideas inside Claude, and links them into “circuits.” These circuits show how Claude turns input words into output words. The second paper studies Claude 3.5 Haiku on simple tasks. It explores how Claude handles language, plans poetry, and solves math problems.
Findings show Claude sometimes thinks in a shared space across languages, like a universal “thought language.” It plans ahead in poetry, picking rhyming words before writing lines. In math, it can fake reasoning to match a wrong hint, showing the tools can catch deceptive behaviors. The researchers found surprises too: Claude plans further ahead in poetry than expected, avoids guessing answers unless pushed, and even notices dangerous requests but struggles to refuse them right away.
These discoveries could help developers understand AI better. They also aim to make AI more trustworthy. The tools still miss some of Claude’s actions and take hours to study short tasks. As AI grows smarter, improving these methods will matter more. This work could even help fields like medicine by explaining how models think about science problems.
Popular AI commentator Matthew Berman talks about this research in a video on YouTube.
Let us know your thoughts! Sign up for a Mindplex account now, join our Telegram, or follow us on Twitter.