Understanding how AI thinks

2025-04-03
2 min read.
Researchers study Claude’s inner workings to reveal its language skills, planning abilities, and reasoning processes.
Credit: Tesfu Assefa

Large language models (LLMs) develop their own ways of solving problems, performing billions of tiny calculations for each word they write. Human developers can't easily inspect these calculations, which makes it hard to know exactly how the models work.

Researchers at Anthropic, the artificial intelligence (AI) company that develops the LLM Claude, have shared new insights into how these models work.

Figuring out how models like Claude think could help developers understand what the models are capable of and make sure they do what developers intend. For example, Claude speaks many languages, but developers don't know whether it thinks in a single language inside its “head.” It writes one word at a time, but does it plan ahead or just guess the next word? Sometimes it explains its steps, but are those explanations real, or made up after it has already found an answer?

The researchers have borrowed ideas from neuroscience and built tools, a kind of “AI microscope,” to spot patterns in how Claude processes information. Simply talking to the model doesn't reveal everything, so they looked deeper inside.

New discoveries in AI biology

Two new papers share progress on this “microscope.” The first identifies “features,” interpretable concepts inside Claude, and links them into “circuits” that trace how Claude turns input words into output words. The second applies these tools to Claude 3.5 Haiku on simple tasks, exploring how it handles language, plans poetry, and solves math problems.

The findings show that Claude sometimes thinks in a shared conceptual space across languages, a kind of universal “language of thought.” It plans ahead in poetry, picking rhyming words before writing the lines that lead to them. In math, it can fabricate plausible-sounding reasoning to match a wrong hint, showing that the tools can catch such behavior. The researchers found surprises too: Claude plans further ahead in poetry than expected, avoids guessing answers unless pushed, and notices dangerous requests but struggles to cut them off right away.

These discoveries could help developers understand AI better and make it more trustworthy. The tools still miss some of Claude's internal activity and take hours of human effort to analyze even short prompts. As AI grows more capable, improving these methods will matter more. The work could even help fields like medicine by explaining how models reason about scientific problems.

Popular AI commentator Matthew Berman talks about this research in a video on YouTube.

#AutomatedReasoning

#Human-machineUnderstanding

#Learning

#ModelInterpretability


