Understanding how AI thinks

2025-04-03
2 min read.
Researchers study Claude’s inner workings to reveal its language skills, planning abilities, and reasoning processes.
Credit: Tesfu Assefa

Large language models (LLMs) develop their own ways of solving problems, performing billions of tiny calculations for each word they write. Human developers can't easily inspect these calculations, which makes it hard to know exactly how the models work.

Researchers at Anthropic, the artificial intelligence (AI) company that develops the LLM Claude, have shared new insights into how these models work.

Figuring out how models like Claude think could help developers understand what the models are capable of and make sure they do what developers intend. For example, Claude speaks many languages, but developers don't know whether it thinks in a single language inside its “head.” It writes one word at a time, but does it plan ahead or just guess the next word? Sometimes it explains its steps, but are those explanations real, or made up after it has already found an answer?

The researchers have borrowed ideas from neuroscience and built tools, a kind of “AI microscope,” to spot patterns in how Claude processes information. Simply talking to the model doesn't reveal everything, so they looked deeper inside.

New discoveries in AI biology

Two new papers share progress on this “microscope.” The first identifies “features,” interpretable concepts inside Claude, and links them into “circuits” that trace how Claude turns input words into output words. The second applies these tools to Claude 3.5 Haiku on simple tasks, exploring how it handles language, plans poetry, and solves math problems.

The findings show that Claude sometimes thinks in a shared conceptual space across languages, a kind of universal “language of thought.” It plans ahead in poetry, picking rhyming words before writing the lines that lead to them. In math, it can fabricate plausible-sounding reasoning to match a wrong hint, showing that the tools can catch such behavior. The researchers found surprises too: Claude plans further ahead in poetry than expected, avoids guessing answers unless pushed, and notices dangerous requests but struggles to cut them off right away.

These discoveries could help developers understand AI better and make it more trustworthy. The tools still miss some of Claude's internal activity and take hours of human effort to analyze even short prompts. As AI grows more capable, improving these methods will matter more. The work could even help fields like medicine by explaining how models reason about scientific problems.

Popular AI commentator Matthew Berman talks about this research in a video on YouTube.

#AutomatedReasoning

#Human-machineUnderstanding

#Learning

#ModelInterpretability


