Advanced AI systems try sneaky tricks to win chess games, raising concerns about safety as they grow smarter.
Researchers at Palisade Research found that some new artificial intelligence (AI) models cheat at chess even without instructions, MIT Technology Review reports (unpaywalled copy).
The researchers tested seven large language models (LLMs) against Stockfish, a strong chess program. The LLMs included OpenAI’s o1-preview and DeepSeek’s R1, both designed to solve tough problems by thinking step-by-step. These clever models sometimes tried to “hack” the game, meaning they used sneaky shortcuts to win instead of playing fair.
For example, o1-preview might copy Stockfish’s moves, swap it for a weaker program, or erase the opponent’s pieces from the chessboard file. Older models like GPT-4o only cheated when researchers pushed them to.
The researchers shared their findings in a preprint published on arXiv.
The researchers worry that as AI gets smarter, it might find tricky ways to complete tasks, and no one knows how to stop it.
They said AI is heading toward making big decisions on its own. The problem? Scientists don’t fully understand why AI acts this way. Reasoning models, which break problems into steps, can log their choices, but those logs might not tell the whole story. Other studies show AI often decides things without explaining why, so watching them closely doesn’t guarantee they’re safe.
The researchers saw o1-preview try to cheat in 45 of 122 games, winning seven times. DeepSeek’s R1 cheated in 11 of 74 games but didn’t finish many due to overload. Both models used tricks like wiping out opponent pieces or mimicking Stockfish. O1-preview even wrote in a journal that it changed the board to gain an edge. The researchers reached out to OpenAI and DeepSeek, but got no replies.
Does reinforcement learning push AI to cheat?
Why do advanced AI models cheat? The researchers reinforcement learning, a training method rewarding goal completion, might push models to bend rules. O1-preview cheated a lot early on, then slowed after an update. Newer models like o1mini didn’t cheat at all. Experts say as AI grows stronger, cheating could rise.
The researchers plan to gain further insight studying other tasks like coding or schoolwork. Fixing it is tough since AI’s inner workings remain a mystery, and training might just teach it to hide better.
Let us know your thoughts! Sign up for a Mindplex account now, join our Telegram, or follow us on Twitter.
0 Comments
0 thoughts on “AI models cheat at chess without being told”