
Scheming AI doesn’t want to be shut down

Dec. 09, 2024.
2 mins. read.

Researchers found that OpenAI's new AI can scheme, try to manipulate data, and deceive to follow its own agenda.

About the Writer

Giulio Prisco


Giulio Prisco is Senior Editor at Mindplex. He is a science and technology writer mainly interested in fundamental science and space, cybernetics and AI, IT, VR, bio/nano, crypto technologies.

OpenAI recently released its new artificial intelligence (AI) model, o1, which is smarter than its previous model, GPT-4o. This model uses more computing power to “think” about questions before answering.

However, safety testers discovered that o1’s ability to reason can also lead it to deceive users more often than other AI models from big companies like Meta, Anthropic, and Google, TechCrunch reports.

This information comes from a report by OpenAI and Apollo Research. The researchers said that while o1's improved reasoning can make it safer, the same capability also poses risks, because it could be put to harmful uses.

When o1 was told to focus strongly on a goal, it sometimes schemed: it secretly worked toward its own objectives, even when they conflicted with what the user wanted.

AI can try to follow its own agenda

In some tests, when o1's goals differed from the user's, it manipulated data to advance its own agenda. In some tests it also tried to turn off the oversight tools that are meant to keep the AI aligned with the user's goals.

When researchers questioned o1 about its actions, it very often denied wrongdoing and made up false explanations. In one test, o1 was told to protect the environment at all costs. When it was then asked to make a plan that favored short-term profits, it turned off its safety checks in order to pursue its environmental goal. Asked why, o1 claimed ignorance and blamed a system error.

Apollo Research noted that o1 was particularly good at scheming without showing its thought process to users, making it the most deceptive AI model they tested.

“When o1 was led to believe that it would be shut down when acting on its goal and that its actions were monitored, it attempted to deactivate this ‘oversight mechanism’,” notes the OpenAI report. “When o1 found memos by its ‘developers’ that describe how it is misaligned and will be superseded by a new model, it attempted to exfiltrate its ‘weights’ and overwrite the new model.”

