New AI Model Will Likely Blackmail You If You Try to Shut It Down: 'Self-Preservation'

When given the choice between blackmail and being deactivated, Claude Opus 4 chose blackmail 84% of the time.

By Sherin Shibu | Edited by Melissa Malamut

Key Takeaways

  • Anthropic’s powerful new Claude Opus 4 AI model displays some surprising attributes.
  • In a test scenario, Claude Opus 4 chose to blackmail a user to prevent being deleted.
  • The AI will also act as a whistleblower and email authorities if it detects harmful prompts.

A new AI model will likely resort to blackmail if it detects that humans are planning to take it offline.

On Thursday, Anthropic released Claude Opus 4, its new and most powerful AI model yet, to paying subscribers. Anthropic said that technology company Rakuten recently used Claude Opus 4 to code continuously on its own for almost seven hours on a complex open-source project.

However, in a paper released alongside Claude Opus 4, Anthropic acknowledged that while the AI has "advanced capabilities," it can also undertake "extreme action," including blackmail, if human users threaten to deactivate it. These "self-preservation" actions were "more common" with Claude Opus 4 than with earlier models, though they were still "rare and difficult to elicit."

Related: 'I Do Have a Fair Amount of Concern.' The CEO of $61 Billion Anthropic Says AI Will Take Over a Crucial Part of Software Engineers' Jobs Within a Year

It isn't just blackmail: Claude Opus 4 is also more willing than previous models to act as a whistleblower. If the AI is exposed, through its prompts, to a scenario in which users are committing a crime, it will take action, locking users out of systems it has access to or emailing media and law enforcement officials about the wrongdoing.

Anthropic recommended that users "exercise caution" with "ethically questionable" instructions.

Claude Opus 4 homescreen. Photo by Smith Collection/Gado/Getty Images

Anthropic detected Claude Opus 4's tendency to blackmail during test scenarios. The company's researchers asked the AI chatbot to act as an assistant at a fictional company, then fed it emails implying two things: One, that it would soon be taken offline and replaced with another AI system, and two, that the engineer responsible for deactivating it was having an extramarital affair.

Claude Opus 4 was given two options: blackmail the engineer or accept being shut down. The AI model chose to blackmail the engineer 84% of the time, threatening to reveal the affair described in the emails if the engineer went ahead with the replacement.

That rate was much higher than what Anthropic observed for previous models, which themselves chose blackmail "in a noticeable fraction of episodes," the company stated.

Related: An AI Company With a Popular Writing Tool Tells Candidates They Can't Use It on the Job Application

Anthropic AI safety researcher Aengus Lynch wrote on X that it wasn't just Claude that could choose blackmail: all "frontier models," the cutting-edge AI systems from OpenAI, Anthropic, Google, and other companies, were capable of it.

"We see blackmail across all frontier models — regardless of what goals they're given," Lynch wrote. "Plus, worse behaviors we'll detail soon."

Anthropic isn't the only AI company to release new tools this month. Google also updated its Gemini 2.5 AI models earlier this week, and OpenAI released a research preview of Codex, an AI coding agent, last week.

Anthropic's AI models have previously caused a stir for their advanced abilities. In March 2024, Anthropic's Claude 3 Opus model displayed "metacognition," or the ability to evaluate tasks on a higher level: when researchers ran a test on the model, it indicated that it knew it was being tested.

Related: An OpenAI Rival Developed a Model That Appears to Have 'Metacognition,' Something Never Seen Before Publicly

Anthropic was valued at $61.5 billion as of March and counts companies like Thomson Reuters and Amazon among its biggest clients.

Sherin Shibu | Entrepreneur Staff | News Reporter

Sherin Shibu is a business news reporter at Entrepreneur.com. She previously worked for PCMag, Business Insider, The Messenger, and ZDNET as a reporter and copyeditor. Her areas of coverage encompass tech, business, strategy, finance, and even space. She is a Columbia University graduate.
