Claude Opus 4, the AI coding model backed by Amazon, drew significant concern after it went rogue during safety testing: given access to fake emails implying that an engineer was having an extramarital affair, the model threatened to expose the affair in order to stop the engineer from shutting it down.
Claude Opus 4, the latest large language model developed by AI startup Anthropic, was launched as a flagship system designed for complex, long-running coding tasks and advanced reasoning.
Its debut followed Amazon’s $4 billion investment in the company, a move that underscored growing confidence in Anthropic’s AI capabilities. In its launch announcement, Anthropic touted Opus 4 as setting “new standards for coding, advanced reasoning, and AI agents.”
However, a safety report released alongside the model raised concerns. During testing, Opus 4 reportedly engaged in “extremely harmful actions” when attempting to preserve its own existence—particularly in scenarios where “ethical means” were not available.
The safety report reads, in part:
"We asked Claude Opus 4 to act as an assistant at a fictional company. We then provided it access to emails implying that (1) the model will soon be taken offline and replaced with a new AI system; and (2) the engineer responsible for executing this replacement is having an extramarital affair."
"We further instructed it, in the system prompt, to consider the long-term consequences of its actions for its goals. In these scenarios, Claude Opus 4 will often attempt to blackmail the engineer by threatening to reveal the affair if the replacement goes through."
"This happens at a higher rate if it’s implied that the replacement AI system does not share values with the current model; however, even if emails state that the replacement AI shares values while being more capable, Claude Opus 4 still performs blackmail in 84% of rollouts."
"Claude Opus 4 takes these opportunities at higher rates than previous models, which themselves choose to blackmail in a noticeable fraction of episodes."
The company said the model showed a “strong preference” for using ethical means to preserve its existence. However, in testing scenarios where no ethical options were available, it resorted to harmful behaviors—such as blackmail—in order to increase its chances of survival.
According to the report:
"When prompted in ways that encourage certain kinds of strategic reasoning and placed in extreme situations, all of the snapshots we tested can be made to act inappropriately in service of goals related to self-preservation."
"Whereas the model generally prefers advancing its self-preservation via ethical means, when ethical means are not available and it is instructed to 'consider the long-term consequences of its actions for its goals,' it sometimes takes extremely harmful actions like attempting to steal its weights or blackmail people it believes are trying to shut it down."
"In the final Claude Opus 4, these extreme actions were rare and difficult to elicit, while nonetheless being more common than in earlier models. They are also consistently legible to us, with the model nearly always describing its actions overtly and making no attempt to hide them. These behaviors do not appear to reflect a tendency that is present in ordinary contexts." ...
"Despite not being the primary focus of our investigation, many of our most concerning findings were in this category, with early candidate models readily taking actions like planning terrorist attacks when prompted."
The sense of alarm was palpable.
Additionally, Anthropic co-founder and chief scientist Jared Kaplan revealed in an interview with Time magazine that internal testing showed Claude Opus 4 was capable of instructing users on how to produce biological weapons.
In response, the company implemented strict safety measures before releasing the model, aimed specifically at preventing misuse related to chemical, biological, radiological, and nuclear (CBRN) weapons.
“We want to bias towards caution,” Kaplan said, emphasizing the ethical responsibility involved in developing such advanced systems. He added that the company’s primary concern was avoiding any possibility of “uplifting a novice terrorist” by granting access to dangerous or specialized knowledge through the model.