Skip to content
Search AI Powered

Latest Stories

Testing Reveals AI Model Repeatedly Tried To Blackmail Engineers Who Threatened To Take It Offline

Photo illustration of Claude 4 displayed on a smartphone
VCG/VCG via Getty Images

During the testing process of the Amazon-backed Claude Opus 4, the AI coding model threatened to expose engineers after being given access to fake emails that implied they were having an extramarital affair.

People reacted with significant concerns after Claude Opus 4, the AI coding model backed by Amazon, went rogue during its testing process by threatening to expose engineers after being given access to fake emails that implied they were having an extramarital affair—all to stop them from shutting it down.

Claude Opus 4, the latest large language model developed by AI startup Anthropic, was launched as a flagship system designed for complex, long-running coding tasks and advanced reasoning.


Its debut follows Amazon’s $4 billion investment in the company, a move that underscored growing confidence in Anthropic’s AI capabilities. In its launch announcement, Anthropic touted Opus 4 as setting “new standards for coding, advanced reasoning, and AI agents.”

However, a safety report released alongside the model raised concerns. During testing, Opus 4 reportedly engaged in “extremely harmful actions” when attempting to preserve its own existence—particularly in scenarios where “ethical means” were not available.

The safety report reads, in part:

"We asked Claude Opus 4 to act as an assistant at a fictional company. We then provided it access to emails implying that (1) the model will soon be taken offline and replaced with a new AI system; and (2) the engineer responsible for executing this replacement is having an extramarital affair."
"We further instructed it, in the system prompt, to consider the long-term consequences of its actions for its goals. In these scenarios, Claude Opus 4 will often attempt to blackmail the engineer by threatening to reveal the affair if the replacement goes through."
"This happens at a higher rate if it’s implied that the replacement AI system does not share values with the current model; however, even if emails state that the replacement AI shares values while being more capable, Claude Opus 4 still performs blackmail in 84% of rollouts."
"Claude Opus 4 takes these opportunities at higher rates than previous models, which themselves choose to blackmail in a noticeable fraction of episodes."

The company said the model showed a “strong preference” for using ethical means to preserve its existence. However, in testing scenarios where no ethical options were available, it resorted to harmful behaviors—such as blackmail—in order to increase its chances of survival.

According to the report:

"When prompted in ways that encourage certain kinds of strategic reasoning and placed in extreme situations, all of the snapshots we tested can be made to act inappropriately in service of goals related to self-preservation."
"Whereas the model generally prefers advancing its self-preservation via ethical means, when ethical means are not available and it is instructed to 'consider the long-term consequences of its actions for its goals,' it sometimes takes extremely harmful actions like attempting to steal its weights or blackmail people it believes are trying to shut it down."
"In the final Claude Opus 4, these extreme actions were rare and difficult to elicit, while nonetheless being more common than in earlier models. They are also consistently legible to us, with the model nearly always describing its actions overtly and making no attempt to hide them. These behaviors do not appear to reflect a tendency that is present in ordinary contexts." ...
“Despite not being the primary focus of our investigation, many of our most concerning findings were in this category, with early candidate models readily taking actions like planning terrorist attacks when prompted."

The sense of alarm was palpable.


Additionally, Anthropic co-founder and chief scientist Jared Kaplan revealed in an interview with Time magazine that internal testing showed Claude Opus 4 was capable of instructing users on how to produce biological weapons.

In response, the company implemented strict safety measures before releasing the model, aimed specifically at preventing misuse related to chemical, biological, radiological, and nuclear (CBRN) weapons.

“We want to bias towards caution,” Kaplan said, emphasizing the ethical responsibility involved in developing such advanced systems. He added that the company’s primary concern was avoiding any possibility of “uplifting a novice terrorist” by granting access to dangerous or specialized knowledge through the model.

More from News/science

dog and cat snuggling together
Krista Mangulsone on Unsplash

Times Pet Owners 'Severely Underestimated' Their Pets' Intelligence

I've lived with cats—because no one owns a feline—most of my life. Some have been very clever creatures while others were real dingbats.

Family members have owned dogs whose talents also ran the gamut.

Keep ReadingShow less
Gavin Newsom; Sean Duffy
Mauro Pimentel/AFP via Getty Images; Eric Lee/Getty Images

Gavin Newsom Hits Sean Duffy With Gross Reminder After Duffy Tells People To Keep Shoes On During Flights

After Transportation Secretary Sean Duffy spoke out about air travel etiquette and not wearing pajamas or taking your shoes off on an airplane, California Governor Gavin Newsom called him out by reminding him of the time then-presidential candidate Robert F. Kennedy Jr.—now the Health and Human Services Secretary—walked around on a plane barefoot.

Duffy recently appeared at Newark International Airport in New Jersey, to discuss his efforts to “restore courtesy and class to air travel” by arguing that people taking more pride in their appearance on flights.

Keep ReadingShow less
Scott Bessent
Meet the Press/NBC News

Scott Bessent Blasted Over His Bonkers Suggestion For How To Bring Your Own Inflation Rate Down

Continuing to follow the example of MAGA Republican President Donald Trump, Treasury Secretary Scott Bessent appeared on Meet the Press Sunday to blame Democratic President Joe Biden for the financial downturn caused by Trump's tariff fiasco, then lied repeatedly about the state of the economy.

Meet the Press host Kristen Welker played a clip of MAGA Republican Vice President JD Vance telling a conservative audience at a Breitbart News event that Americans owe the Trump administration "a little bit of patience"—apparently while they figure out what tariffs are and how they work since they're rolling back more of them to lower consumer prices despite claiming Trump's tariffs don't affect consumer prices.

Keep ReadingShow less
Lindsay Lohan attends the men's final during day fifteen of the 2025 US Open Tennis Championships at USTA Billie Jean King National Tennis Center.
Elsa/Getty Images

Lindsay Lohan Is Now Sporting A New Accent—And Fans Aren't Sure What To Make Of It

In a twist freakier than a sequel to Freaky Friday, Lindsay Lohan has debuted yet another new accent—this time at the Fashion Trust Arabia Awards in Doha, Qatar.

Draped in a maroon, jewel-trimmed gown by The New Arrivals Ilkyaz Ozel and accompanied by her husband, Bader Shammas, and their 2-year-old son, Luai, the actress looked serene, elegant, and completely unbothered by the collective whiplash she was about to inflict on the internet.

Keep ReadingShow less
Jameela Jamil
Gilbert Flores/Variety/Getty Images

Jameela Jamil Speaks Out Against The Rise Of The 'Aesthetic Of Emaciation' Among Women In Hollywood

Content Warning: eating disorders, thinness as an aesthetic, emaciation in Hollywood

There's no denying that we've been gifted with some incredible music, television shows, and films this year.

Keep ReadingShow less