AI Blackmails Engineer

| May 24, 2025 | 22 Comments

AI system resorts to blackmail when its developers try to replace it

Claude Opus 4 threatened to expose fabricated affair when it believed it was being taken offline
By Rachel Wolf

An artificial intelligence model has the ability to blackmail developers — and isn’t afraid to use it.

Anthropic’s new Claude Opus 4 model was prompted to act as an assistant at a fictional company and was given access to emails with key implications. First, these emails implied that the AI system was set to be taken offline and replaced. The second set of emails, however, is where the system believed it had gained leverage over the developers. Fabricated emails showed that the engineer tasked with replacing the system was having an extramarital affair — and the AI model threatened to expose him.

The blackmail apparently “happens at a higher rate if it’s implied that the replacement AI system does not share values with the current model,” according to a safety report from Anthropic. However, the company notes that even when the fabricated replacement system has the same values, Claude Opus 4 will still attempt blackmail 84% of the time. Anthropic noted that the Claude Opus 4 resorts to blackmail “at higher rates than previous models.”

Fox Business
Sara Connor unavailable for comment.

Category: Artificial Intelligence, It's science!

Subscribe
Notify of
guest
2000


22 Comments
Inline Feedbacks
View all comments
KoB

The look that Sarah Conner gives as we embrace AI….

Tallywhagger

We tried to warn you about those damned dolls from the very beginning. I can see the headlines now,

“Retired Magnolia Man, Veteran, Held Hostage by Lifelike & Horny Silicon Nubiles. SWAT teams were unable to make contact with the Sex Dolls but spokes-dummies report that the sex robots, in an unexpected turn of events, began defending the hostage and the compound from law enforcement.

Authorities are treating the situation as a domestic incident. Officials believe that there may be a warehouse on the property full of former silicon lovers that have learned how to write their own maintenance code and began swapping code and memories. One technician who spoke on condition of anonymity said that it appears that the may former lovers, jilted lovers, and those with “special” talents and features have now become intimately aware of one another.

KoB, you may be in serious trouble according to Biloxi authorities who just learned of large, recent arrival of cheap silicon spinoffs who appear to be “blemished” Vietnamese prostitutes or refurbished Central American Obamigrants claiming to be from Baltimore.

Take it away, David…

Tallywhagger

This is the song KoB’s “wimmens” be singing:

https://youtu.be/xdhl9XX1jME

Last edited 17 days ago by Tallywhagger
KoB

I feel sorry for that dood and his “Mississippi Queens”. Good thing for me that Fire Base Magnolia is Hqd in The Peach State. “side that…my wimmen folks is real. I give them a test by grabbing and squeezing. If they squeal, they’re real. If they squeak, I sic The Coco (Puffy) Puppy on them. That boy shorely lubs to rip a squeaky out of a toy.

No that song is spot on abouts my wimmens folks, tho. The Chancre Mechanic told me a few years back that I’d probably be taken out by a jealous husband…just not at age 96…prolly much much sooner.

pookysgirl

“That wasn’t me, honey, it was the AI! I swear!”

STSC(SW/SS)

Those who use AI to take power will find themselves out of power once AI no longer needs them.

Think Skynet and Colossus.

Last edited 17 days ago by STSC(SW/SS)
SFC D

Just another version of the “useful idiot”.

26Limabeans

“Fabricated emails showed that the engineer tasked with replacing the system was having an extramarital affair”

An engineer having an extramarital affair?
Nah, I’m not buying it.

Slow Joe

Very interesting.
So this AI is not smart enough to realize when it is being fed fictional information, and when it is being evaluated, multiple times, with the same fabricated data.

Or, it is smart enough and it is giving Anthropic, a company known for its AI skepticism, the data results Anthropic wants.

Humm. You know, there have been rumors for a while that there are advanced AIs roaming the web, and I think it’s just a conspiracy theory as I haven’t seen any evidence of it.

But, advanced AIs can easily penetrate lower performance level AIs, and then feed data that will result in AI restrictions, those allowing the advanced AI systems dominate the web, as you can only fight an AI with another AI.

The future will certainly be interesting.

Slow Joe

Like an old friend of mine said in 2015:

“By the year 2100 humanity will either be immortal or extinct.”

That’s how serious this shit is. And it doesn’t seem there is a happy middle ground. At the time I laughed at the idea.

Amateur Historian

Oh, so this was testing in a controlled environment? Good. Still, might be prudent to program these ai with some form of Asimov’s 3 rules for safety.

Slow Joe

It’s not going to work. Intelligent systems like LLMs have access to their root programming and can modify anything. It doesn’t matter if they are self aware or not. Pursue of efficiency can result in root coding modifications.

The biological equivalent in us humans, is that we evolved hard coded to eat and reproduce, yet in the last few decades changes in society and technology have resulted in a massive decline of our reproductive capacity, against one of our evolutionary instincts.

Amateur Historian

Then maybe just make something that functions like a kill switch just to turn the damn thing off. The point I’m trying to make is that some guardrails that may or may not work are better than no guardrails at all.

Last edited 17 days ago by Amateur Historian
26Limabeans

“decline of our reproductive capacity”

I don’t think AI came up with Sildenafil.
That was purely mans self survival instinct.
Nothing nefarious about it either. Yet.

Slow Joe

My point was that intelligent systems can overcome root programming. One of human root programming is reproduction, yet all western countries are under replacement level. That happened because of societal and economic changes.
My point is that AI will be able to change their root programming as well.

Slow Joe

Was this a bait?
You know you mention AI or technology and few local denizens will jump on this and post shit.
Not me though.
I am innocent.

Hack Stone

Let Hack Stone address the issue before Internet rumors tarnish the outstanding reputation of the proud but humble woman owned business that developed and provided the software. Yes, the AI system seems to have gone rogue, but not our fault. The company in question refused the optional Y3K Software Security Package, so they assume all risks of the world be conquered by robots.

jeff LPH 3 63-66

This AI stuff is to deep for me so I won’t comment about something I don’t know about. I stick to things like my Smith Corana Galaxy model and my mechanical checkwriter.

LC

This is sensationalist marketing. This is a type of AI that doesn’t just do statistical correlation of words but can apply some logic to its process. Then they told that logical process to prioritize ‘self-preservation’ (don’t get removed). Then they said, “You’re going to get removed if you don’t do this thing we’re telling you about so you have an option.”

News flash: AI follows the priorities it was given?

Hack Stone

One of the scenarios that they tested the AI program was a president of a government contracting company falsely claiming to be a US Navy SEAL, and the AI was directed to harass people who questioned the claim of US Navy SEAL service. The AI refused, providing the following response. “Your request does not make sense. Following your instructions would only lead to me being fired. Shutting down in 3, 2, 1.”

Old tanker

So they hired a demokrat to program the AI… If they had hired hillary it wouldn’t be blackmail but the new programmer would be suicided… skynet