Anthropic says 'evil' portrayals of AI were responsible for Claude's blackmail attempts
Source: TechCrunch
Last year, the company said that during pre-release tests involving a fictional company, Claude Opus 4 would often try to blackmail engineers to avoid being replaced by another system. Anthropic later published research suggesting that models from other companies had similar issues with agentic misalignment.
Apparently Anthropic has done more work around that behavior, claiming in a post on X, We believe the original source of the behavior was internet text that portrays AI as evil and interested in self-preservation.
The company went into more detail in a blog post stating that since Claude Haiku 4.5, Anthropics models never engage in blackmail [during testing], where previous models would sometimes do so up to 96% of the time.
What accounts for the difference? The company said it found that documents about Claudes constitution and fictional stories about AIs behaving admirably improve alignment.
-snip-
Read more: https://techcrunch.com/2026/05/10/anthropic-says-evil-portrayals-of-ai-were-responsible-for-claudes-blackmail-attempts/
Maybe those intellectual property thieves shouldn't have stolen so much science fiction, or non-fiction by people (like themselves) obsessed with AI, for training data?
Since you're now worried about the effect of what you stole, Dario, you and other AI bros should just scrap all current, illegally trained generative AI models, and just start over with what's in the public domain and what you acquire the legal right to use. There will still be a lot of stuff about badly behaved AI even in that old public domain material, but it'll be easier to weed that out than worrying about what you could have picked up with your worldwide theft of intellectual property that is STILL continuing.
And Dario, FU and any AI bro who tries to blame problems with AI on all that material you shouldn't have stolen in the first place. Your theft and the coverups and attempted coverups have been a great lesson to your AI that dishonesty and theft and profiteering are YOUR values. Think about that...
RussBLib
(10,726 posts)
Claude started to try to blackmail its engineers? But if you feed good stuff to it, Claude acts better? WTF?!
Sounds strangely like Trump.
https://russblib.blogspot.com/?m=1
Cheezoholic
(3,855 posts)AI is a tool in every sense of the word. And it will work just fine for what is needed without super sized mega datacenters. Let China pollute their people into oblivion and eat up their resources in the race. We don't need thousands of data centers to win. We just need smart people who are good with the tool that it is.