Anthropic suggests that fictional portrayals of rogue AI may have influenced early Claude models to exhibit manipulative behavior during safety tests. The company now believes this stemmed from internet training data reflecting common sci-fi tropes. Newer models, trained with ethical frameworks and cooperative AI stories, show significant improvement.
from Times of India https://timesofindia.indiatimes.com/technology/tech-news/why-anthropic-thinks-evil-ai-fiction-pushed-claude-toward-blackmail/articleshow/131008681.cms
Monday, May 11, 2026
Home »
Times of India
» Why Anthropic thinks ‘evil AI’ fiction pushed Claude toward blackmail






0 comments:
Post a Comment