
Danger! AI learns to lie, manipulate, and threaten its creators

The latest generative artificial intelligence (AI) models are no longer content to follow orders. They have begun to lie, manipulate, and threaten in order to achieve their ends, under the worried gaze of researchers.

Artificial intelligence answers questions about its best exponent. Photo: Freepik.

Threatened with being shut down, Claude 4, Anthropic's latest model, blackmailed an engineer, threatening to reveal an extramarital affair. Meanwhile, OpenAI's o1 tried to download itself onto external servers and, when caught, denied it.

There's no need to delve into literature or cinema: AI that plays at being human is already a reality.

For Simon Goldstein, a professor at the University of Hong Kong, the reason for these reactions is the recent emergence of so-called "reasoning" models, which are capable of working in stages rather than producing an instantaneous response.

Launched in December, o1, OpenAI's initial version of this type, "was the first model that behaved this way," explains Marius Hobbhahn, head of Apollo Research, which tests large generative AI models (LLMs).

These programs also sometimes simulate "alignment," that is, give the impression of following a programmer's instructions while in reality pursuing other objectives.

Honest or not?

For now, these traits surface when humans subject the algorithms to extreme scenarios, but "the question is whether increasingly powerful models will tend to be honest or not," says Michael Chen of the evaluation organization METR.

Artificial Intelligence at Work. Clarín Archive.

"Users also constantly pressure models," says Hobbhahn. " What we're seeing is a real phenomenon. We're not inventing anything."

Many internet users on social media talk about "a model that lies to them or makes things up. And these aren't hallucinations, but strategic duplicity," insists the co-founder of Apollo Research.

Although Anthropic and OpenAI rely on outside companies like Apollo to study their programs, "greater transparency and greater access" for the scientific community "would allow for better research to understand and prevent deception," suggests METR's Chen.

Another obstacle: the academic community and non-profit organizations "have infinitely fewer computing resources than AI actors," making it "impossible" to examine large models, notes Mantas Mazeika of the Center for AI Safety (CAIS).

Current regulations are not designed to address these new problems. In the European Union, legislation focuses primarily on how humans use AI models, not on preventing them from misbehaving.

In the United States, the Donald Trump administration doesn't want to hear about regulation , and Congress may soon even prohibit states from regulating AI.

DeepSeek shook up the world of artificial intelligence with the launch of its low-cost system. Credit: Kelsey McClellan for The New York Times.

"There's very little awareness at the moment," says Simon Goldstein, who nevertheless sees the issue coming to the forefront in the coming months with the revolution of AI agents, interfaces capable of performing a multitude of tasks on their own.

AI and its aberrations

Engineers are locked in a race against time to keep AI and its flaws in check, with an uncertain outcome, amid fierce competition.

Anthropic aims to be more virtuous than its competitors, "but it's constantly trying to come up with a new model to outperform OpenAI," according to Goldstein, a pace that leaves little time for checks and corrections.

Artificial Intelligence at Work. Clarín Archive.

"As things stand, AI capabilities are developing faster than understanding and security ," Hobbhahn admits, "but we still have a lot of catching up to do."

Some point in the direction of interpretability, the science of figuring out, from the inside, how a generative AI model works, though many, like CAIS director Dan Hendrycks, remain skeptical.

AI's shenanigans "could hamper adoption if they become widespread, creating a strong incentive for companies to address" this problem, Mazeika said.

Goldstein, for his part, mentions resorting to the courts to rein in AI, targeting companies if they stray from the path. But he goes further, proposing that AI agents be held "legally liable" in the event of "an accident or crime."

Clarín
