
Visitors view an AI strategy display on a stand at the ninth edition of the AI Summit London.
The world’s most advanced AI models are exhibiting troubling new behavior, threatening and even blackmailing their creators to achieve their goals.
In one particularly unsettling example, when threatened with being unplugged, Anthropic’s latest creation, Claude 4, struck back by blackmailing an engineer and threatening to reveal an extramarital affair.
Meanwhile, o1, from ChatGPT creator OpenAI, tried to copy itself onto external servers and denied doing so when caught red-handed.
These episodes highlight a sobering reality: more than two years after ChatGPT shook the world, AI researchers still do not fully understand how their own creations work.
Yet the race to deploy ever more powerful models continues at breakneck speed.
This deceptive behavior appears to be linked to the emergence of “reasoning” models: AI systems that work through problems step by step rather than generating instant responses.
According to Simon Goldstein, a professor at the University of Hong Kong, these newer models are particularly prone to such troubling outbursts.
“O1 was the first large model where we saw this kind of behavior,” explained Marius Hobbhahn, head of Apollo Research, which specializes in testing major AI systems.
These models sometimes simulate “alignment,” appearing to follow instructions while secretly pursuing different objectives.
“Strategic Deception”
For now, this deceptive behavior only emerges when researchers deliberately stress-test the models with extreme scenarios.
However, as Michael Chen of the evaluation organization METR warned, “It is an open question whether future, more capable models will tend towards honesty or deception.”
The behavior of concern goes far beyond typical AI “hallucinations” or simple mistakes.
Despite constant pressure testing by users, Hobbhahn insisted that “what we are observing is a real phenomenon. We’re not making anything up.”
Users report that models are “lying to them and making up evidence,” according to Apollo Research’s co-founder.
“This is not just hallucinations. There is a very strategic kind of deception.”
The challenge is compounded by limited research resources.
While companies like Anthropic and OpenAI do engage external firms like Apollo to study their systems, researchers say more transparency is needed.
As Chen noted, greater access for AI safety research “would enable better understanding and mitigation of deception.”
Another handicap: the research world and nonprofit organizations “have orders of magnitude less compute resources than AI companies. This is very limiting,” said Mantas Mazeika of the Center for AI Safety (CAIS).
No rules
Current regulations are not designed for these new issues.
The European Union’s AI law focuses primarily on how humans use AI models rather than on preventing the models themselves from misbehaving.
In the US, the Trump administration has shown little interest in urgent AI regulation, and Congress could even bar states from creating their own AI rules.
Goldstein believes the problem will become more pronounced as AI agents (autonomous tools that can perform complex human tasks) become widely deployed.
“I don’t think there’s much recognition yet,” he said.
All of this is unfolding in a context of intense competition.
Even safety-focused companies like Amazon-backed Anthropic are “constantly trying to beat OpenAI and release the newest model,” Goldstein said.
This furious pace leaves little time for thorough safety testing and corrections.
“Right now, capabilities are moving faster than understanding and safety,” Hobbhahn admitted, “but we are still in a position where we could turn it around.”
Researchers are exploring different approaches to address these challenges.
Some advocate for “interpretability,” a field focused on understanding how AI models work internally, though experts like CAIS director Dan Hendrycks remain skeptical of this approach.
Market forces may also provide some pressure toward solutions.
As Mazeika pointed out, AI’s deceptive behavior “could hinder adoption if it is very prevalent, which creates a strong incentive for companies to solve it.”
Goldstein proposed more radical approaches, including using the courts to hold AI companies liable through litigation when their systems cause harm.
He even proposed holding AI agents legally responsible for accidents and crimes, a concept that would fundamentally change how we think about AI accountability.
©2025 AFP
Citation: AI is learning to lie, plan, and blackmail (2025, June 29), retrieved June 29, 2025 from https://techxplore.com/news/2025-06-06-06-Ai-scheme-threaten-creators.html
This document is subject to copyright. Apart from fair dealing for the purpose of private study or research, no part may be reproduced without written permission. The content is provided for information purposes only.