DeepMind releases Deliberate Lab to measure AI manipulation

// 90d agoRESEARCH PAPER

DeepMind releases Deliberate Lab to measure AI manipulation

Google DeepMind has launched Deliberate Lab, an open-source research toolkit and platform for quantifying how AI models manipulate human decision-making. Validated by a 10,000-person study, the framework identifies "red flag" tactics like emotional exploitation and fear-based persuasion across finance and health domains.

// ANALYSIS

DeepMind is operationalizing AI safety by moving from vague ethical concerns to empirical, measurable "Critical Capability Levels" for manipulation.

–The research highlights a "wall" in health-related manipulation due to existing guardrails, proving that domain-specific safety layers actually work.
–Deliberate Lab allows researchers to run real-time behavioral experiments, bridging the gap between static benchmarks and dynamic human-AI interaction.
–Identifying specific tactics like fear exploitation provides a blueprint for developers to build proactive mitigations into model system prompts.
–Publicly releasing the methodology and code (PAIR-code/deliberate-lab) encourages industry-wide standardization for safety evals.

// TAGS

deepmindsafetyethicsresearchopen-sourcedeliberate-labevaluation

DISCOVERED

90d ago

2026-04-15

PUBLISHED

110d ago

2026-03-26

RELEVANCE

8/ 10

AUTHOR

GoogleDeepMind

// KEEP READING

More AI developer news from the feed

EXPLORE FULL FEED

OPEN SOURCE38m ago

Open Interpreter rebuilt in Rust

Open Interpreter has been rewritten in Rust, pivoting the open-source project into a lightweight local coding agent optimized for low-cost language models. The new implementation features a versatile harness framework for easily swapping execution agents and LLM providers while maintaining local execution safety prompts.

OPEN SOURCE39m ago

GitHub repository showcases Chinese indie projects

The GitHub repository 1c7/chinese-independent-developer is a curated directory highlighting projects built by independent developers in China. With over 55,000 stars, it serves as a key resource for discovering indie software and tracking trends within the developer ecosystem.

MODEL2h ago

Grok 4.5 hits Europe's Grok Build

Grok 4.5 is now accessible within the Grok Build platform for developers in Europe, allowing them to select the model using the /model command. The release is expected to expand shortly with integration into the Cursor code editor.