Position: AI Safety Requires Effective Controllability
This position paper argues that alignment alone is insufficient for AI safety and that controllability matters as well. It emphasizes that AI systems must be reliably interruptible and manageable in real time. The authors propose a framework and a benchmark for evaluating controllability in high-risk environments.
- AI safety has traditionally focused on alignment with human preferences.
- The authors argue that controllability should be a primary objective for AI systems.
- They introduce a benchmark called controlbench to assess controllability failures.
Opening excerpt (first ~120 words)
arXiv:2605.27117 (cs.AI), submitted on 26 May 2026
Authors: Yige Li, Yunhao Feng, Jun Sun
Abstract: AI safety is still largely framed as alignment: training models to follow human preferences, safety policies, and normative constraints. That framing has improved the behavior of modern language models, but aligned behavior does not by itself guarantee that a deployed agent can be stopped, overridden, or constrained once it operates in open-ended, interactive, and tool-using environments.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.