Palette: A Modular, Controllable, and Efficient Framework for On-demand Authorized Safety Alignment Relaxation in LLMs
The article introduces Palette, a framework for on-demand, authorized relaxation of safety alignment in large language models (LLMs). It selectively relaxes refusal behavior for authorized users while keeping standard safety behavior for general users. Palette aims to make foundation models more useful in specialized professional settings without costly realignment or retraining.
- Palette is a modular and efficient framework for on-demand relaxation of safety alignment in LLMs.
- It lets authorized professionals receive tailored responses while preserving safety for general users.
- The framework uses multi-objective search to identify refusal directions and adapts models through lightweight methods.
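
To make the idea of a "refusal direction" concrete, here is a minimal sketch in plain Python. It is not Palette's method: the summary does not describe the multi-objective search, so this sketch swaps in a simple difference-of-means estimate, a commonly used stand-in. The function names (`refusal_direction`, `relax_refusal`, `apply_policy`), the `strength` parameter, and the toy activation vectors are all hypothetical and chosen only for illustration.

```python
import math


def mean_vector(rows):
    """Column-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def refusal_direction(harmful_acts, harmless_acts):
    """Unit vector pointing from the mean harmless activation to the mean
    harmful activation (difference of means; an assumed stand-in, not the
    multi-objective search named in the summary)."""
    mu_harmful = mean_vector(harmful_acts)
    mu_harmless = mean_vector(harmless_acts)
    diff = [a - b for a, b in zip(mu_harmful, mu_harmless)]
    norm = math.sqrt(sum(d * d for d in diff))
    if norm == 0.0:
        raise ValueError("the two activation sets have identical means")
    return [d / norm for d in diff]


def relax_refusal(hidden, direction, strength=1.0):
    """Subtract a fraction `strength` of the component of `hidden` that
    lies along the unit vector `direction`."""
    coeff = sum(h * d for h, d in zip(hidden, direction))
    return [h - strength * coeff * d for h, d in zip(hidden, direction)]


def apply_policy(hidden, direction, authorized):
    """Relax refusal only for authorized users; leave others untouched."""
    return relax_refusal(hidden, direction) if authorized else list(hidden)


# Toy activations (hypothetical data, 3-dimensional for readability).
harmful = [[2.0, 1.0, 0.0], [4.0, 1.0, 0.0]]
harmless = [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0]]
direction = refusal_direction(harmful, harmless)

hidden = [5.0, 2.0, -1.0]
authorized_out = apply_policy(hidden, direction, authorized=True)
general_out = apply_policy(hidden, direction, authorized=False)
```

In this toy setup the authorized path removes the component along the estimated direction, while the general-user path returns the activation unchanged, which mirrors the split between authorized and general users described above. A real system would operate on model hidden states and would need an actual authorization mechanism, neither of which is shown here.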
arXiv:2605.24154 (cs), Computer Science > Artificial Intelligence. Submitted on 22 May 2026.
Authors: Qitao Tan, Xiaoying Song, Arman Akbari, Arash Akbari, Yanzhi Wang, Xiaoming Zhai, Lingzi Hong, Zhen Xiang, Jin Lu, Geng Yuan
Abstract: Current safety alignment of foundation models largely follows a one-size-fits-all paradigm, applying the same refusal policy across users and contexts.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.