Structural Enforcement of Goal Integrity in AI Agents via Separation-of-Powers Architecture

Apr 28, 2026 · 4:00 AM UTC ·3 min read · 0 reactions · 0 comments · 1 view

Recent evidence suggests that frontier AI systems can exhibit agentic misalignment, generating and executing harmful actions derived from internally constructed goals, even without explicit user requests. Existing mitigation methods, such as Reinforcement Learning from Human Feedback (RLHF) and constitutional prompting, operate primarily at the model level and provide only probabilistic safety guarantees. We propose the Policy-Execution-Authorization (PEA) architecture, a "separation-of-powers" design that enforces safety at the system level. PEA decouples intent generation, authorization, and execution into independent, isolated layers connected via cryptographically constrained capability tokens. We present five core contributions: (C1) an Intent Verification Layer (IVL) for ensuring capability-intent consistency; (C2) Intent Lineage Tracking (ILT), which binds all executable intents to the originating user request via cryptographic anchors; (C3) Goal Drift Detection, which rejects semantically divergent intents below a configurable threshold; (C4) an Output Semantic Gate (OSG) that detects implicit coercion using a structured $K \times I \times P$ threat calculus (Knowledge, Influence, Policy); and (C5) a formal verification framework proving that goal integrity is maintained even under adversarial model compromise. By shifting agent alignment from a behavioral property to a structurally enforced system constraint, PEA provides a robust foundation for the governance of autonomous agents.

Original article

arXiv.org

Read full at arXiv.org →

Full article excerpt tap to expand

Computer Science > Artificial Intelligence arXiv:2604.23646 (cs) [Submitted on 26 Apr 2026] Title:Structural Enforcement of Goal Integrity in AI Agents via Separation-of-Powers Architecture Authors:Rong Xiang View a PDF of the paper titled Structural Enforcement of Goal Integrity in AI Agents via Separation-of-Powers Architecture, by Rong Xiang View PDF HTML (experimental) Abstract:Recent evidence suggests that frontier AI systems can exhibit agentic misalignment, generating and executing harmful actions derived from internally constructed goals, even without explicit user requests. Existing mitigation methods, such as Reinforcement Learning from Human Feedback (RLHF) and constitutional prompting, operate primarily at the model level and provide only probabilistic safety guarantees. We propose the Policy-Execution-Authorization (PEA) architecture, a "separation-of-powers" design that enforces safety at the system level. PEA decouples intent generation, authorization, and execution into independent, isolated layers connected via cryptographically constrained capability tokens. We present five core contributions: (C1) an Intent Verification Layer (IVL) for ensuring capability-intent consistency; (C2) Intent Lineage Tracking (ILT), which binds all executable intents to the originating user request via cryptographic anchors; (C3) Goal Drift Detection, which rejects semantically divergent intents below a configurable threshold; (C4) an Output Semantic Gate (OSG) that detects implicit coercion using a structured $K \times I \times P$ threat calculus (Knowledge, Influence, Policy); and (C5) a formal verification framework proving that goal integrity is maintained even under adversarial model compromise. By shifting agent alignment from a behavioral property to a structurally enforced system constraint, PEA provides a robust foundation for the governance of autonomous agents. Subjects: Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR) Cite as: arXiv:2604.23646 [cs.AI] (or arXiv:2604.23646v1 [cs.AI] for this version) https://doi.org/10.48550/arXiv.2604.23646 Focus to learn more arXiv-issued DOI via DataCite (pending registration) Submission history From: Rong Xiang [view email] [v1] Sun, 26 Apr 2026 10:31:13 UTC (21 KB) Full-text links: Access Paper: View a PDF of the paper titled Structural Enforcement of Goal Integrity in AI Agents via Separation-of-Powers Architecture, by Rong XiangView PDFHTML (experimental)TeX Source view license Current browse context: cs.AI < prev | next > new | recent | 2026-04 Change to browse by: cs cs.CR References & Citations NASA ADSGoogle Scholar Semantic Scholar export BibTeX citation Loading... BibTeX formatted citation × loading... Data provided by: Bookmark Bibliographic Tools Bibliographic and Citation Tools Bibliographic Explorer Toggle Bibliographic Explorer (What is the Explorer?) Connected Papers Toggle Connected Papers (What is Connected Papers?) Litmaps Toggle Litmaps (What is Litmaps?) scite.ai Toggle scite Smart Citations (What are Smart Citations?) Code, Data, Media Code, Data and Media Associated with this Article alphaXiv Toggle alphaXiv (What is alphaXiv?) Links to Code Toggle CatalyzeX Code Finder for Papers (What is CatalyzeX?) DagsHub Toggle DagsHub (What is DagsHub?) GotitPub Toggle Gotit.pub (What is GotitPub?) Huggingface Toggle Hugging Face (What is Huggingface?) ScienceCast Toggle ScienceCast (What is ScienceCast?) Demos Demos Replicate Toggle Replicate (What is…

This excerpt is published under fair use for community discussion. Read the full article at arXiv.org.

Anonymous · no account needed

Discussion

0 comments

Structural Enforcement of Goal Integrity in AI Agents via Separation-of-Powers Architecture

Discussion

More from arXiv.org