The Next Frontier of AI in Production Is Chaos Engineering
Chaos engineering in AI production systems currently emphasizes safety over intent, focusing on whether experiments stay within error budgets rather than whether they yield meaningful insights. The article argues that while tools exist to ensure safe failure injection, there is a lack of systems that determine which experiments would be most informative. A shift toward intent-based chaos engineering is needed to align experiments with learning goals and improve understanding of system failure behavior.
- Current chaos engineering tools prioritize safety by ensuring experiments stay within SLO error budgets but do not assess whether the experiment teaches something valuable.
- The core issue is that static scripts used in chaos engineering become outdated as systems evolve, leading to tests that reflect obsolete system states.
- Intent-based chaos engineering aims to design experiments that update the team's understanding of failure propagation, rather than just verifying recovery.
- The article references a patented architecture (US12242370B2) for intent-based chaos engineering and draws on insights from engineers at companies like Intuit and GPTZero.
- Safety determines how much to break; intent determines what breaking it will teach. These require different tooling, but only the former is well-developed.
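The split between a safety gate and an intent check in the points above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the article's method or the patented architecture: the `Experiment` fields, the function names, and the use of binary entropy over the team's prior confidence as a stand-in for "how much this experiment could teach" are all hypothetical choices made for the example.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Experiment:
    # All fields are hypothetical, chosen only for illustration.
    name: str
    hypothesis: str          # the belief about failure behavior being tested
    prior_confidence: float  # team's probability (0..1) that the hypothesis holds
    budget_cost: float       # estimated fraction of the SLO error budget consumed


def is_safe(exp: Experiment, budget_remaining: float) -> bool:
    """Safety gate: answers 'how much may we break?'"""
    return exp.budget_cost <= budget_remaining


def expected_information(exp: Experiment) -> float:
    """Intent score (assumed proxy): binary entropy of the prior, in bits.

    A hypothesis the team is already nearly certain about scores close to 0,
    because either outcome is unlikely to change what the team knows.
    """
    p = exp.prior_confidence
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def pick_most_informative(
    experiments: list[Experiment], budget_remaining: float
) -> Optional[Experiment]:
    """Apply the safety gate first, then rank the survivors by intent score."""
    safe = [e for e in experiments if is_safe(e, budget_remaining)]
    return max(safe, key=expected_information, default=None)


# Hypothetical candidate experiments for an ML serving stack.
candidates = [
    Experiment("kill-one-replica", "Traffic fails over cleanly", 0.99, 0.01),
    Experiment("stale-feature-store", "Model falls back to cached features", 0.50, 0.05),
    Experiment("regional-outage", "Second region absorbs full load", 0.50, 0.90),
]

choice = pick_most_informative(candidates, budget_remaining=0.10)
```

With 10% of the budget remaining, the regional outage is rejected by the safety gate even though it is as uncertain as the feature-store test, and the replica kill passes the gate but ranks last because the team is already almost sure of the result. A gate alone would have allowed either of the two safe experiments without saying which one is worth running.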
Opening excerpt (first ~120 words)
Blast-radius control tells you how much to break. Intent tells you what breaking it will teach. Only one of these has mature tooling.

Sayali Patil, Apr 28, 2026 (18 min read)

Here is a question that no chaos engineering tool in production today can answer: Did your last experiment test the right thing? Not ‘Did it stay within budget?’ That is what SLO error-budget gating handles. Not ‘Did the system survive?’ That is what abort conditions measure. The question is whether the experiment was designed to validate a specific belief about your system’s behavior, and whether its outcome changed what your team knows about failure propagation through your stack.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at Towards Data Science.