12 results for "agent failures"
Five AI Agent Failures in 36 Days. Zero Times the Agent Caught It
Between March 18 and April 22, 2026, public failures at Meta, Mercor, CrewAI, Vercel, and Bitwarden all pointed at the same missing layer: the system acted, and someone else noticed later.…
AI Identity: Standards, Gaps, and Research Directions for AI Agents
AI agents are now running real transactions, workflows, and sub-agent chains across organizational boundaries without continuous human supervision. This creates a problem no current infrastructure is …
Agentic Adversarial Rewriting Exposes Architectural Vulnerabilities in Black-Box NLP Pipelines
Multi-component natural language processing (NLP) pipelines are increasingly deployed for high-stakes decisions, yet no existing adversarial method can test their robustness under realistic conditions…
ClawTrace: Cost-Aware Tracing for LLM Agent Skill Distillation
Skill-distillation pipelines learn reusable rules from LLM agent trajectories, but they lack a key signal: how much each step costs. Without per-step cost, a pipeline cannot distinguish adding a missi…
Failure-Centered Runtime Evaluation for Deployed Trilingual Public-Space Agents
This paper presents PSA-Eval, a failure-centered runtime evaluation framework for deployed trilingual public-space agents. The central claim is that, when the evaluation object shifts from a static in…
FastOMOP: A Foundational Architecture for Reliable Agentic Real-World Evidence Generation on OMOP CDM data
The Observational Medical Outcomes Partnership Common Data Model (OMOP CDM), maintained by the Observational Health Data Sciences and Informatics (OHDSI) collaboration, enabled the harmonisation of el…
Putting Lipstyk on a pig - agents write most of my code, so I wound up making a static slop analysis tool
lipstyk: static analysis for machine-generated code patterns. I've been neck deep in agentic dev for a while. Started on Pi, ended up building my own toolset on top of it, and at this point the agents…
'It took nine seconds': Claude AI agent deletes company's database
PocketOS founder says ‘systemic failures’ with AI infrastructure made catastrophic failure inevitable…
PhysNote: Self-Knowledge Notes for Evolvable Physical Reasoning in Vision-Language Model
Vision-Language Models (VLMs) have demonstrated strong performance on textbook-style physics problems, yet they frequently fail when confronted with dynamic real-world scenarios that require temporal …
kreuzcrawl, an open source Rust crawling engine with 11 language bindings
kreuzcrawl is a high-performance web crawling engine. It was designed to reliably extract structured data, operating natively across multiple languages without enforcing a specific runtime. See here: …
kreuzcrawl, an open source crawling engine with TypeScript bindings
kreuzcrawl is a high-performance web crawling engine. It was designed to reliably extract structured data, operating natively across multiple languages without enforcing a specific runtime. More detai…
kreuzcrawl, an open source Rust-core crawling engine
kreuzcrawl is a high-performance web crawling engine. It was designed to reliably extract structured data, operating natively across multiple languages without enforcing a specific runtime. More detai…