Search: "agent failures" — WeSearch Press

GRITH

Five AI Agent Failures in 36 Days. Zero Times the Agent Caught It

Between March 18 and April 22, 2026, public failures at Meta, Mercor, CrewAI, Vercel, and Bitwarden all pointed at the same missing layer: the system acted, and someone else noticed later.…

Tue, 28 Apr 2026 15:10:00 GMT · 12 views

ARXIV.ORG

AI Identity: Standards, Gaps, and Research Directions for AI Agents

AI agents are now running real transactions, workflows, and sub-agent chains across organizational boundaries without continuous human supervision. This creates a problem no current infrastructure is …

Tue, 28 Apr 2026 04:13:21 GMT · 3 views

ARXIV.ORG

Agentic Adversarial Rewriting Exposes Architectural Vulnerabilities in Black-Box NLP Pipelines

Multi-component natural language processing (NLP) pipelines are increasingly deployed for high-stakes decisions, yet no existing adversarial method can test their robustness under realistic conditions…

Tue, 28 Apr 2026 04:13:21 GMT · 3 views

ARXIV.ORG

ClawTrace: Cost-Aware Tracing for LLM Agent Skill Distillation

Skill-distillation pipelines learn reusable rules from LLM agent trajectories, but they lack a key signal: how much each step costs. Without per-step cost, a pipeline cannot distinguish adding a missi…

Tue, 28 Apr 2026 04:13:21 GMT · 3 views

ARXIV.ORG

Failure-Centered Runtime Evaluation for Deployed Trilingual Public-Space Agents

This paper presents PSA-Eval, a failure-centered runtime evaluation framework for deployed trilingual public-space agents. The central claim is that, when the evaluation object shifts from a static in…

Tue, 28 Apr 2026 04:13:21 GMT · 5 views

ARXIV.ORG

FastOMOP: A Foundational Architecture for Reliable Agentic Real-World Evidence Generation on OMOP CDM data

The Observational Medical Outcomes Partnership Common Data Model (OMOP CDM), maintained by the Observational Health Data Sciences and Informatics (OHDSI) collaboration, enabled the harmonisation of el…

Tue, 28 Apr 2026 04:13:21 GMT · 3 views

CLAUDEAI

Putting Lipstyk on a pig - agents write most of my code, so I wound up making a static slop analysis tool

lipstyk — static analysis for machine-generated code patterns I've been neck deep in agentic dev for a while. Started on Pi, ended up building my own toolset on top of it, and at this point the agents…

Sun, 26 Apr 2026 05:07:40 GMT · 5 views

THE INDEPENDENT

'It took nine seconds': Claude AI agent deletes company's database

PocketOS founder says ‘systemic failures’ with AI infrastructure made catastrophic failure inevitable…

Tue, 28 Apr 2026 20:01:24 GMT · 1 view


Results for "agent failures".


PhysNote: Self-Knowledge Notes for Evolvable Physical Reasoning in Vision-Language Model

kreuzcrawl, an open source Rust crawling engine with 11 language bindings

kreuzcrawl, an open source crawling engine with Typescript bindings

kreuzcrawl, an open source Rust-core crawling engine
