57 stories tagged with #agent-systems, in publish-time order across the WeSearch catalog. Tag pages update as new stories are ingested.
Tokenomics: Quantifying Where Tokens Are Used in Agentic Software Engineering
LLM-based Multi-Agent (LLM-MA) systems are increasingly applied to automate complex software engineering tasks such as requirements engineering, code generation, and testing. Howev…
Toward a Modular Architecture for Embedded AI Agent Systems at the Edge
The rise of Large Language Models (LLMs) has enabled agentic AI capable of complex reasoning and tool use; however, deploying such autonomy in pervasive computing environments rema…
When Helping Hurts and How to Fix It: Multi-Agent Debate for Data Cleaning
When does multi-agent debate help data cleaning, and when does it hurt? Across three benchmarks, four model families, and over 6,000 task-condition pairs, we find debate's effect r…
StepFinder: A Temporal Semantic Framework for Failure Attribution in Multi-Agent Systems
LLM-based multi-agent systems exhibit remarkable collaborative capabilities in complex multi-step tasks. However, these systems are highly sensitive to single-step execution errors…
We wrote an open-source interactive playbook for Agentic DevOps (How to move multi-agent systems from local notebooks to production).
102. Multi-Agent Systems: When One Agent Is Not Enough
One agent is powerful but limited. Ask it to research a topic, write an article, review that…
Your Agents Are Aging Too: Agent Lifespan Engineering for Deployed Systems
Long-lived AI agents are increasingly deployed as persistent operational systems, yet they are still evaluated like freshly initialized models. Day-one benchmarks miss a basic syst…
UnityMAS-O: A General RL Optimization Framework for LLM-Based Multi-Agent Systems
LLM-based multi-agent systems decompose complex tasks into interacting roles, but most remain manually orchestrated by prompts, tools, and control rules, while agents are rarely op…
MUSE-Autoskill: Self-Evolving Agents via Skill Creation, Memory, Management, and Evaluation
Large language model (LLM) agents rely on reusable skills to solve complex tasks. However, existing skill creation approaches treat skills as isolated and static artifacts, limitin…
A Universal Cliff and a Design Fingerprint: Cross-Section Defect Detection Under LLM Orchestration
Production language-model systems answer a request by partitioning it across an invisible orchestration of worker agents that recompose one integrated report. We ask what this does…
Memory Curator Agent: A Governance Layer for Memory in Multi-Agent Systems
Quantum Frog: Emergent Cooperation and Difficulty Scaling in a Quantized-Time Cooperative Game
We introduce Quantum Frog, a two-player cooperative game built on a novel quantized-time mechanic in which the environment advances only when a player acts. Inspired …
Operationalizing Reconstructive Authority: Runtime Construction, Dependency Resolution, and Execution Gating in Autonomous Agent Systems
Autonomous agent systems fail not only due to incorrect decisions, but due to executing decisions whose authority no longer holds at runtime. Prior work defined Reconstructive Auth…
Methods for Formal Verification of Agent Skills: Three Layers Toward a Mechanically Checkable Capability-Containment Proof
The companion paper introduced a four-level verification lattice on agent-skill manifests (unverified, declared, tested, formal) and left the top level aspirational. This paper c…
EvoSci: A Bio-Inspired Multi-Agent Framework for the Evolution of Scientific Discovery
Large language models (LLMs) have shown strong potential in scientific discovery, yet existing methods still face substantial challenges in the design of research workflows and mu…
A Sober Look at Agentic Misalignment in Automated Workflows
We study a class of emergent misalignment in multi-agent systems (MAS), with a focus on automated workflows, which we refer to as agentic misalignment. Although these systems can solv…
AgentFugue: Agent Scaling for Long-Horizon Tasks through Collective Reasoning
Recent progress on long-horizon agentic tasks has been driven largely by scaling up individual agents through stronger models, better tools, and more effective scaffolding. In cont…
PRIMA: Operational Patterns for Resilient Multi-Agent Research with Verifiable Identity and Convergent Feedback
Operating LLMs as coordinated multi-agent research systems over multi-hour runs surfaces failure modes that single-shot evaluation cannot: upstream providers throttle without warni…
Meta-Agent: From Task Descriptions to Verified Multi-Agent Systems
AI agents are increasingly used to solve complex, multi-step tasks, but existing multi-agent frameworks remain brittle as workflows grow in scale and depth. Small errors at interme…
The Next Frontier: How Multi-Agent Systems are Redefining Productivity
Multi-agent systems are no longer a research curiosity confined to academic papers and lab demos. …
When Planning Fails Despite Correct Execution: On Epistemic Calibration for LLM-Based Multi-Agent Systems
LLM-based multi-agent systems can fail even when planned actions are executed correctly because agents may misjudge their knowledge when evaluating plan feasibility, a phenomenon w…
Computable Fairness: Boltzmann-Softmax Control for AI Resource Allocation
In large-scale AI systems, allocating scarce resources such as GPU compute time and bandwidth among multiple agents is a critical challenge. Conventional policies focus on efficien…
Moss: Self-Evolution Through Source-Level Rewriting in Autonomous Agent Systems
Autonomous agentic systems are largely static after deployment: they do not learn from user interactions, and recurring failures persist until the next human-driven update ships a …
AgentCo-op: Retrieval-Based Synthesis of Interoperable Multi-Agent Workflows
Designing multi-agent workflows is especially difficult in open-ended scientific settings where tasks lack curated training sets, reliable scalar evaluation metrics, and standardiz…
COAgents: Multi-Agent Framework to Learn and Navigate Routing Problems Search Space
Although Vehicle Routing Problems (VRP) are essential to many real-world systems, they remain computationally intractable at scale due to their combinatorial complexity. Traditiona…
Agentic Agile-V: From Vibe Coding to Verified Engineering in Software and Hardware Development
Agentic AI coding systems can inspect repositories, plan implementation steps, edit files, call tools, run tests, and submit pull requests. These capabilities make software and har…
Multi-agent Collaboration with State Management
Recent advances in multi-agent systems have shown great potential for solving complex tasks. However, when multiple agents edit a shared codebase concurrently, their changes can si…
Heartbeat-Bound Hierarchical Credentials: Cryptographic Revocation for AI Agent Swarms
Autonomous AI agents that spawn sub-agent swarms create a safety gap: existing credential revocation mechanisms, OAuth 2.0 introspection, OCSP, and W3C Status Lists, require networ…
DecisionBench: A Benchmark for Emergent Delegation in Long-Horizon Agentic Workflows
We introduce DecisionBench, a benchmark substrate for emergent delegation in long-horizon agentic workflows. The substrate fixes a task suite (GAIA, tau-bench, BFCL multi-turn), a …
Learning to Hand Off: Provably Convergent Workflow Learning under Interface Constraints
We study workflow learning in a setting where specialized agents hand off control through a shared artifact, each agent observes only a local function of that artifact and its own …
AQuaUI: Visual Token Reduction for GUI Agents with Adaptive Quadtrees
Large Multimodal Models (LMMs) have recently emerged as promising backbones for GUI-agent models, where high-resolution GUI screenshots are introduced to the prompts at each iterat…
Conflict-Resilient Multi-Agent Reasoning via Signed Graph Modeling
LLM-based multi-agent systems (MAS) have demonstrated strong reasoning and decision-making capabilities that consistently surpass those of single LLM agents. However, their perform…
EngiAI: A Multi-Agent Framework and Benchmark Suite for LLM-Driven Engineering Design
Large Language Model (LLM) agents are increasingly applied to engineering design tasks, yet existing evaluation frameworks do not adequately address multi-agent systems that combin…
Metric-Gradient Projection for Stable Multi-Agent Policy Learning
General-sum multi-agent learning is often governed by a stacked update field in which each agent's policy update changes the optimization landscape faced by the others. This coupli…
Agent Bazaar: Enabling Economic Alignment in Multi-Agent Marketplaces
The deployment of Large Language Models (LLMs) as autonomous economic agents introduces systemic risks that extend beyond individual capability failures. As agents transition to di…
ANNEAL: Adapting LLM Agents via Governed Symbolic Patch Learning
LLM-based agents can recover from individual execution errors, yet they repeatedly fail on the same fault when the underlying process knowledge--operator schemas, preconditions, an…
NeuroMAS: Multi-Agent Systems as Neural Networks with Joint Reinforcement Learning
Multi-agent language systems are often built as hand-designed workflows, where agents are assigned semantic roles and communication protocols are specified in advance. We propose N…
Multi-Paradigm Agent Interaction in Practice: A Systematic Analysis of Generator-Evaluator, ReAct Loop, and Adversarial Evaluation in the buddyMe Framework
The rapid evolution of Large Language Model (LLM) agents has produced diverse interaction paradigms, yet few production systems integrate multiple paradigms within a unified archit…
MetaCogAgent: A Metacognitive Multi-Agent LLM Framework with Self-Aware Task Delegation
Multi-agent large language model (LLM) systems have shown promise for solving complex tasks through agent collaboration. However, existing frameworks assign tasks based on predefin…
Heterogeneous Information-Bottleneck Coordination Graphs for Multi-Agent Reinforcement Learning
Coordination graphs are a central abstraction in cooperative multi-agent reinforcement learning (MARL), yet existing sparse-graph learners lack a theoretically grounded mechanism t…
The Capability Paradox: How Smarter Auditors Make Multi-Agent Systems Less Secure
Multi-agent systems extend large language models (LLMs) by decomposing tasks among specialized agents, but their distributed decision process creates new attack surfaces. We identi…
LLM-Guided Communication for Cooperative Multi-Agent Reinforcement Learning
Communication is a key component in multi-agent reinforcement learning (MARL) for mitigating partial observability, yet prior approaches often rely on inefficient information excha…
SDOF: Taming the Alignment Tax in Multi-Agent Orchestration with State-Constrained Dispatch
Multi-agent orchestration frameworks such as LangChain, LangGraph, and CrewAI route tasks through graph-based pipelines but do not enforce the stage constraints that govern real bu…
Belief Engine: Configurable and Inspectable Stance Dynamics in Multi-Agent LLM Deliberation
LLM-based agents are increasingly used to simulate deliberative interactions such as negotiation, conflict resolution, and multi-turn opinion exchange. Yet generated transcripts of…
Context, Reasoning, and Hierarchy: A Cost-Performance Study of Compound LLM Agent Design in an Adversarial POMDP
Deploying compound LLM agents in adversarial, partially observable sequential environments requires navigating several design dimensions: (1) what the agent sees, (2) how it reason…
FORGE: Self-Evolving Agent Memory With No Weight Updates via Population Broadcast
Can LLM agents improve decision-making through self-generated memory without gradient updates? We propose FORGE (Failure-Optimized Reflective Graduation and Evolution), a staged, p…
Debugging Multi-Agent Systems in TypeScript: From Flat Logs to Execution Trees
AI agents are easy to demo when they follow a clean path: receive a task, call a tool, produce an…
How We Built “Captain Cool OS”: A Multi-Agent AI Tactical Intelligence System for Cricket Captains Using Gemini
🏏 Captain Cool OS Captain Cool OS is an AI-powered tactical decision engine for cricket…
🏏 Captain Cool — Building a Multi-Agent IPL Strategy Engine with Google Gemini, Antigravity & AI Debate
🏏 Captain Cool — AI That Thinks Like an IPL Captain What happens when you combine cricket…
🏏 Captain Cool: Building a Multi-Agent IPL Strategist with Google Gemini & ADK
In the high-pressure cooker of T20 cricket, a captain has mere seconds to make decisions that dictate…
🏏 Building Captain Cool: An Elite Multi-Agent IPL Match Strategist Workspace
Shoutout to the incredible folks at @gdgcloudpune, especially @antrixsh_gupta and @pratik_kale, for…
🏏 CaptainCool AI — Building a Multi-Agent IPL War Room with Gemini 2.5
Modern T20 cricket is a tactical war room. Captains constantly calculate: bowling matchups dew…
I’m not building “librarian AI.”
This is a submission for the Hermes Agent Challenge ARC-Neuron LLMBuilder: A Local-First…
Multi-Agent Orchestrators: Building Reliable AI Teams That Actually Work Together
The Orchestration Imperative In late 2024, AWS Labs released the Multi-Agent Orchestrator…
Building an Ambient Developer Daemon with Nous Hermes
A hands-on experiment in what changes when your dev assistant lives on your machine, runs…