GSAR: Typed Grounding for Hallucination Detection and Recovery in Multi-Agent LLMs

Apr 28, 2026 · 4:00 AM UTC ·3 min read · 0 reactions · 0 comments · 1 view

Autonomous multi-agent LLM systems are increasingly deployed to investigate operational incidents and produce structured diagnostic reports. Their trustworthiness hinges on whether each claim is grounded in observed evidence rather than model-internal inference. Existing groundedness evaluators (binary classifiers, LLM-as-judge scalars, self-correction loops) treat supporting evidence as interchangeable and emit a single signal that offers no principled control over downstream action. We present GSAR, a grounding-evaluation and replanning framework that (i) partitions claims into a four-way typology (grounded, ungrounded, contradicted, complementary), giving first-class standing to non-redundant alternative perspectives; (ii) assigns evidence-type-specific weights reflecting epistemic strength; (iii) computes an asymmetric contradiction-penalised weighted groundedness score; and (iv) couples that score to a three-tier decision function (proceed, regenerate, replan) driving a bounded-iteration outer loop under an explicit compute budget. We formalise the algorithm, prove six structural properties, and evaluate five design claims on FEVER with gold Wikipedia evidence under four independently-trained LLM judges (gpt-5.4, claude-sonnet-4-6, claude-opus-4-7, gemini-2.5-pro). Every ablation reproduces in the same direction on every judge: bootstrap 95% CIs on the rho=0 effect exclude 0 on all four; the no-complementary ablation under Opus 4.7 has CI [-96,-68] of 200; at n=1000 three independent judges converge to DeltaS(rho=0)=+0.058. A head-to-head against Vectara HHEM-2.1-Open is included. To our knowledge, GSAR is the first published groundedness framework coupling evidence-typed scoring with tiered recovery under an explicit compute budget.

Original article

arXiv.org

Read full at arXiv.org →

Full article excerpt tap to expand

Computer Science > Artificial Intelligence arXiv:2604.23366 (cs) [Submitted on 25 Apr 2026] Title:GSAR: Typed Grounding for Hallucination Detection and Recovery in Multi-Agent LLMs Authors:Federico A. Kamelhar View a PDF of the paper titled GSAR: Typed Grounding for Hallucination Detection and Recovery in Multi-Agent LLMs, by Federico A. Kamelhar View PDF HTML (experimental) Abstract:Autonomous multi-agent LLM systems are increasingly deployed to investigate operational incidents and produce structured diagnostic reports. Their trustworthiness hinges on whether each claim is grounded in observed evidence rather than model-internal inference. Existing groundedness evaluators (binary classifiers, LLM-as-judge scalars, self-correction loops) treat supporting evidence as interchangeable and emit a single signal that offers no principled control over downstream action. We present GSAR, a grounding-evaluation and replanning framework that (i) partitions claims into a four-way typology (grounded, ungrounded, contradicted, complementary), giving first-class standing to non-redundant alternative perspectives; (ii) assigns evidence-type-specific weights reflecting epistemic strength; (iii) computes an asymmetric contradiction-penalised weighted groundedness score; and (iv) couples that score to a three-tier decision function (proceed, regenerate, replan) driving a bounded-iteration outer loop under an explicit compute budget. We formalise the algorithm, prove six structural properties, and evaluate five design claims on FEVER with gold Wikipedia evidence under four independently-trained LLM judges (gpt-5.4, claude-sonnet-4-6, claude-opus-4-7, gemini-2.5-pro). Every ablation reproduces in the same direction on every judge: bootstrap 95% CIs on the rho=0 effect exclude 0 on all four; the no-complementary ablation under Opus 4.7 has CI [-96,-68] of 200; at n=1000 three independent judges converge to DeltaS(rho=0)=+0.058. A head-to-head against Vectara HHEM-2.1-Open is included. To our knowledge, GSAR is the first published groundedness framework coupling evidence-typed scoring with tiered recovery under an explicit compute budget. Subjects: Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA) ACM classes: I.2.7; I.2.11 Cite as: arXiv:2604.23366 [cs.AI] (or arXiv:2604.23366v1 [cs.AI] for this version) https://doi.org/10.48550/arXiv.2604.23366 Focus to learn more arXiv-issued DOI via DataCite (pending registration) Submission history From: Federico Kamelhar [view email] [v1] Sat, 25 Apr 2026 16:20:28 UTC (70 KB) Full-text links: Access Paper: View a PDF of the paper titled GSAR: Typed Grounding for Hallucination Detection and Recovery in Multi-Agent LLMs, by Federico A. KamelharView PDFHTML (experimental)TeX Source view license Current browse context: cs.AI < prev | next > new | recent | 2026-04 Change to browse by: cs cs.MA References & Citations NASA ADSGoogle Scholar Semantic Scholar export BibTeX citation Loading... BibTeX formatted citation × loading... Data provided by: Bookmark Bibliographic Tools Bibliographic and Citation Tools Bibliographic Explorer Toggle Bibliographic Explorer (What is the Explorer?) Connected Papers Toggle Connected Papers (What is Connected Papers?) Litmaps Toggle Litmaps (What is Litmaps?) scite.ai Toggle scite Smart Citations (What are Smart Citations?) Code, Data, Media Code, Data and Media Associated with this Article alphaXiv Toggle alphaXiv (What is alphaXiv?) Links to Code Toggle CatalyzeX Code Finder for…

This excerpt is published under fair use for community discussion. Read the full article at arXiv.org.

Anonymous · no account needed

Discussion

0 comments

GSAR: Typed Grounding for Hallucination Detection and Recovery in Multi-Agent LLMs

Discussion

More from arXiv.org