Measuring Reasoning Quality in LLMs: A Multi-Dimensional Behavioral Framework
A new study proposes a multi-dimensional framework for evaluating reasoning quality in large language models (LLMs). The framework scores six dimensions of reasoning and surfaces behaviors that accuracy-only metrics miss. One finding is that correct answers can stem from incoherent reasoning, which argues for evaluating the reasoning process and not only the final answer.
- The study introduces a framework that measures reasoning quality in LLMs across six dimensions: Correctness, Consistency, Robustness, Logical Coherence, Efficiency, and Stability.
- Experiments on seven LLMs show that the framework uncovers behaviors not visible through accuracy-only metrics.
- The framework demonstrates that logical coherence is independent of correctness, indicating that correct answers can arise from incoherent reasoning.
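To make the third point concrete, here is a minimal Python sketch of how a six-dimension score profile can separate two models that an accuracy-only metric would rate the same. This summary does not describe how the paper computes each dimension, so the `ReasoningScores` record, the `profile` and `correct_but_incoherent` helpers, the 0.5 threshold, and all numbers below are hypothetical and purely illustrative.

```python
from dataclasses import dataclass, fields
from statistics import mean


@dataclass(frozen=True)
class ReasoningScores:
    """Hypothetical per-response scores in [0, 1], one per dimension named above."""
    correctness: float
    consistency: float
    robustness: float
    logical_coherence: float
    efficiency: float
    stability: float


def profile(responses):
    """Average each dimension over a list of ReasoningScores."""
    return {
        f.name: mean(getattr(r, f.name) for r in responses)
        for f in fields(ReasoningScores)
    }


def correct_but_incoherent(responses, threshold=0.5):
    """Responses with a right answer whose coherence score is below the threshold."""
    return [
        r for r in responses
        if r.correctness == 1.0 and r.logical_coherence < threshold
    ]


# Made-up scores: both models answer one of two questions correctly.
model_a = [
    ReasoningScores(1.0, 0.9, 0.8, 0.9, 0.7, 0.9),
    ReasoningScores(0.0, 0.8, 0.7, 0.8, 0.6, 0.9),
]
model_b = [
    ReasoningScores(1.0, 0.5, 0.4, 0.2, 0.7, 0.5),
    ReasoningScores(0.0, 0.4, 0.3, 0.3, 0.6, 0.4),
]

for name, responses in (("A", model_a), ("B", model_b)):
    p = profile(responses)
    flagged = len(correct_but_incoherent(responses))
    print(name, p["correctness"], round(p["logical_coherence"], 2), flagged)
```

In this toy data both models have identical mean correctness, yet only model B produces a correct answer with a low coherence score, which is the kind of case an accuracy-only report would hide.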
Opening excerpt (first ~120 words)
arXiv:2605.24661 (Computer Science > Artificial Intelligence), submitted on 23 May 2026
Title: Measuring Reasoning Quality in LLMs: A Multi-Dimensional Behavioral Framework
Authors: Ali Şenol, Garima Agrawal, Huan Liu
Abstract: LLMs have achieved remarkable success in complex reasoning tasks, yet current evaluation approaches predominantly rely on final-answer correctness, offering limited insight into the underlying reasoning processes that produce those answers.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.