Search: "system reliability"

6 stories match your query across our 700+ source catalog. Ranked by relevance and recency.

6 results for "system reliability"

Judging the Judges: A Systematic Evaluation of Bias Mitigation Strategies in LLM-as-a-Judge Pipelines

LLM-as-a-Judge has become the dominant paradigm for evaluating language model outputs, yet LLM judges exhibit systematic biases that compromise evaluation reliability. We present a comprehensive empir…

Tue, 28 Apr 2026 04:13:21 GMT · 4 views

ARXIV.ORG

A systematic evaluation of vision-language models for observational astronomical reasoning tasks

Vision-language models (VLMs) are increasingly proposed as general-purpose tools for scientific data interpretation, yet their reliability on real astronomical observations across diverse modalities r…

Tue, 28 Apr 2026 04:13:21 GMT · 5 views

ARXIV.ORG

When AI reviews science: Can we trust the referee?

The volume of scientific submissions continues to climb, outpacing the capacity of qualified human referees and stretching editorial timelines. At the same time, modern large language models (LLMs) of…

Tue, 28 Apr 2026 04:13:21 GMT · 3 views

ARXIV.ORG

LLM-Augmented Traffic Signal Control with LSTM-Based Traffic State Prediction and Safety-Constrained Decision Support

Traffic signal control is a critical task in intelligent transportation systems, yet conventional fixed-time and rule-based methods often struggle to adapt to dynamic traffic demand and provide limite…

Tue, 28 Apr 2026 04:13:21 GMT · 3 views

ARXIV.ORG

An Information-Geometric Framework for Stability Analysis of Large Language Models under Entropic Stress

As large language models (LLMs) are increasingly deployed in high-stakes and operational settings, evaluation strategies based solely on aggregate accuracy are often insucient to characterize system r…

Tue, 28 Apr 2026 04:13:21 GMT · 3 views

ARXIV.ORG

FastOMOP: A Foundational Architecture for Reliable Agentic Real-World Evidence Generation on OMOP CDM data

The Observational Medical Outcomes Partnership Common Data Model (OMOP CDM), maintained by the Observational Health Data Sciences and Informatics (OHDSI) collaboration, enabled the harmonisation of el…

Tue, 28 Apr 2026 04:13:21 GMT · 3 views

Or browse by topic

World US Politics Technology AI Markets Business Science Climate Health Culture Media

Results for "system reliability".

Judging the Judges: A Systematic Evaluation of Bias Mitigation Strategies in LLM-as-a-Judge Pipelines

A systematic evaluation of vision-language models for observational astronomical reasoning tasks

When AI reviews science: Can we trust the referee?

LLM-Augmented Traffic Signal Control with LSTM-Based Traffic State Prediction and Safety-Constrained Decision Support

An Information-Geometric Framework for Stability Analysis of Large Language Models under Entropic Stress

FastOMOP: A Foundational Architecture for Reliable Agentic Real-World Evidence Generation on OMOP CDM data

Or browse by topic