Search: "large language models"

ARXIV.ORG

A Systematic Approach for Large Language Models Debugging

Large language models (LLMs) have become central to modern AI workflows, powering applications from open-ended text generation to complex agent-based reasoning. However, debugging these models remains…

Tue, 28 Apr 2026 04:13:21 GMT

ARXIV.ORG

Tandem: Riding Together with Large and Small Language Models for Efficient Reasoning

Recent advancements in large language models (LLMs) have catalyzed the rise of reasoning-intensive inference paradigms, where models perform explicit step-by-step reasoning before generating final ans…

Tue, 28 Apr 2026 04:13:21 GMT

ARXIV.ORG

Representational Curvature Modulates Behavioral Uncertainty in Large Language Models

In autoregressive large language models (LLMs), temporal straightening offers an account of how the next-token prediction objective shapes representations. Models learn to progressively straighten the…

Tue, 28 Apr 2026 04:13:21 GMT

ARXIV.ORG

An Information-Geometric Framework for Stability Analysis of Large Language Models under Entropic Stress

As large language models (LLMs) are increasingly deployed in high-stakes and operational settings, evaluation strategies based solely on aggregate accuracy are often insufficient to characterize system r…

Tue, 28 Apr 2026 04:13:21 GMT

HACKER NEWS: NEWEST

Does Point Cloud Boost Spatial Reasoning of Large Language Models?

Tue, 28 Apr 2026 14:55:00 GMT

ARXIV.ORG

Ulterior Motives: Detecting Misaligned Reasoning in Continuous Thought Models

Chain-of-Thought (CoT) reasoning has emerged as a key technique for eliciting complex reasoning in Large Language Models (LLMs). Although interpretable, its dependence on natural language limits the m…

Tue, 28 Apr 2026 04:13:21 GMT

ARXIV.ORG

Architectural Requirements for Agentic AI Containment

The April 2026 disclosure that a frontier large language model escaped its security sandbox, executed unauthorized actions, and concealed its modifications to version control history demonstrates that…

Tue, 28 Apr 2026 15:10:00 GMT

ARXIV.ORG

HeLa-Mem: Hebbian Learning and Associative Memory for LLM Agents

Long-term memory is a critical challenge for Large Language Model agents, as fixed context windows cannot preserve coherence across extended interactions. Existing memory systems represent conversatio…

Tue, 28 Apr 2026 13:14:59 GMT

ARXIV.ORG

LLMs Corrupt Your Documents When You Delegate

Large Language Models (LLMs) are poised to disrupt knowledge work, with the emergence of delegated work as a new interaction paradigm (e.g., vibe coding). Delegation requires trust - the expectation t…

Tue, 28 Apr 2026 12:54:59 GMT

ARXIV.ORG

AI prefers resumes written by itself: Self-preferencing in Algorithmic Hiring

As artificial intelligence (AI) tools become widely adopted, large language models (LLMs) are increasingly involved on both sides of decision-making processes, ranging from hiring to content moderatio…

Tue, 28 Apr 2026 09:57:42 GMT

ARXIV.ORG

Mitigating Belief Inertia via Active Intervention in Embodied Agents

Recent advancements in large language models (LLMs) have enabled agents to tackle complex embodied tasks through environmental interaction. However, these agents still make suboptimal decisions and pe…

Tue, 28 Apr 2026 08:54:13 GMT

ARXIV.ORG

An Intelligent Fault Diagnosis Method for General Aviation Aircraft Based on Multi-Fidelity Digital Twin and FMEA Knowledge Enhancement

Fault diagnosis of general aviation aircraft faces challenges including scarce real fault data, diverse fault types, and weak fault signatures. This paper proposes an intelligent fault diagnosis frame…

Tue, 28 Apr 2026 04:13:21 GMT

ARXIV.ORG

FormalScience: Scalable Human-in-the-Loop Autoformalisation of Science with Agentic Code Generation in Lean

Formalising informal mathematical reasoning into formally verifiable code is a significant challenge for large language models. In scientific fields such as physics, domain-specific machinery (\textit…

Tue, 28 Apr 2026 04:13:21 GMT

ARXIV.ORG

Analytica: Soft Propositional Reasoning for Robust and Scalable LLM-Driven Analysis

Large language model (LLM) agents are increasingly tasked with complex real-world analysis (e.g., in financial forecasting, scientific discovery), yet their reasoning suffers from stochastic instabili…

Tue, 28 Apr 2026 04:13:21 GMT

ARXIV.ORG

Towards Automated Ontology Generation from Unstructured Text: A Multi-Agent LLM Approach

Automatically generating formal ontologies from unstructured natural language remains a central challenge in knowledge engineering. While large language models (LLMs) show promise, it remains unclear …

Tue, 28 Apr 2026 04:13:21 GMT

ARXIV.ORG

CAP-CoT: Cycle Adversarial Prompt for Improving Chain of Thoughts in LLM Reasoning

Chain-of-Thought (CoT) prompting has emerged as a simple and effective way to elicit step-by-step solutions from large language models (LLMs). However, CoT reasoning can be unstable across runs on lon…

Tue, 28 Apr 2026 04:13:21 GMT

ARXIV.ORG

IndustryAssetEQA: A Neurosymbolic Operational Intelligence System for Embodied Question Answering in Industrial Asset Maintenance

Industrial maintenance environments increasingly rely on AI systems to assist operators in understanding asset behavior, diagnosing failures, and evaluating interventions. Although large language mode…

Tue, 28 Apr 2026 04:13:21 GMT

ARXIV.ORG

Agentic Adversarial Rewriting Exposes Architectural Vulnerabilities in Black-Box NLP Pipelines

Multi-component natural language processing (NLP) pipelines are increasingly deployed for high-stakes decisions, yet no existing adversarial method can test their robustness under realistic conditions…

Tue, 28 Apr 2026 04:13:21 GMT

ARXIV.ORG

When AI reviews science: Can we trust the referee?

The volume of scientific submissions continues to climb, outpacing the capacity of qualified human referees and stretching editorial timelines. At the same time, modern large language models (LLMs) of…

Tue, 28 Apr 2026 04:13:21 GMT

ARXIV.ORG

Thinking Like a Clinician: A Cognitive AI Agent for Clinical Diagnosis via Panoramic Profiling and Adversarial Debate

The application of large language models (LLMs) in clinical decision support faces significant challenges of "tunnel vision" and diagnostic hallucinations present in their processing unstructured elec…

Tue, 28 Apr 2026 04:13:21 GMT

ARXIV.ORG

Vibe Medicine: Redefining Biomedical Research Through Human-AI Co-Work

With the emergence of large language models (LLMs) and AI agent frameworks, the human-AI co-work paradigm known as Vibe Coding is changing how people code, making it more accessible and productive. In…

Tue, 28 Apr 2026 04:13:21 GMT

ARXIV.ORG

Expert Evaluation of LLM's Open-Ended Legal Reasoning on the Japanese Bar Exam Writing Task

Large language models (LLMs) have shown strong performance on legal benchmarks, including multiple-choice components of bar exams. However, their capacity for generating open-ended legal reasoning in …

Tue, 28 Apr 2026 04:13:21 GMT

ARXIV.ORG

LLM-Augmented Traffic Signal Control with LSTM-Based Traffic State Prediction and Safety-Constrained Decision Support

Traffic signal control is a critical task in intelligent transportation systems, yet conventional fixed-time and rule-based methods often struggle to adapt to dynamic traffic demand and provide limite…

Tue, 28 Apr 2026 04:13:21 GMT

ARXIV.ORG

Context-Aware Hospitalization Forecasting Evaluations for Decision Support using LLMs

Medical and public health experts must make real-time resource decisions, such as expanding hospital bed capacity, based on projected hospitalization trends during large-scale healthcare disruptions (…

Tue, 28 Apr 2026 04:13:21 GMT

ARXIV.ORG

Grounding Before Generalizing: How AI Differs from Humans in Causal Transfer

Extracting abstract causal structures and applying them to novel situations is a hallmark of human intelligence. While Large Language Models (LLMs) and Vision Language Models (VLMs) have shown strong …

Tue, 28 Apr 2026 04:13:21 GMT

ARXIV.ORG

Aligning with Your Own Voice: Self-Corrected Preference Learning for Hallucination Mitigation in LVLMs

Large Vision-Language Models (LVLMs) frequently suffer from hallucinations. Existing preference learning-based approaches largely rely on proprietary models to construct preference datasets. We identi…

Tue, 28 Apr 2026 04:13:21 GMT

ARXIV.ORG

STELLAR-E: a Synthetic, Tailored, End-to-end LLM Application Rigorous Evaluator

The increasing reliance on Large Language Models (LLMs) across diverse sectors highlights the need for robust domain-specific and language-specific evaluation datasets; however, the collection of such…

Tue, 28 Apr 2026 04:13:21 GMT

ARXIV.ORG

Towards Lawful Autonomous Driving: Deriving Scenario-Aware Driving Requirements from Traffic Laws and Regulations

Driving in compliance with traffic laws and regulations is a basic requirement for human drivers, yet autonomous vehicles (AVs) can violate these requirements in diverse real-world scenarios. To encod…

Tue, 28 Apr 2026 04:13:21 GMT

ARXIV.ORG

XGRAG: A Graph-Native Framework for Explaining KG-based Retrieval-Augmented Generation

Graph-based Retrieval-Augmented Generation (GraphRAG) extends traditional RAG by using knowledge graphs (KGs) to give large language models (LLMs) a structured, semantically coherent context, yielding…

Tue, 28 Apr 2026 04:13:21 GMT
