ai-research · WeSearch

AI Research news.

The latest AI and machine-learning research — new papers, model architectures, transformers, reinforcement learning, benchmarks, and lab announcements.

Lead story

arXiv.org

NEXUS: Structured Runtime Safety for Tool-Using LLM Agents

Researchers have introduced NEXUS, a structured runtime safety monitor for tool-using LLM agents that applies a formal intervention policy to keep the execution of high-impact actions safe. NEXUS combines deterministic…

Information Discernment in Large Language Models

Benchmarking Confidential GPU Inference on NVIDIA H100 under Intel TDX

FormulaSPIN: Self-Play Fine-Tuning for Natural Language to Spreadsheet Formula Generation

OpenEvoShield: Dual Non-Stationary Continual Defense for Open-World Multi-Agent System Attacks

Hybrid LSTM-Graph Neural Framework for Robust Financial Fraud Detection and Adversarial Resilience

FineServe: A Fine-Grained Dataset and Characterization of Global LLM Serving Workloads

How AI Is Helping States Cut Through Decades of Red Tape - Stanford HAI

Spectral-LSH: Sub-Quadratic Prompt Compression via Krylov-Projected Locality-Sensitive Hashing

Rethinking Uncertainty Evaluation in Large Language Models

Geometry-Guided Constraint Learning for LLM Safety Classification

Logic-Guided Data Extraction with Answer Set Programming and Large Language Models

Statistically Grounded Sparse-Feature Interventions for Activation-Space Control in Large Language Models

AdaRoPE: Not All Attention Heads Should Rotate and Scale Equally

GraphContainer: A Unified Platform for Comparing and Debugging Graph RAG Methods

Lifted Representation Hypothesis in Language Models

Euclean: Automated Geometry Problem Formalization with Unified Verification in Lean

Mitigating Scaffolding Collapse in Socratic Tutors via Representation Alignment

Beyond Tracking or Shortcut: Composition-Bounded Predictive States in Poker Autoregressive Models

Profile-Graph Memory for LLM Agents: Implicit Cross-Entity Traversal through Narrative Profiles

LISA: Linear-Indexed Sparse Attention for Efficient Long-Context Reasoning

Stochastic Primal-Dual Decoding for Multiobjective Generative Recommender Systems

OriginBlame: Record- and Token-Level Data Provenance for AI Training Datasets

ProofCouncil: An LLM Agent for Solving Open Mathematical Problems

Communication-Efficient Digital-Twin Coordination for Heterogeneous LLM Embodied Agents over Computing Power Networks

How Does Bayesian Causal Discovery Fail? Characterising Structural Consequences in Linear Gaussian Networks under Latent Confounding

LongMedBench: Benchmarking Medical Agents for Long-Horizon Clinical Decision-Making

Fictional Worldbuilding: Multi-Agent LLM Collaboration with Hierarchical Context Compression and Iterative Review

OpenProver: Agentic and Interactive Theorem Proving with Lean 4

Toward Auditable AI Scientists: A Hypothesis Evolution Protocol for LLM Agents

Scoped Verification for Reliable Long-Horizon Agentic Context Evolution under Distribution Shift

KV-PRM: Efficient Process Reward Modeling via KV-Cache Transfer for Multi-Agent Test-Time Scaling

Neuro-Agentic Control: A Deep Learning-based LLM-Powered Agentic AI Framework for Controlling Security Controls

ARCANA: A Reflective Multi-Agent Program Synthesis Framework for ARC-AGI-2 Reasoning

L-MAD: A Systematic Evaluation of Multi-Agent Debate Structures in Legal Reasoning

MedRealMM: A Real-World Multimodal Benchmark for Chinese Online Medical Consultation

A Formalization of the Mean-Field Derivation of the Vlasov Equation: AI-Assisted Lean Formalization as a Strategy Game

Long-Horizon-Terminal-Bench: Testing the Limits of Agents on Long-Horizon Terminal Tasks with Dense Reward-Based Grading

GATS: Graph-Augmented Tree Search with Layered World Models for Efficient Agent Planning

CogniConsole: Externalizing Inference-Time Control as a Formal Abstraction for Reliable LLM Interactions

Interval Certifications for Multilayered Perceptrons via Lattice Traversal

Augmenting Fundamental Analysis with Large Language Models: A RAG-Based System for Generating Investor Briefs

Event Stream based Multi-Modal Video Anomaly Detection: A Benchmark Dataset and Algorithms

Integrating Large Language Models and Graph Convolutional Networks for Semi-Supervised Image Classification

Beyond Metadata: CAPRA for Hidden Subgroup Analysis under Missing Metadata in Medical Imaging

A Coreset Selection Framework with Ensemble Aggregation for Image Classification

PRecG: Legal Precedent Retrieval with Graph Neural Networks and Rhetorical Role Segmentation

OmniMapBench: Benchmarking Visual-Centric Reasoning on Diverse Map Documents

Inside the Skill Market: From Software Engineering Activities to Reusable Agent Skills

On Locality and Length Generalization in Visual Reasoning