AI Research & ML Papers · Page 3

arXiv.org

Grounded Iterative Language Planning: How Parameterized World Models Reduce Hallucination Propagation in LLM Agents

An agent-based world model calls an LLM API and reasons flexibly in language, but its errors appear as hallucinated…

6/29/2026 · 3 min read · 30 views

arXiv.org

Understanding Rollout Error in Graph World Models

Many planning environments, however, are not vectors or images; they are graphs of agents, tools, skills, routes, and…

6/29/2026 · 3 min read · 35 views

arXiv.org

AlgoEvolve: LLM-driven Meta-evolution of Algorithmic Trading Programs

Most current applications focus on static coding benchmarks. We extend this paradigm to algorithmic trading. This…

6/26/2026 · 2 min read · 33 views

arXiv.org

Accelerating Returns and the Qualitative Engine for Science

The paper discusses the concept of accelerating returns, which suggests that technological progress becomes…

6/26/2026 · 3 min read · 37 views

arXiv.org

COrigami: An AI Pipeline for Co-Designing Flat-Foldable Visually Recognisable Origami

Researchers have developed COrigami, an AI pipeline for co-designing flat-foldable visually recognizable origami. The…

6/26/2026 · 3 min read · 29 views

arXiv.org

OpenFinGym: A Verifiable Multi-Task Gym Environment for Evaluating Quant Agents

The paper introduces OpenFinGym, a unified gym environment designed for evaluating quantitative‑finance agents across…

6/26/2026 · 3 min read · 31 views

arXiv.org

What We are Missing in Multimodal LLM Evaluation?

arXiv:2606.26348 (cs), Computer Science > Artificial Intelligence. Submitted on 24 Jun 2026.

6/26/2026 · 2 min read · 34 views

arXiv.org

Geometry-Aware MCTS for Extremal Problems in Combinatorial Geometry

Classical exact solvers suffer from combinatorial explosion for these types of problems, and standard reinforcement…

6/26/2026 · 3 min read · 29 views

arXiv.org

Narration-of-Thought: Inference-Time Scaffolding for Defeasible Ethical Reasoning in Large Language Models

We introduce narration-of-thought (NoT), a system prompt that structures chain-of-thought into five sections:…

6/26/2026 · 3 min read · 33 views

arXiv.org

When Agents Meet Electric Bus Fleet Operations: Pricing Behavior, Trade-offs, and Policy Implications in an Aggregator Framework

Electric bus fleets provide a relevant test case. Their operation requires continuous coordination between service…

6/26/2026 · 3 min read · 49 views

arXiv.org

How Do Tool-Augmented LLM Agents Perform on Real-World Energy Analytics Tasks?

arXiv:2606.26346 (cs), Computer Science > Artificial Intelligence. Submitted on 24 Jun 2026.

6/26/2026 · 3 min read · 32 views

arXiv.org

The Verification Horizon: No Silver Bullet for Coding Agent Rewards

For today's coding agents, this intuition is being inverted: as foundation models develop stronger reasoning…

6/26/2026 · 3 min read · 34 views

arXiv.org

Instruction Bleed: Cross-Module Interference in Prompt-Composed Agentic Systems

We formalize this as compositional behavioral leakage (CBL): interference between modules sharing a context window.…

6/26/2026 · 3 min read · 38 views

arXiv.org

Governing Actions, Not Agents: Institutional Attestation as a Governance Model for Autonomous AI Systems

This paper observes that human institutions have governed powerful autonomous actors not by monitoring their reasoning…

6/26/2026 · 2 min read · 34 views

arXiv.org

Life After Benchmark Saturation: A Case Study of CORE-Bench

Authors: … Siegel, Arvind Narayanan.

6/26/2026 · 3 min read · 28 views

arXiv.org

Knowledge-augmented Agentic AI for Mental Health Medication Information Seeking

Integrating them without conflating evidence and anecdote is especially consequential in psychiatry, where poorly…

6/26/2026 · 3 min read · 36 views

arXiv.org

Detecting and Controlling Sycophancy with Cascading Linear Features

These data pairs determine the degree to which interpretability frameworks can reliably detect model features…

6/26/2026 · 3 min read · 40 views

arXiv.org

Accelerating Skill Assessment in Chess: A Drift-Diffusion-Enhanced Elo Rating System

However, they inherently suffer from response lag due to their exclusive reliance on match outcomes, neglecting the…

6/26/2026 · 3 min read · 31 views

arXiv.org

Agentic Analysis for Agentic Infrastructure: An LLM-Powered Pipeline for Comparative Governance of DAO and Corporate AI Protocols

We introduce an LLM-powered comparative pipeline for large-scale governance discourse analysis, integrating automated…

6/26/2026 · 3 min read · 42 views

arXiv.org

Refusal Lives Downstream of Persona in Chat Models

Researchers have found that refusal and persona traits in chat models are interconnected, with a compliant persona…

6/26/2026 · 2 min read · 33 views

Google News

All Work Published on Generative AI - Stanford HAI

6/12/2025 · 33 views

Google News

1 November 21, 2022 Response to the Request for Comments on Trade Regulation Rule on Commercial Surveillance and Data Security S - Stanford HAI

6/12/2025 · 29 views

Google News

Stanford HAI - Stanford HAI

3/11/2025 · 28 views

Google News

Stephen Eglash - Stanford HAI

3/7/2025 · 33 views

Google News

Shana Lynch - Stanford HAI

4/11/2025 · 29 views

Google News

Distinguished Fellows - Stanford HAI

3/5/2025 · 35 views

Google News

All Work Published on Workforce, Labor - Stanford HAI

3/7/2025 · 31 views

Google News

Stanford HAI - Stanford HAI

5/12/2025 · 23 views

Google News

Efficiently Modeling Long Sequences with Structured State Spaces - Stanford HAI

3/4/2025 · 29 views

Google News

Senior Fellows - Stanford HAI

3/6/2025 · 25 views

Google News

Manisha Desai - Stanford HAI

3/4/2025 · 26 views

Google News

Affiliated Faculty - Stanford HAI

3/7/2025 · 32 views

Google News

Dawn Siegel - Stanford HAI

4/25/2025 · 22 views

Google News

AI + Health Conference - Stanford HAI

3/6/2025 · 26 views

Google News

Elizabeth Schumann - Stanford HAI

3/7/2025 · 28 views

Google News

News - Stanford HAI

6/23/2025 · 26 views

Google News

““ - Stanford HAI

11/4/2025 · 26 views

Google News

Seed Research Grants - Stanford HAI

9/15/2025 · 26 views

Google News

Marissa Reitsma - Stanford HAI

9/2/2025 · 24 views

Google News

Bryce Marion - Stanford HAI

8/30/2025 · 29 views

Google News

Justin Sonnenburg - Stanford HAI

10/14/2025 · 31 views

Google News

What is Big Data? - Stanford HAI

4/7/2026 · 26 views

Google News

Luis Hernandez-Nunez - Stanford HAI

2/25/2026 · 25 views

Google News

Stanford HAI - Stanford HAI

1/5/2026 · 22 views

Google News

Chris Mentzel - Stanford HAI

6/16/2026 · 20 views

Google News

All Work Published on Regulation, Policy, Governance - Stanford HAI

6/18/2026 · 23 views

Google News

AI+Science: Accelerating Discovery - Stanford HAI

4/9/2026 · 26 views

Google News

What is Overfitting? - Stanford HAI

4/12/2026 · 25 views

arXiv cs.AI

LAP: An Agent-to-Instrument Protocol for Autonomous Science

The article introduces the Lab Agent Protocol (LAP), designed to enhance the interaction between autonomous agents and…

6/3/2026 · 3 min read · 50 views

arXiv cs.AI

Proof-Refactor: Refactoring Generated Formal Proofs into Modular Artifacts

The paper titled 'Proof-Refactor' addresses the challenges in generating formal proofs using Large Language Models…

6/3/2026 · 3 min read · 50 views
