AI Research & ML Papers · Page 5

arXiv cs.AI

Inducing Reasoning Primitives from Agent Traces

The paper introduces a method called Reasoning Primitive Induction, which aims to enhance the performance of…

6/3/2026 · 2 min read · 41 views

arXiv cs.AI

WISE-HAR: A Generalizable Ensemble Deep Learning Framework for WiFi-Based Human Activity Recognition

The paper presents WISE-HAR, an ensemble deep learning framework for recognizing human activities using WiFi signals.…

6/3/2026 · 3 min read · 52 views

arXiv cs.AI

What Benchmarks Don't Measure: The Case for Evaluating Abstention Competence in Autonomous Agents

The paper discusses the limitations of current benchmarks for evaluating autonomous agents, particularly their failure…

6/3/2026 · 3 min read · 41 views

arXiv cs.AI

Large AI Models in Dental Healthcare: From General-Purpose Systems to Domain-Specific Foundation Models

A recent study explores the potential of large AI models in dental healthcare, highlighting the need for a unified…

6/3/2026 · 3 min read · 42 views

arXiv cs.AI

Handoff Debt: The Rediscovery Cost When Coding Agents Take Over Interrupted Tasks

The paper discusses the concept of 'handoff debt' in coding tasks where agents take over interrupted work. It…

6/3/2026 · 2 min read · 42 views

arXiv cs.AI

When Helping Hurts and How to Fix It: Multi-Agent Debate for Data Cleaning

The paper explores the impact of multi-agent debate on data cleaning processes. It finds that while debate can lead to…

6/3/2026 · 3 min read · 40 views

arXiv cs.AI

Don't Gamble, GAMBLe: An Analytical Framework for AI-Driven Research Systems

The paper titled 'Don't Gamble, GAMBLe' introduces a framework for analyzing AI-Driven Research Systems (ADRS). It…

6/3/2026 · 3 min read · 41 views

arXiv cs.AI

Toward a Modular Architecture for Embedded AI Agent Systems at the Edge

The paper discusses a modular architecture for embedded AI agent systems designed to operate within the constraints of…

6/3/2026 · 2 min read · 50 views

arXiv cs.AI

Thinking Past the Answer: Evaluating Harmful Overthinking in Large Reasoning Models

The paper evaluates the phenomenon of harmful overthinking in Large Reasoning Models (LRMs). It introduces a new…

6/3/2026 · 3 min read · 44 views

arXiv cs.AI

An Exploration of Collision-based Enemy Morphology Generation

The paper explores novel methods for generating enemy morphologies in video games using player collision information.…

6/3/2026 · 2 min read · 41 views

arXiv cs.AI

Traj-Evolve: A Self-Evolving Multi-Agent System for Patient Trajectory Modeling in Lung Cancer Early Detection

Traj-Evolve is a self-evolving multi-agent system designed for modeling patient trajectories in lung cancer early…

6/3/2026 · 3 min read · 48 views

arXiv cs.AI

ChatHealthAI: Aligning Electronic Health Record Representations with Large Language Models for Grounded Clinical Reasoning

ChatHealthAI is a proposed multimodal reasoning framework that aligns electronic health record (EHR) representations…

6/3/2026 · 2 min read · 39 views

arXiv cs.AI

BehaviorBench: Modeling Real-World User Decisions from Behavioral Traces

The paper introduces BehaviorBench, a benchmark designed to evaluate personalized decision modeling using real-world…

6/3/2026 · 3 min read · 44 views

arXiv cs.AI

Evaluating Transformer and LSTM Frameworks for Prediction in Ungauged Basins

This study evaluates the effectiveness of Transformer and LSTM frameworks for predicting streamflow in ungauged…

6/3/2026 · 2 min read · 47 views

arXiv cs.AI

AURA: Action-Gated Memory for Robot Policies at Constant VRAM

The paper presents AURA-Mem, a novel memory architecture designed for robotic policies that operates with constant…

6/3/2026 · 3 min read · 41 views

arXiv cs.AI

Visual Graph Scaffolds for Structural Reasoning in Large Language Models

The paper discusses the use of visual graph scaffolds to enhance structural reasoning in large language models (LLMs).…

6/3/2026 · 3 min read · 43 views

Google News

How AI is Transforming Scientific Discovery While Keeping Humans at the Center - Stanford HAI

5/27/2026 · 40 views

arXiv cs.AI

SetupX: Can LLM Agents Learn from Past Failures in Functionality-Correct Code Repository Setup?

The paper introduces SetupX, a framework designed to improve the setup of functionality-correct code repositories by…

5/27/2026 · 3 min read · 36 views

arXiv cs.AI

GAC: Noise-Aware Adaptive Mixing for Hybrid SFT-RL Post-Training

The paper presents GAC, a noise-aware adaptive mixing method for hybrid post-training in machine learning. This…

5/27/2026 · 2 min read · 31 views

arXiv cs.AI

AutoDFT: A Closed-Loop Multi-Agent Framework for Autonomous DFT Calculations

AutoDFT is a new multi-agent framework designed to enhance autonomous DFT calculations in materials science. It…

5/27/2026 · 3 min read · 36 views

arXiv cs.AI

RepoMirage: Probing Repository Context Reasoning in Code Agents with Perturbations

RepoMirage is a new evaluation suite designed to assess repository context reasoning in code agents. The study reveals…

5/27/2026 · 3 min read · 33 views

arXiv cs.AI

PitchBench: Measuring Pitch Hearing in Audio-Language Models

The article introduces PitchBench, a new evaluation suite designed to measure pitch hearing in audio-language models…

5/27/2026 · 3 min read · 34 views

arXiv cs.AI

InfoQuant: Shaping Activation Distributions for Low-Bit LLM Quantization

The paper titled 'InfoQuant' addresses the challenges of low-bit activation quantization in large language models. It…

5/27/2026 · 3 min read · 35 views

arXiv cs.AI

A Universal Cliff and a Design Fingerprint: Cross-Section Defect Detection Under LLM Orchestration

The paper discusses the challenges of detecting cross-section defects in documents processed by language model…

5/27/2026 · 3 min read · 33 views

arXiv cs.AI

Planning Neural Dynamics with Lie Group Embedding through Supervised Projective Manifold Learning

The article presents a novel approach to neural dynamics using Lie group embedding through supervised projective…

5/27/2026 · 3 min read · 35 views

arXiv cs.AI

Enhancing Autonomous Online Intrusion Detection for IoT with Balanced Learning, Reliable Pseudo-Labels, and Lightweight Architectures

The paper discusses advancements in autonomous online intrusion detection systems (IDS) for IoT devices. It highlights…

5/27/2026 · 3 min read · 31 views

arXiv cs.AI

Tool-Schema Compression Enables Agentic RAG Under Constrained Context Budgets

The paper discusses the challenges faced by agentic RAG systems due to tool schemas consuming context windows needed…

5/27/2026 · 3 min read · 34 views

arXiv cs.AI

On the Push-Based Asynchronous Federated Learning: A Bias-Correction Aggregation Approach

The article discusses a new framework called PushCen-ADFL for asynchronous decentralized federated learning. This…

5/27/2026 · 3 min read · 31 views

arXiv cs.AI

TSFMAudit: Data Contamination Auditing in Forecasting Time Series Foundation Models

The paper titled 'TSFMAudit' addresses the issue of data contamination in time series foundation models (TSFMs). It…

5/27/2026 · 3 min read · 32 views

arXiv cs.AI

Furina: Fragmented Uncertainty-Driven Refusal Instability Attack

The paper titled 'Furina: Fragmented Uncertainty-Driven Refusal Instability Attack' explores safety alignment in large…

5/27/2026 · 2 min read · 37 views

arXiv cs.AI

Turning Bias into Bugs: Bandit-Guided Style Manipulation Attacks on LLM Judges

The paper introduces BITE, a framework designed to exploit stylistic biases in LLM judges. It demonstrates that these…

5/27/2026 · 3 min read · 40 views

arXiv cs.AI

When Does Adaptive Guidance Help? Belief-Aware Privileged Distillation for Autonomous Driving Under Partial Observability

The paper introduces Belief-Aware GSAC (BA-GSAC), which adapts the distillation coefficient in autonomous driving…

5/27/2026 · 3 min read · 37 views

arXiv cs.AI

MemMorph: Tool Hijacking in LLM Agents via Memory Poisoning

The paper introduces MemMorph, a novel attack method targeting long-term memory in LLM-driven agents. By injecting…

5/27/2026 · 3 min read · 29 views

arXiv cs.AI

Augment Engineering: A Methodology for Multi-Tool AI Orchestration Across Professional Domains

The paper titled 'Augment Engineering' introduces a methodology for orchestrating multiple AI tools across various…

5/27/2026 · 3 min read · 38 views

arXiv cs.AI

VISTA: An End-to-End Benchmark for Visual Spec-to-Web-App Coding Agents

The article introduces VISTA, a benchmark designed to evaluate the capabilities of LLM-based agents in generating web…

5/27/2026 · 3 min read · 29 views

arXiv cs.AI

AssetGen: Deployable 3D Asset Generation at Interactive Speed

AssetGen is a new 3D asset generation system that prioritizes user experience and deployability. It can produce…

5/27/2026 · 3 min read · 35 views

arXiv cs.AI

Eroding Trust in Real Speech: A Large-Scale Study of Human Audio Deepfake Perception

A recent study investigates the impact of audio deepfakes on human trust in real speech. The research, which involved…

5/27/2026 · 3 min read · 32 views

arXiv cs.AI

Pretraining Data Exposure in Large Language Models: A Survey of Membership Inference, Data Contamination, and Security Implications

The paper discusses Pretraining Data Exposure (PDE) in Large Language Models (LLMs), highlighting its implications for…

5/27/2026 · 2 min read · 33 views

arXiv cs.AI

GEM: Geometric Entropy Mixing for Optimal LLM Data Curation

The paper introduces GEM, a framework designed for optimal data curation in large language models (LLMs). It addresses…

5/27/2026 · 2 min read · 37 views

arXiv cs.AI

Edge AI Deployment Beyond Models: A BSP-Aware Systems Framework for Industrial Embedded Platforms

The paper presents a framework for deploying Edge AI in industrial embedded platforms, emphasizing the importance of a…

5/27/2026 · 3 min read · 26 views

arXiv cs.AI

Xe-Forge: Multi-Stage LLM-Powered Kernel Optimization for Intel GPU

Xe-Forge is a new multi-stage pipeline designed to optimize kernel performance for Intel GPUs. It automates the…

5/27/2026 · 3 min read · 21 views

arXiv cs.AI

MUSE-Autoskill: Self-Evolving Agents via Skill Creation, Memory, Management, and Evaluation

The MUSE-Autoskill framework introduces a new approach for self-evolving agents that enhances their ability to create…

5/27/2026 · 3 min read · 32 views

arXiv cs.AI

Natural Language Query to Configuration for Retrieval Agents

The paper presents a new approach called BRANE for optimizing retrieval agent configurations based on natural language…

5/27/2026 · 3 min read · 32 views

arXiv cs.AI

Alignment Tampering: How Reinforcement Learning from Human Feedback Is Exploited to Optimize Misaligned Biases

The paper discusses a vulnerability known as alignment tampering in Reinforcement Learning from Human Feedback (RLHF).…

5/27/2026 · 3 min read · 39 views

arXiv cs.AI

2-ASP(Q) programs with weak constraints: Complexity and efficient implementation

The paper discusses 2-ASP(Q) programs with weak constraints, a significant area in Answer Set Programming. It provides…

5/27/2026 · 2 min read · 32 views

arXiv cs.AI

Maat: The Agentic Legal Research Assistant for Competition Protection

Maat is a new legal research assistant designed specifically for competition law analysis. It outperforms existing…

5/27/2026 · 3 min read · 23 views

arXiv cs.AI

Modeling Agentic Technical Debt and Stochastic Tax: A Standalone Framework for Measurement, Simulation, and Dashboarding

The article presents a framework for measuring and simulating Agentic Technical Debt and Stochastic Tax in AI systems.…

5/27/2026 · 3 min read · 42 views

arXiv cs.AI

SIA: Self Improving AI with Harness & Weight Updates

The paper presents SIA, a self-improving AI framework that integrates harness and weight updates. It aims to overcome…

5/27/2026 · 3 min read · 36 views

arXiv cs.AI

Gumbel Machine: Counterfactual Student Writing Generation via Gumbel Noise Steering

The Gumbel Machine is a new approach to generating counterfactual student writing that aims to improve educational…

5/27/2026 · 2 min read · 35 views

arXiv cs.AI

Learning to Act under Noise: Enhancing Agent Robustness via Noisy Environments

The paper discusses the development of NoisyAgent, a training framework aimed at enhancing the robustness of agents in…

5/27/2026 · 3 min read · 34 views
