ai-research · WeSearch
AI Research news.
The latest AI and machine-learning research — new papers, model architectures, transformers, reinforcement learning, benchmarks, and lab announcements.
ARXIV CS.AI
Visual Graph Scaffolds for Structural Reasoning in Large Language Models
ARXIV CS.AI
AURA: Action-Gated Memory for Robot Policies at Constant VRAM
ARXIV CS.AI
Evaluating Transformer and LSTM Frameworks for Prediction in Ungauged Basins
ARXIV CS.AI
BehaviorBench: Modeling Real-World User Decisions from Behavioral Traces
ARXIV CS.AI
ChatHealthAI: Aligning Electronic Health Record Representations with Large Language Models for Grounded Clinical Reasoning
ARXIV CS.AI
Traj-Evolve: A Self-Evolving Multi-Agent System for Patient Trajectory Modeling in Lung Cancer Early Detection
ARXIV CS.AI
An Exploration of Collision-based Enemy Morphology Generation
ARXIV CS.AI
Thinking Past the Answer: Evaluating Harmful Overthinking in Large Reasoning Models
ARXIV CS.AI
Toward a Modular Architecture for Embedded AI Agent Systems at the Edge
ARXIV CS.AI
Don't Gamble, GAMBLe: An Analytical Framework for AI-Driven Research Systems
ARXIV CS.AI
When Helping Hurts and How to Fix It: Multi-Agent Debate for Data Cleaning
ARXIV CS.AI
Handoff Debt: The Rediscovery Cost When Coding Agents Take Over Interrupted Tasks
ARXIV CS.AI
Large AI Models in Dental Healthcare: From General-Purpose Systems to Domain-Specific Foundation Models
ARXIV CS.AI
What Benchmarks Don't Measure: The Case for Evaluating Abstention Competence in Autonomous Agents
ARXIV CS.AI
WISE-HAR: A Generalizable Ensemble Deep Learning Framework for WiFi-Based Human Activity Recognition
ARXIV CS.AI
Inducing Reasoning Primitives from Agent Traces
ARXIV CS.AI
AUDITFLOW: Executable Symbolic Environments for Structured Financial Reporting Verification
ARXIV CS.AI
TriEval: A Resource-Efficient Pipeline for LLM Bias, Toxicity, and Truthfulness Assessment
ARXIV CS.AI
RelGT-AC: A Relational Graph Transformer for Autocomplete Tasks in Relational Databases
ARXIV CS.AI
ToolGate: Token-Efficient Pre-Call Control for Tool-Augmented Vision-Language Agents
ARXIV CS.AI
SkillDAG: Self-Evolving Typed Skill Graphs for LLM Skill Selection at Scale
ARXIV CS.AI
CORE: Conflict-Oriented Reasoning for General Multimodal Manipulation Detection
ARXIV CS.AI
DELTAMEM: Incremental Experience Memory for LLM Agents via Residual Trees
ARXIV CS.AI
The Shadow Price of Reasoning: Economic Perspective on Optimal Budget Allocation for LLMs
ARXIV CS.AI
Decomposing how prompting steers behavior
ARXIV CS.AI
From Long News to Accurate Forecast: Importance-Aware Fusion and PRM-Guided Reflection for Time Series Forecasting
ARXIV CS.AI
DeskCraft: Benchmarking Desktop Agents on Professional Workflows and Human-in-the-Loop Collaboration
ARXIV CS.AI
EvoTrainer: Co-Evolving LLM Policies and Training Harnesses for Autonomous Agentic Reinforcement Learning
ARXIV CS.AI
Uncertainty-Aware Clarification in LLM Agents with Information Gain
ARXIV CS.AI
Think-Before-Speak: From Internal Evaluation to Public Expression in Multi-Agent Social Simulation
ARXIV CS.AI
GTBench: A Curriculum-Grounded Benchmark for Evaluating LLMs as Mathematical Research Assistants in Graph Theory
ARXIV CS.AI
ClinicalMC: A Benchmark for Multi-Course Clinical Decision-Making with Large Language Models
ARXIV CS.AI
MedCUA-Bench: A Screenshot-Only Benchmark for Clinical Computer-Use Agents
ARXIV CS.AI
Effect of Demographic Bias on Skin Lesion Classification
ARXIV CS.AI
Perceive Before Reasoning: A Pre-Reasoning Perception Framework for Efficient and Reliable Proactive Mobile Agents
ARXIV CS.AI
Solipsistic Superintelligence is Unlikely to be Cooperative
ARXIV CS.AI
Do Real-World Datasets Contain Natural Experiments? An Empirical Study Using Causal Feature Selection
ARXIV CS.AI
Distilling Answer-Set Programming Rules from LLMs for Neurosymbolic Visual Question Answering
ARXIV CS.AI
A Negative Result on Cross-Model Activation Transfer in a Pythia Multi-Hop Setting
ARXIV CS.AI
LEAP: Supercharging LLMs for Formal Mathematics with Agentic Frameworks
ARXIV CS.AI
The Reliability Gap in Benchmark Auditing: Distribution Shift and Scale as Failure Modes of Contamination Detection
ARXIV CS.AI
The Violation Situation Pattern: A Knowledge-Graph Pattern for Compliance Violations
ARXIV CS.AI
InfoMem: Training Long-Context Memory Agents with Answer-Conditioned Information Gain
ARXIV CS.AI
CP-Agent: Context-Aware Multimodal Reasoning for Cellular Morphological Profiling under Chemical Perturbations
ARXIV CS.AI
What Makes Interaction Trajectories Effective for Training Terminal Agents?
ARXIV CS.AI
DMF: A Deterministic Memory Framework for Conversational AI Agents
ARXIV CS.AI
StepFinder: A Temporal Semantic Framework for Failure Attribution in Multi-Agent Systems
ARXIV CS.AI
A formal definition and meta-model for a machine theory of mind
ARXIV CS.AI
ThoughtFold: Folding Reasoning Chains via Introspective Preference Learning
ARXIV CS.AI