WeSearch
TAG · #LANGUAGE-MODEL

Language Model coverage.

Every story in the WeSearch catalog tagged #language-model, listed in publish-time order with view counts. There are currently 60 tagged stories, and the page updates as new stories are ingested. Subscribe to the per-tag RSS feed to follow this topic in your reader of choice.


RELATED TAGS
#language-models (124) · #ai (123) · #ml (76) · #technology (6) · #large-language-models (6) · #reinforcement-learning (4) · #research (4) · #ai-research (3) · #computation (3) · #anthropic (2) · #document-editing (2) · #reasoning (2)

ARXIV.ORG

GPU Forecasters: Language Models as Selective Surrogates for Kernel Optimization

GPU kernels are the workhorse of modern deep learning, and optimizing them (via evolutionary search or coding agents) usually requires repeated measurement on target hardware. Whil…

11 views
#machine learning · #artificial intelligence · #gpu optimization
ARXIV.ORG

Benchmarking LLM-as-a-Judge for Long-Form Output Evaluation

As large language models (LLMs) are increasingly used for long-form generation, reliably evaluating long-form outputs has become a critical challenge. LLM-as-a-judge offers a scala…

17 views
#machine learning · #language models · #evaluation
ARXIV CS.AI

Visual Graph Scaffolds for Structural Reasoning in Large Language Models

Graphs have been used to enhance large language models (LLMs) for structured reasoning, mostly as external knowledge sources provided to models at test time. In this paper, we …

14 views
#artificial intelligence · #machine learning · #language models
ARXIV CS.AI

ChatHealthAI: Aligning Electronic Health Record Representations with Large Language Models for Grounded Clinical Reasoning

Large language models (LLMs) exhibit strong natural-language reasoning abilities for clinical decision support, but struggle to effectively model structured longitudinal electronic…

11 views
#artificial intelligence · #healthcare · #machine learning
ARXIV CS.AI

SkillDAG: Self-Evolving Typed Skill Graphs for LLM Skill Selection at Scale

As LLM agents adopt large skill libraries, selecting the right subset becomes a structural problem rather than a similarity-matching one: skills depend on, conflict with, specializ…

10 views
#artificial intelligence · #machine learning · #language models
ARXIV CS.AI

The Shadow Price of Reasoning: Economic Perspective on Optimal Budget Allocation for LLMs

Inference-time scaling has emerged as a critical avenue for enhancing Large Language Models' performance, yet real-world deployment is constrained by strict computational budgets. …

11 views
#artificial intelligence · #budget allocation · #large language models
ARXIV CS.AI

Decomposing how prompting steers behavior

Prompting steers large language models (LLMs) and vision-language models (VLMs) without weight updates, but it remains unclear how instruction changes reshape internal representati…

14 views
#artificial intelligence · #machine learning · #language models
ARXIV CS.AI

Uncertainty-Aware Clarification in LLM Agents with Information Gain

Large Language Model (LLM) agents often operate under underspecified user instructions, where latent uncertainty over user intent leads to erroneous tool actions. To address this c…

14 views
#artificial intelligence · #machine learning · #language models
ARXIV CS.AI

ClinicalMC: A Benchmark for Multi-Course Clinical Decision-Making with Large Language Models

Large language models (LLMs) have been widely adopted in healthcare, yet they still encounter significant challenges in complex clinical decision-making scenarios. Existing benchma…

9 views
#healthcare · #artificial intelligence · #clinical decision-making
ARXIV CS.AI

From Answers to States: Verifiable Process-Level Evaluation of Chemical Reasoning in Large Language Models

Large language models are increasingly used as chemistry assistants, yet most chemistry benchmarks still score only final answers. This masks a critical failure mode: a model may o…

10 views
#artificial intelligence · #chemistry · #machine learning
ARXIV CS.AI

Code-on-Graph: Iterative Programmatic Reasoning via Large Language Models on Knowledge Graphs

Knowledge Graphs (KGs) are widely used to mitigate the limitations of Large Language Models (LLMs), such as outdated knowledge and hallucinations. Existing LLM-KG integration frame…

13 views
#artificial intelligence · #knowledge graphs · #language models
IEEE SPECTRUM

Why Are Large Language Models So Terrible at Video Games?

LLMs can code your retro shooter but still fail at playing Halo; see what this gap reveals about AI’s real limits in 2026…

14 views
#llms · #artificial-intelligence · #video-games
R/LOCALLLAMA

Parallax: Parameterized Local Linear Attention for Language Modeling

16 views
ARXIV.ORG

Scaling Laws for Agent Harnesses via Effective Feedback Compute

Agent harnesses increasingly determine the performance of language-model systems by deciding how models call tools, receive feedback, verify intermediate states, store memory, and …

16 views
#computer science · #language models · #feedback
R/PROMPTENGINEERING

Heuristic Parasites: A Behavioral Taxonomy of Recurrent Distortion Patterns in Large Language Models (Full System) V2

16 views
ARXIV.ORG

AI Propaganda factories with language models

AI-powered influence operations can now be executed end-to-end on commodity hardware. We show that small language models produce coherent, persona-driven political messaging and ca…

15 views
#artificial intelligence · #cryptography · #security
DEV.TO (TOP)

✨📊 🧠 The Ultimate Visual Guide to Large Language Models (LLMs)

Generative AI is a type of artificial intelligence that can produce new content including text,…

10 views
#artificial intelligence · #machine learning · #language models
DEV.TO (TOP)

📄Paper: RORA-VLM: Robust Retrieval Augmentation for Vision Language Models

Published at the International Conference on Learning Representations (ICLR) 2025 💡 Why I read…

16 views
#ai · #vlm · #research
ARS TECHNICA - ALL CONTENT

LLMs believe false statements even after explicit warnings that they're false

Fine-tuning tests show "bias ... toward confidently representing the claims as true."…

13 views
#artificial intelligence · #research · #language models
UNITE.AI

Why does AI love writing about lighthouse keepers?

Asked to 'write a story', ChatGPT and other leading language models appear to be avoiding copyright infringement by obsessive recourse to the same small and strange cast of lightho…

12 views
#artificial intelligence · #language models · #storytelling
ARXIV.ORG

How sure is the activation oracle?

Activation oracles aim to make the activations of other models legible to humans and yield promising results compared to white-box interpretability techniques. However, uncertainty…

12 views
#artificial intelligence · #language models · #interpretability
ARXIV CS.AI

Can LLMs Introspect? A Reality Check

Can large language models detect and report their own internal states? A number of studies have argued that the answer to this question is yes. We argue, based on lessons from huma…

18 views
#artificial intelligence · #language models · #metacognition
ARXIV CS.AI

Personalizing Embodied Multimodal Large Language Model Agents over Long-term User Interactions

Multimodal large language model (MLLM)-based embodied agents have shown strong potential for solving complex tasks in physical environments. However, personalized assistance requir…

22 views
#artificial intelligence · #machine learning · #personalization
ARXIV CS.AI

OmniToM: Benchmarking Theory of Mind in LLMs via Explicit Belief Modeling

Theory of Mind (ToM), the ability to infer others' knowledge, intentions, and emotions, is commonly evaluated in large language models (LLMs) using end-point question answering, wh…

17 views
#artificial intelligence · #language models · #theory of mind
ARXIV CS.AI

Reasoning, Code, or Both? How Large Language Models Handle Variations in Math Questions

Large Language Models (LLMs) achieve impressive accuracy on mathematical reasoning benchmarks, yet their performance drops when problems are modified with simple changes like diffe…

18 views
#artificial intelligence · #machine learning · #mathematics
ARXIV CS.AI

The MiniMax-M2 Series: Mini Activations Unleashing Max Real-World Intelligence

We introduce the MiniMax-M2 series, a family of Mixture-of-Experts language models built around the principle that mini activations can unleash maximum real-world intelligence. The…

18 views
#artificial intelligence · #machine learning · #language models
ARXIV CS.AI

MedGuideX: Internalizing Decision Logic from Executable Guidelines into Large Language Models for Clinical Reasoning

Clinical practice guidelines (CPGs) encode evidence-based decision logic that clinicians apply by evaluating patient variables, conditional criteria, and recommendation rules. Howe…

17 views
#artificial intelligence · #healthcare · #clinical reasoning
ARXIV CS.AI

AGORA: Adapter-Grounded Observation-Action Retention for Inference-Free Prompt Compression in LLM Agents

The token-level extractive compressors widely used for general LM context are structurally inappropriate for LLM agents: across 17 (env, backbone, method) cells spanning two indepe…

12 views
#artificial intelligence · #machine learning · #language models
ARXIV CS.AI

Towards Feedback-to-Plan Decisions for Self-Evolving LLM Agents in CUDA Kernel Generation

Large language models (LLMs) have shown strong empirical gains as self-evolving agents for CUDA kernel generation, driven by feedback-conditioned planning across generations. Howev…

20 views
#artificial intelligence · #machine learning · #cuda
ARXIV CS.AI

The Attribution Blind Spot: Detecting When Language Models Rely on Memory Rather Than Retrieved Context

Retrieval-augmented generation promises to ground language model outputs in external evidence, yet the field has no reliable way to verify whether retrieved context actually govern…

12 views
#artificial intelligence · #language models · #machine learning
ARXIV CS.AI

What Makes Chain-of-Thought Work at Probe Time? Local Co-occurrence Rather Than Global Derivation

Chain-of-thought (CoT) prompting reliably improves language-model accuracy, but which properties of a rationale text drive the improvement is poorly understood. Prior work has larg…

12 views
#artificial intelligence · #language models · #research
ARXIV CS.AI

Multi-Stakeholder LLM Alignment: Decomposing Estimation from Aggregation

Multi-stakeholder tasks require one output to satisfy users with conflicting preferences. Holistic LLM judges conflate utility estimation and utility aggregation, yielding unstable…

14 views
#artificial intelligence · #machine learning · #language models
ARXIV CS.AI

Generating Robust Portfolios of Optimization Models using Large Language Models

Mathematical optimization is a powerful tool for structured decision-making across domains such as resource allocation and planning. Formulating optimization models faithful to rea…

13 views
#artificial intelligence · #optimization · #language models
ARXIV CS.AI

Counteraction-Aware Multi-Teacher On-Policy Distillation for General Capability Recovery with Domain Preservation

Domain specialization can improve LLM behavior in vertical domains, but often weakens the general capabilities inherited from the original model. Recent Multi-Teacher On-Policy Dis…

18 views
#artificial intelligence · #machine learning · #language models
ARXIV CS.AI

Detecting Is Not Resolving: The Monitoring Control Gap in Retrieval Augmented LLMs

Retrieval-augmented LLMs are deployed for tasks where evidence quality determines action safety, yet evaluation protocols assume that single-turn robustness predicts robustness whe…

12 views
#artificial intelligence · #machine learning · #language models
ARXIV CS.AI

Gumbel Machine: Counterfactual Student Writing Generation via Gumbel Noise Steering

An effective method of teaching across disciplines is to provide examples of high-quality work. However, an example may be significantly different from a student's current work, ma…

16 views
#artificial intelligence · #education · #writing
ARXIV CS.AI

Pretraining Data Exposure in Large Language Models: A Survey of Membership Inference, Data Contamination, and Security Implications

Large Language Models (LLMs) have become the predominant paradigm in NLP, advancing both research and industry. As model sizes and pretraining data grow, concerns about Pretraining…

10 views
#artificial intelligence · #machine learning · #data privacy
ARXIV CS.AI

PitchBench: Measuring Pitch Hearing in Audio-Language Models

Audio-language models (ALMs) are increasingly used in real-world applications that require understanding music, from music tutoring and transcription to captioning, recommendation …

18 views
#audio · #artificial intelligence · #music
MICROSOFT RESEARCH

Microsoft Research: LLMs Corrupt your files during delegated work

Large Language Models (LLMs) are poised to disrupt knowledge work, with the emergence of delegated work as a new interaction paradigm (e.g., vibe coding). Delegation requires trust…

13 views
#artificial intelligence · #language models · #document editing
LET'S DATA SCIENCE

Sparse Autoencoders Reveal Cortical Brain-LLM Semantic Mapping

A preprint submitted to arXiv (arXiv:2605.23035) by Dongxin Guo and colleagues presents a mechanistic interpretability approach connecting large language model representations to h…

18 views
#neuroscience · #machine learning · #language models
ARXIV.ORG

Prompt Politeness Affects LLM Accuracy

The wording of natural language prompts has been shown to influence the performance of large language models (LLMs), yet the role of politeness and tone remains underexplored. In t…

15 views
#artificial intelligence · #language models · #research
SMOLA

You don't need all the LLM benchmarks

12 views
#machine learning · #benchmarks · #language models
ARXIV CS.AI

In Search of the Ingredients of Open-Endedness: Replicating Picbreeder with Large Vision-Language Models

We are in the midst of large-scale industrial and academic efforts to automate the processes of scientific, technological and creative production through AI-driven assistants. Hist…

18 views
#artificial intelligence · #computer vision · #neural networks
ARXIV CS.AI

Confidence Calibration in Large Language Models

We investigate the calibration of large language models' (LLMs') confidence across diverse tasks. The results of our preregistered study show that the current crop of LLMs are, lik…

16 views
#artificial intelligence · #machine learning · #language models
ARXIV CS.AI

How Much Thinking is Enough? Quantifying and Understanding Redundancy in LLM Reasoning

Reasoning-capable large language models solve hard problems by emitting long chains of thought, paying heavily in latency, GPU time, and energy. Casual inspection of their traces r…

14 views
#artificial intelligence · #machine learning · #language models
ARXIV CS.AI

Why We Need World Models for AGI: Where LLMs Fail and How World Models May Outperform

Large language models achieve strong performance in language generation and knowledge-intensive tasks, yet remain limited in settings requiring causal reasoning, persistent state t…

13 views
#artificial intelligence · #machine learning · #language models
ARXIV CS.AI

LC-ERD: Mining Latent Logic for Self-Evolving Reasoning via Consistency-Regulated Reward Decomposition

The evolution of Large Language Model (LLM) reasoning is bottlenecked by the scarcity of high-quality process data. While self-alignment via endogenous rewards offers a solution, m…

23 views
#artificial intelligence · #machine learning · #language models
ARXIV CS.AI

Breaking the Chains of Probability: Neutrosophic Logic as a New Framework for Epistemic Uncertainty in Large Language Models

Large Language Models (LLMs) are predominantly governed by probabilistic frameworks in which the sum of outcome probabilities is constrained to unity. This architectural limitation…

18 views
#artificial intelligence · #machine learning · #neutrosophic logic
ARXIV CS.AI

HyperGuide: Hyperbolic Guidance for Efficient Multi-Step Reasoning in Large Language Models

Multi-step reasoning remains a central challenge for large language models: single-pass generation is efficient but lacks accuracy; tree-search methods explore multiple paths but a…

11 views
#artificial intelligence · #machine learning · #language models
ARXIV CS.AI

Inference Time Context Sparsity: Illusion or Opportunity?

Sparsity has long been a central theme in LLM efficiency, but its role in context processing remains unresolved. As LLM workloads shift toward longer contexts and agentic interacti…

10 views
#artificial intelligence · #machine learning · #language models
ARXIV CS.AI

Distilling Game Code World Model Generation into Lightweight Large Language Models

Large Language Models (LLMs) have shown great ability in generating executable code from natural language, opening the possibility of automatically constructing environments for AI…

17 views
#artificial intelligence · #machine learning · #game development
ARXIV CS.AI

Understanding and Mitigating Premature Confidence for Better LLM Reasoning

Long chains of thought (CoT) from current language models frequently contain logical gaps and unjustified leaps, limiting the gains from additional test-time compute. Improving rea…

13 views
#artificial intelligence · #machine learning · #language models
ARXIV CS.AI

Hypothesis Generation and Inductive Inference in Children and Language Models

Real world decision-making requires constructing mental models under uncertainty over evidence, over the underlying causal rules, and over the state of the world itself. Which comp…

13 views
#artificial intelligence · #machine learning · #cognitive science
ARXIV CS.AI

PALoRA: Projection-Adaptive LoRA for Preserving Reasoning in Large Language Models

Efficiently updating Large Language Models (LLMs) with new or evolving factual knowledge remains a central challenge, as even parameter-efficient adaptation can erode previously ac…

10 views
#artificial intelligence · #machine learning · #language models
ARXIV CS.AI

Jailbreak to Protect: Buffering and Reinforcing via Temporary Jailbreaking for Safe Fine-Tuning in Large Language Models

Fine-tuning-as-a-Service (FaaS) enables personalization of large language models (LLMs), but it can weaken safety-alignment under harmful fine-tuning attacks. Recent work has shown…

13 views
#artificial intelligence · #machine learning · #language models
ARXIV CS.AI

Summoning the Oracle to Slay It: Mitigating Look-Ahead Bias in Financial Backtesting with Large Language Models

Backtesting large language models (LLMs) on historical financial data is unreliable because pre-training cuts off after the events happened. An LLM trained in 2024 already "knows" …

17 views
#artificial intelligence · #finance · #machine learning
ARXIV CS.AI

Learning to Reason Efficiently with A* Post-Training

Many applications of large language models (LLMs) require deductive reasoning, yet models frequently produce incorrect or redundant inference steps. We frame natural language infer…

10 views
#artificial intelligence · #machine learning · #language models
ARXIV CS.AI

When Mean CE Fails: Median CE Can Better Track Language Model Quality

Mean cross-entropy is the standard validation metric for language models, but it can fail to track model quality during training. We examine this in two common scenarios. First, in…

9 views
#artificial intelligence · #machine learning · #language models
ARXIV CS.AI

Emotional intelligence in large language models is fragmented across perception, cognition, and interaction

As large language models (LLMs) are increasingly integrated into emotionally sensitive domains, the structural integrity of their emotional intelligence (EI) becomes a critical fro…

19 views
#artificial intelligence · #emotional intelligence · #language models
ARXIV CS.AI

Automated Detection and Classification of Delusion-related Content in Naturalistic Audio Diaries Using Multi-Agent Language Models

Speech monologues recorded in naturalistic settings provide opportunities to characterize mental illness phenomenology and detect symptom exacerbation. Large language models (LLMs)…

9 views
#artificial intelligence · #mental health · #language models