Search: "vision language models"

8 stories match your query across our 700+ source catalog. Ranked by relevance and recency.

8 results for "vision language models"

A systematic evaluation of vision-language models for observational astronomical reasoning tasks

Vision-language models (VLMs) are increasingly proposed as general-purpose tools for scientific data interpretation, yet their reliability on real astronomical observations across diverse modalities r…

Tue, 28 Apr 2026 04:13:21 GMT · 4 views

NVIDIA BLOG

NVIDIA Launches Nemotron 3 Nano Omni Model, Unifying Vision, Audio and Language for up to 9x More Efficient AI Agents

AI agent systems today juggle separate models for vision, speech and language — losing time and context as they pass data from one model to the other. Unveiled today, NVIDIA Nemotron 3 Nano Omni is an…

Tue, 28 Apr 2026 16:28:47 GMT · 0 views

ARXIV.ORG

PhysNote: Self-Knowledge Notes for Evolvable Physical Reasoning in Vision-Language Model

Vision-Language Models (VLMs) have demonstrated strong performance on textbook-style physics problems, yet they frequently fail when confronted with dynamic real-world scenarios that require temporal …

Tue, 28 Apr 2026 04:13:21 GMT · 2 views

ARXIV.ORG

Thinking Like a Clinician: A Cognitive AI Agent for Clinical Diagnosis via Panoramic Profiling and Adversarial Debate

The application of large language models (LLMs) in clinical decision support faces significant challenges of "tunnel vision" and diagnostic hallucinations present in their processing unstructured elec…

Tue, 28 Apr 2026 04:13:21 GMT · 2 views

ARXIV.ORG

FAIR_XAI: Improving Multimodal Foundation Model Fairness via Explainability for Wellbeing Assessment

In recent years, the integration of multimodal machine learning in wellbeing assessment has offered transformative potential for monitoring mental health. However, with the rapid advancement of Vision…

Tue, 28 Apr 2026 04:13:21 GMT · 2 views

ARXIV.ORG

Grounding Before Generalizing: How AI Differs from Humans in Causal Transfer

Extracting abstract causal structures and applying them to novel situations is a hallmark of human intelligence. While Large Language Models (LLMs) and Vision Language Models (VLMs) have shown strong …

Tue, 28 Apr 2026 04:13:21 GMT · 2 views

ARXIV.ORG

Aligning with Your Own Voice: Self-Corrected Preference Learning for Hallucination Mitigation in LVLMs

Large Vision-Language Models (LVLMs) frequently suffer from hallucinations. Existing preference learning-based approaches largely rely on proprietary models to construct preference datasets. We identi…

Tue, 28 Apr 2026 04:13:21 GMT · 2 views

ARXIV.ORG

Towards Lawful Autonomous Driving: Deriving Scenario-Aware Driving Requirements from Traffic Laws and Regulations

Driving in compliance with traffic laws and regulations is a basic requirement for human drivers, yet autonomous vehicles (AVs) can violate these requirements in diverse real-world scenarios. To encod…

Tue, 28 Apr 2026 04:13:21 GMT · 3 views

Or browse by topic

World US Politics Technology AI Markets Business Science Climate Health Culture Media

Results for "vision language models".

A systematic evaluation of vision-language models for observational astronomical reasoning tasks

NVIDIA Launches Nemotron 3 Nano Omni Model, Unifying Vision, Audio and Language for up to 9x More Efficient AI Agents

PhysNote: Self-Knowledge Notes for Evolvable Physical Reasoning in Vision-Language Model

Thinking Like a Clinician: A Cognitive AI Agent for Clinical Diagnosis via Panoramic Profiling and Adversarial Debate

FAIR_XAI: Improving Multimodal Foundation Model Fairness via Explainability for Wellbeing Assessment

Grounding Before Generalizing: How AI Differs from Humans in Causal Transfer

Aligning with Your Own Voice: Self-Corrected Preference Learning for Hallucination Mitigation in LVLMs

Towards Lawful Autonomous Driving: Deriving Scenario-Aware Driving Requirements from Traffic Laws and Regulations

Or browse by topic