📄Paper: RORA-VLM: Robust Retrieval Augmentation for Vision Language Models

May 29, 2026 · 4:13 AM UTC · 1 min read

#ai #vlm #research #machine learning #iclr

TL;DR · WeSearch summary

The paper 'RORA-VLM: Robust Retrieval Augmentation for Vision Language Models' was submitted to ICLR 2025 but was rejected. It proposes a framework that augments Vision Language Models (VLMs) with external knowledge retrieval to improve question answering. The approach combines a two-stage retrieval process with noise-resilient training, so that the model can still reason stably when the retrieved information is inaccurate or irrelevant.

Key facts

▪The paper introduces a robust retrieval-augmentation framework for Vision Language Models.
▪It uses a two-stage retrieval process so the model can answer questions with the help of external knowledge.
▪Training deliberately introduces noise into the retrieved information so the model learns to ignore irrelevant content.
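The noise-injection idea in the last point can be illustrated with a small sketch. This is not the paper's training code: the summary does not describe how noise is introduced, so the sketch simply assumes that noise means mixing irrelevant passages into the retrieved context. The function names (`build_noisy_context`, `format_training_example`), the prompt layout, and the sample passages are all hypothetical.

```python
import random


def build_noisy_context(relevant, distractor_pool, num_noise, seed=0):
    """Mix `num_noise` irrelevant passages into the relevant ones.

    Hypothetical helper: returns a shuffled list of (text, is_relevant)
    pairs so the model cannot rely on passage position to find the answer.
    """
    if num_noise > len(distractor_pool):
        raise ValueError("not enough distractor passages to sample from")
    rng = random.Random(seed)  # seeded so the example is reproducible
    noise = rng.sample(distractor_pool, num_noise)
    passages = [(text, True) for text in relevant]
    passages += [(text, False) for text in noise]
    rng.shuffle(passages)
    return passages


def format_training_example(question, passages, answer):
    """Build a prompt/target pair; the target depends only on the answer,
    so irrelevant passages in the prompt must be ignored to get it right."""
    context = "\n".join(
        f"[{i + 1}] {text}" for i, (text, _) in enumerate(passages)
    )
    prompt = f"Context:\n{context}\n\nQuestion: {question}"
    return {"prompt": prompt, "target": answer}


# Made-up passages, for illustration only.
relevant = ["Placeholder passage describing where the pictured bird lives."]
distractors = [
    "Placeholder passage about a mountain range.",
    "Placeholder passage about a bridge.",
    "Placeholder passage about a painting.",
]
passages = build_noisy_context(relevant, distractors, num_noise=2)
example = format_training_example(
    "Where is this bird mainly found?", passages, "placeholder answer"
)
```

The relevance flags are kept next to each passage so that a training loop could, if desired, also supervise which passages the model should attend to; whether the paper does anything like that is not stated in the summary.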

Original article

DEV.to (Top)

Opening excerpt (first ~120 words)

Mercy · Posted on May 29

#ai #vlm #rag #paper

Submitted to International Conference on Learning Representations (ICLR) 2025

💡 Why I read this

I was recently looking for paper ideas and came across this one. It was submitted to ICLR 2025 but got rejected, which is a bit of a shame.

The paper mainly applies RAG to VLMs, so that the model can draw on external knowledge when answering questions. In many VQA tasks the answer is not actually in the image and requires additional background knowledge. For example, an image shows a species of bird and the question is: "Where is this bird mainly found?" The image only tells you what the bird looks like; information such as its habitat has to be looked up.

The paper mainly addresses the question: "When the retrieved knowledge is noisy, how can a VLM still reason stably?"

🧠 Core idea

The authors propose a robust…

Excerpt limited to ~120 words for fair-use compliance. The full article is at DEV.to (Top).
