📄Paper: RORA-VLM: Robust Retrieval Augmentation for Vision Language Models

#ai #vlm #research #machine learning #iclr
⚡ TL;DR · AI summary

The paper 'RORA-VLM: Robust Retrieval Augmentation for Vision Language Models' was submitted to ICLR 2025 but was rejected. It proposes a framework that strengthens Vision Language Models (VLMs) on question answering by retrieving external knowledge. The approach combines a two-stage retrieval process with noise-resilient training, so that the model keeps reasoning reliably even when the retrieved information is inaccurate.
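The summary names two-stage retrieval and noise-resilient training but does not say how either works. The sketch below is a generic illustration under our own assumptions. It is not RORA-VLM's actual method.

- Stage 1 ranks knowledge-base entries by similarity to the image embedding.
- Stage 2 reranks the surviving entries by similarity to the question embedding.
- For training, a fraction of retrieved passages is swapped for distractors.

The function names `two_stage_retrieve` and `add_retrieval_noise`, the `(entry_id, image_key_emb, text_emb)` layout of `kb`, and the toy 2-D embeddings are all hypothetical.

```python
import math
import random

def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def two_stage_retrieve(image_emb, question_emb, kb, k1=2, k2=1):
    """Hypothetical two-stage retrieval over kb entries (entry_id, image_key_emb, text_emb).

    Stage 1 keeps the k1 entries whose image key is closest to the query image.
    Stage 2 reranks those candidates by how well their text matches the question
    and returns the ids of the top k2.
    """
    stage1 = sorted(kb, key=lambda e: cosine(image_emb, e[1]), reverse=True)[:k1]
    stage2 = sorted(stage1, key=lambda e: cosine(question_emb, e[2]), reverse=True)[:k2]
    return [entry_id for entry_id, _, _ in stage2]

def add_retrieval_noise(passages, distractors, noise_ratio, rng):
    """Hypothetical training-time noise: replace round(noise_ratio * len(passages))
    randomly chosen passages with random distractors, so the model cannot
    assume every retrieved passage is relevant."""
    noisy = list(passages)
    n_replace = round(noise_ratio * len(noisy))
    for i in rng.sample(range(len(noisy)), n_replace):
        noisy[i] = rng.choice(distractors)
    return noisy

# Toy knowledge base with made-up 2-D embeddings (illustrative only).
kb = [
    ("sparrow_habitat", [1.0, 0.0], [0.0, 1.0]),
    ("crow_appearance", [0.9, 0.1], [1.0, 0.0]),
    ("car_review", [0.0, 1.0], [1.0, 0.0]),
]
top = two_stage_retrieve([1.0, 0.0], [0.0, 1.0], kb)
print(top)  # ['sparrow_habitat']

rng = random.Random(0)
print(add_retrieval_noise(["p1", "p2", "p3", "p4"], ["irrelevant"], 0.5, rng))
```

In this toy run, stage 1 keeps the two bird entries because they look like the query image. Stage 2 then prefers the habitat passage because it matches the question. Reranking only the k1 survivors keeps the second comparison cheap. In a setup like this, the noise ratio would be a knob to tune rather than a fixed value.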

Original article: DEV.to (Top)
Opening excerpt (first ~120 words)

Mercy · Posted on May 29
#ai #vlm #rag #paper
Venue: International Conference on Learning Representations (ICLR) 2025

💡 Why I read this
I've recently been looking for paper ideas and came across this one. It was submitted to ICLR 2025 but rejected, which is a bit of a shame.
The paper applies RAG to VLMs so that the model can draw on external knowledge when answering questions.
In many VQA tasks, the answer isn't actually in the image; it requires additional background knowledge.
For example, an image shows a species of bird and the question is: "Where is this bird mainly found?"
The image only shows what the bird looks like; information such as its habitat has to be looked up.
The paper mainly addresses one question: "When the retrieved knowledge is noisy, how can a VLM still reason reliably?"

🧠 Core idea
The authors propose a robust…

