
Still: Amortized KV Cache Compaction in a Single Forward Pass

⚡ TL;DR · AI summary

The paper presents Still, a lightweight per-layer Perceiver that compacts KV caches in a single forward pass for long‑horizon language model inference. It demonstrates superior speed‑quality trade‑offs across a range of compression ratios and context lengths on models such as Qwen and Gemma. The method also improves summarization performance, surpassing strong baselines like KV‑Distill on benchmarks including RULER and LongBench.
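The summary above describes a Perceiver-style compactor but does not spell out its architecture. As a rough illustration of the general idea only, the sketch below uses a small set of latent queries that cross-attend over a layer's cached keys and values to produce a shorter cache. It is not the paper's implementation: the function name `compact_kv`, the single-head, projection-free attention, and the random stand-in for learned latents are all assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def compact_kv(keys, values, latents):
    """Hypothetical Perceiver-style compaction for one layer and one head.

    keys, values: (n, d) cached entries for n context tokens.
    latents:      (m, d) latent queries with m < n (learned in a real system;
                  random here, which is an assumption of this sketch).
    Returns m compacted keys, m compacted values, and the (m, n) attention
    weights. Each compacted slot is a convex combination of the originals.
    """
    d = keys.shape[-1]
    scores = latents @ keys.T / np.sqrt(d)   # (m, n) latent-to-token scores
    weights = softmax(scores, axis=-1)       # each row sums to 1
    return weights @ keys, weights @ values, weights

rng = np.random.default_rng(0)
n, m, d = 64, 8, 16                          # 8x compression of a 64-token cache
keys = rng.normal(size=(n, d))
values = rng.normal(size=(n, d))
latents = rng.normal(size=(m, d))

ck, cv, w = compact_kv(keys, values, latents)
print(ck.shape, cv.shape)                    # (8, 16) (8, 16)
```

Because the compaction here is one matrix product per layer, it runs in a single pass over the cache. A trained compactor would additionally learn the latents and any projections so that attention over the short cache approximates attention over the full one.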

Original article
arXiv.org

Computer Science > Machine Learning
arXiv:2606.07878 (cs)
Submitted on 5 Jun 2026

Title: Still: Amortized KV Cache Compaction in a Single Forward Pass

Authors: Charles O'Neill, Alex Sandomirsky, Harry Partridge, Mudith Jayasekara, Max Kirkby

Abstract: The KV cache is the memory bottleneck of long-horizon language model deployment. Practically, a deployable compactor must be lightweight enough to call during inference, expressive enough to preserve context under constraint, and reusable across a trajectory.

Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv.org.
