Search: "hybrid attention"

3 stories match your query across our catalog of 700+ sources, ranked by relevance and recency.

3 results for "hybrid attention"

Disaggregated Serving for Hybrid SSM Models in vLLM

Hybrid architectures that interleave Mamba-style SSM layers with standard full-attention (FA) layers — such as NVIDIA Nemotron-H — are gaining traction as a way…

Tue, 28 Apr 2026 20:46:24 GMT · 1 view

MACHINE LEARNING

Going from 3B/7B dense to Nemotron 3 Nano (hybrid Mamba-MoE) for multi-task reasoning — what changes in the fine-tuning playbook? [D]

Following up on something I posted a few days back about fine-tuning for multi-task reasoning. Read a lot since then, and I've moved past the dense 3B vs 7B question — landing on Nemotron 3 Nano (the …

Sun, 26 Apr 2026 16:10:10 GMT · 6 views

FINAL-Bench/Darwin-36B-Opus · Hugging Face

Darwin-36B-Opus is a 36-billion-parameter mixture-of-experts (MoE) language model produced by the Darwin V7 evolutionary breeding engine from two publicly available parents: Father: Qwen/Qwen3.6-35B-…

Sun, 26 Apr 2026 17:11:53 GMT · 6 views
