3 results for "hybrid attention"
VERCEL
Disaggregated Serving for Hybrid SSM Models in vLLM
Hybrid architectures that interleave Mamba-style SSM layers with standard full-attention (FA) layers — such as NVIDIA Nemotron-H — are gaining traction as a way…
MACHINE LEARNING
Going from 3B/7B dense to Nemotron 3 Nano (hybrid Mamba-MoE) for multi-task reasoning — what changes in the fine-tuning playbook? [D]
Following up on something I posted a few days back about fine-tuning for multi-task reasoning. Read a lot since then, and I've moved past the dense 3B vs 7B question — landing on Nemotron 3 Nano (the …
REDDIT
FINAL-Bench/Darwin-36B-Opus · Hugging Face
Darwin-36B-Opus is a 36-billion-parameter mixture-of-experts (MoE) language model produced by the Darwin V7 evolutionary breeding engine from two publicly available parents: Father : Qwen/Qwen3.6-35B-…