
Luce DFlash: Qwen3.6-27B at up to 2x throughput on a single RTX 3090


Hey fellow Llamas, your time is precious, so I'll keep it short. We built a GGUF port of DFlash speculative decoding: a standalone C++/CUDA stack on top of ggml that runs on a single 24 GB RTX 3090 and hosts the new Qwen3.6-27B. We call it Luce DFlash (MIT-licensed). It hits ~1.98x mean speedup over autoregressive decoding on Qwen3.6 across HumanEval / GSM8K / Math500, with zero retraining (z-lab published a matched Qwen3.6-DFlash draft on 2026-04-26; it's still under training, so acceptance length (AL) should keep climbing). If you have CUDA 12+ and an NVIDIA …
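For anyone new to speculative decoding: a small draft model proposes a block of tokens, and the large target model verifies them in one pass, accepting the longest correct prefix. The ~2x throughput and the "AL should keep climbing" remark both come from how many drafted tokens get accepted per step. Here's a toy greedy sketch of that accept/verify loop; `target` and `draft` are stand-in functions, not the Luce DFlash API, and a real engine verifies the whole proposal in a single batched forward pass rather than token by token.

```python
def speculative_step(target, draft, ctx, k=4):
    """One speculative-decoding step: draft proposes k tokens, target verifies.

    With greedy decoding, we accept the longest prefix where draft and target
    agree, then emit one corrective token from the target, so every step
    emits at least one token and the output matches plain target decoding.
    """
    # Draft model autoregressively proposes k candidate tokens (cheap).
    proposal, c = [], list(ctx)
    for _ in range(k):
        t = draft(c)
        proposal.append(t)
        c.append(t)

    # Target verifies the proposal (in a real engine: one batched pass).
    emitted, c = [], list(ctx)
    for t in proposal:
        expected = target(c)
        if t == expected:
            emitted.append(t)      # accepted: a "free" token for the target
            c.append(t)
        else:
            emitted.append(expected)  # first mismatch: take target's token
            return emitted
    emitted.append(target(c))      # all k accepted: one bonus target token
    return emitted
```

With a perfect draft every step emits k+1 tokens (the 2x-style speedup); with a useless draft it degrades to ordinary one-token-per-step decoding, and either way the output is identical to the target model alone.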

Read the full post on Reddit →
