I got 3× faster HFQ4 prefill on Strix Halo in hipfire with an opt-in MMQ path

Apr 28, 2026 · 5:57 AM UTC

via LocalLlama

I recently contributed an experimental HFQ4-G256 MMQ prefill path to hipfire, an RDNA-focused LLM inference engine. Disclaimer: I authored the PR, so this is partly a contribution note, but I am mainly looking for independent validation from other AMD users. Before this PR, HFQ4 prefill in hipfire went through a more generic, slower path. On my Strix Halo system, prompt processing was clearly the bottleneck: longer prefills ran at roughly 310–340 tok/s. The new path adds an opt-in MMQ-style pr
