WeSearch

I got 3× faster HFQ4 prefill on Strix Halo in hipfire with an opt-in MMQ path

· 0 reactions · 0 comments · 5 views

I recently contributed an experimental HFQ4-G256 MMQ prefill path to hipfire, an RDNA-focused LLM inference engine. Disclaimer: I authored the PR, so this is partly a contribution note, but I am mainly looking for independent validation from other AMD users. Before this PR, HFQ4 prefill in hipfire was going through a more generic/slower path. On my Strix Halo system, prompt processing was clearly the bottleneck: longer prefills were around ~310–340 tok/s. The new path adds an opt-in MMQ-style pr

Original article
LocalLlama
Read full at LocalLlama →
Anonymous · no account needed
Share 𝕏 Facebook Reddit LinkedIn Threads WhatsApp Bluesky Mastodon Email

Discussion

0 comments

More from LocalLlama