I got 3× faster HFQ4 prefill on Strix Halo in hipfire with an opt-in MMQ path
I recently contributed an experimental HFQ4-G256 MMQ prefill path to hipfire, an RDNA-focused LLM inference engine. Disclaimer: I authored the PR, so this is partly a contribution note, but I am mainly looking for independent validation from other AMD users. Before this PR, HFQ4 prefill in hipfire went through a more generic, slower path. On my Strix Halo system, prompt processing was clearly the bottleneck: longer prefills ran at roughly 310–340 tok/s. The new path adds an opt-in MMQ-style pr
Original article
LocalLlama