5 results for "quantization"
AMD GPUs are faster at prefilling
I gave the same prompt and the same document to a 1660 Ti running Gemma 4 E2B Q4 (because of its small VRAM) and to an iGPU running Gemma 4 E4B Q8. The prefill rate before token generation was like 4-5 times faste…
AMD Hipfire - a new inference engine optimized for AMD GPUs
Came across Hipfire the other day. It's a brand-new inference engine focused on all AMD GPUs (not just the latest). Github. It uses a special mq4 quantization method. The Hipfire creator is pumping o…
Are Unsloth models as good as I read?
Has anybody done some comparisons between the models that Unsloth offers and their counterparts? For example: I've been using qwen3.6:35b-a3b Q4_K_M, and on my MBP 64GB I get around 39 t/s. Using Unslot…
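The "39 t/s" figure in benchmarks like this is simply tokens generated divided by wall-clock time; a trivial sketch (function name is mine, numbers illustrative):

```python
import time


def tokens_per_second(n_tokens: int, elapsed_s: float) -> float:
    """Throughput as quoted in forum benchmarks, e.g. '39 t/s'."""
    return n_tokens / elapsed_s


# e.g. 780 tokens generated in 20 s of wall time
print(tokens_per_second(780, 20.0))  # 39.0

# In practice you would wrap the generation call with a timer:
start = time.perf_counter()
# ... run generation here, counting tokens as they stream out ...
elapsed = time.perf_counter() - start
```

Note that most runners report prompt-processing (prefill) speed and generation speed separately; comparisons should match like with like.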
MagicQuant (v2.0) - Hybrid Mixed GGUF Models + New Unsloth Dynamic Learned Configs
MagicQuant v2.0 is here. Introducing hybrid mixed GGUF models, use of learned Unsloth Dynamic tensors, and a new benchmark philosophy that skips the nonsense! Smaller files. Better KLD trade-offs. Mag…
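The "KLD" being traded off here is the KL divergence between the full-precision model's next-token distribution and the quantized model's (lower is closer to the original). A minimal sketch with toy logits; everything below is illustrative, not MagicQuant's actual benchmark code:

```python
import math


def softmax(logits):
    """Turn raw next-token logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def kl_divergence(p, q):
    """KL(p || q) in nats over the same vocabulary; 0 iff p == q."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Toy logits for one position: full-precision vs. quantized model.
p = softmax([2.0, 1.0, 0.1])   # reference (e.g. BF16) model
q = softmax([1.9, 1.1, 0.2])   # quantized model, slightly perturbed
print(kl_divergence(p, q))     # small positive number
```

A real KLD benchmark averages this quantity over many token positions of a test corpus, which is why smaller files with similar KLD are considered a better trade.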
Higher precision or higher parameter count
I'm wondering: if we take models of the same family (e.g. qwen3.5 MoEs) and compare GGUFs with different parameter counts and different quantizations but similar file sizes, which model would be better fo…
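The rough arithmetic behind "similar sizes": GGUF file size scales with parameter count times bits per weight. A sketch assuming approximate bits-per-weight values (real quants like Q4_K_M land near but not exactly at these numbers, and metadata/embedding overhead is ignored):

```python
def gguf_size_gb(n_params_billions: float, bits_per_weight: float) -> float:
    """Approximate GGUF file size: parameters * bits / 8, in GB (10^9 bytes)."""
    return n_params_billions * 1e9 * bits_per_weight / 8 / 1e9


# A bigger model at ~Q4 and a smaller one at ~Q8 can land at similar sizes,
# which is exactly the comparison the question is asking about:
print(gguf_size_gb(30, 4.5))   # ~16.9 GB
print(gguf_size_gb(15, 8.5))   # ~15.9 GB
```

This is why the question is non-trivial: at a fixed memory budget you can buy either more parameters at lower precision or fewer parameters at higher precision.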