I ran GLM-5.1 on a 16GB RAM machine
The MoE-on-a-Potato project ran the 754-billion-parameter GLM-5.1 model (a 176GB GGUF file) on a consumer-grade PC with only 16GB of RAM. Because the model is more than ten times larger than system memory, the project relies on disk-streaming inference, reading weights from the SSD as they are needed instead of holding the whole model in RAM. The result shows that very large language models can run on budget hardware this way, and that SSD read bandwidth, not RAM capacity, is the primary bottleneck for local Mixture-of-Experts execution.
- The project ran GLM-5.1 on a Ryzen 5 5600G CPU (6 cores / 12 threads) with 16GB of DDR4 RAM without crashing.
- Peak system RAM usage during operation was only 8.34 GB.
- SSD read bandwidth, rather than physical memory capacity, was identified as the main limiting factor for performance.
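To see why SSD bandwidth rather than RAM sets the pace, a rough ceiling on generation speed can be sketched. In a Mixture-of-Experts model only the routed experts (plus any always-active layers) are used for each token, so the bytes needed per token scale with the active parameters, not the full 754B. If those bytes are not already cached in RAM, they must be read from the SSD for every token, giving tokens/s ≤ SSD bandwidth ÷ bytes read from disk per token. The sketch also derives the average bits per weight implied by the reported 176GB file and 754B parameters, assuming 1 GB = 10^9 bytes. The active-parameter count, SSD speed, and cache hit fraction in the usage section are hypothetical placeholders, not figures from the project.

```python
# Rough ceiling on decode speed for disk-streamed MoE inference.
# Values in the usage section are hypothetical placeholders, not project measurements.


def bits_per_weight(file_size_gb: float, params_billion: float) -> float:
    """Average storage bits per parameter implied by a file size (1 GB = 1e9 bytes)."""
    return file_size_gb * 1e9 * 8 / (params_billion * 1e9)


def max_tokens_per_second(active_params_billion: float,
                          bits_per_param: float,
                          ssd_gb_per_s: float,
                          ram_hit_fraction: float = 0.0) -> float:
    """Upper bound on tokens/s if every active weight byte not already cached in RAM
    must be read from the SSD once per generated token (compute time ignored)."""
    if not 0.0 <= ram_hit_fraction < 1.0:
        raise ValueError("ram_hit_fraction must be in [0, 1)")
    bytes_per_token = active_params_billion * 1e9 * bits_per_param / 8
    bytes_from_disk = bytes_per_token * (1.0 - ram_hit_fraction)
    return ssd_gb_per_s * 1e9 / bytes_from_disk


if __name__ == "__main__":
    bpw = bits_per_weight(176, 754)  # file size and parameter count reported for GLM-5.1
    print(f"average bits per weight: {bpw:.2f}")
    # Hypothetical: 40B active params per token, 3.5 GB/s NVMe SSD, 50% of reads served from RAM.
    print(f"ceiling: {max_tokens_per_second(40, bpw, 3.5, 0.5):.2f} tokens/s")
```

The 176GB / 754B ratio works out to under 2 bits per weight on average, which suggests very aggressive quantization. Because the bound is inversely proportional to the bytes read from disk, doubling SSD bandwidth or halving the uncached fraction doubles the ceiling, while extra RAM helps only to the extent that it raises the cache hit fraction.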
Opening excerpt (first ~120 words):
🧠 MoE-on-a-Potato
Running a 754-Billion Parameter LLM on a 16GB RAM Consumer PC

"Saying it's impossible is not engineering. Saying we don't know how yet is science."

MoE-on-a-Potato is an experimental project dedicated to testing the extreme limits of running massive Mixture-of-Experts (MoE) Large Language Models on consumer-grade, budget hardware. We successfully ran GLM-5.1 (a 754B parameter model, 176GB GGUF size) on a Ryzen 5 5600G (6 Cores / 12 Threads) CPU, Vega 7 iGPU, and 16GB DDR4 RAM without crashing, establishing a scientific proof of concept for low-memory MoE disk-streaming inference.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is on GitHub.
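One common way to stream a 176GB model through 16GB of RAM is to memory-map the model file, which is what llama.cpp (the runtime behind the GGUF format) does by default. Whether MoE-on-a-Potato uses exactly this mechanism is an assumption here, since the excerpt does not describe its implementation. With a memory map, the OS reads a page from the SSD only when it is first touched and can drop clean pages again under memory pressure. The sketch below uses Python's standard `mmap` module on a small fake file; the back-to-back fixed-size expert layout, `NUM_EXPERTS`, `EXPERT_BYTES`, and the helper names are hypothetical and far simpler than the real GGUF format.

```python
import mmap
import os
import tempfile
from typing import Dict, Iterable

# Hypothetical layout: NUM_EXPERTS equal-sized expert blobs stored back to back.
NUM_EXPERTS = 8
EXPERT_BYTES = 4096


def write_fake_model(path: str) -> None:
    """Write a toy 'model' file where every byte of expert e has the value e."""
    with open(path, "wb") as f:
        for e in range(NUM_EXPERTS):
            f.write(bytes([e]) * EXPERT_BYTES)


def load_experts(mm: mmap.mmap, expert_ids: Iterable[int]) -> Dict[int, bytes]:
    """Copy out only the byte ranges of the routed experts.

    Slicing the map faults in just the pages covering each range; the rest
    of the file is never read from disk.
    """
    out = {}
    for e in expert_ids:
        start = e * EXPERT_BYTES
        out[e] = mm[start:start + EXPERT_BYTES]
    return out


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "fake_model.bin")
        write_fake_model(path)
        with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            chosen = load_experts(mm, [2, 5])  # pretend the router picked experts 2 and 5
        print({e: blob[0] for e, blob in chosen.items()})
```

Selecting experts 2 and 5 touches only those two ranges of the file. Applied to a real MoE model, the same principle keeps only recently used experts resident in memory, which is consistent with a peak RAM footprint far below the model's on-disk size, and it also explains why every cache miss turns into an SSD read.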