
Micro-Expert-Router: Running Mixtral-Class MoE Models on NVMe SSDs Without a GPU

⚡ TL;DR · AI summary

Micro-Expert-Router is a Rust execution engine for Mixture-of-Experts (MoE) models that streams expert weights from an NVMe SSD instead of holding them all in memory. The router stays resident in RAM, and individual experts are hot-swapped in from the SSD on demand, so large models can run on machines without a GPU. The engine also uses quantization to reduce the I/O cost of each expert load.

Opening excerpt (first ~120 words)

Micro-Expert-Router: SSD-Streamed MoE Execution Engine

A Rust execution engine for Mixture-of-Experts models that keeps the router resident in RAM and hot-swaps individual experts on demand from a PCIe-attached NVMe drive into a pool of pre-allocated, page-aligned RAM buffers using O_DIRECT positional reads (pread(2) via tokio::task::block_in_place, kernel-page-cache bypass). Each routed expert then executes a real Mixtral / Llama-style SwiGLU FFN forward pass directly over the bytes that just arrived from the drive.

Excerpt limited to ~120 words for fair-use compliance. The full article is at GitHub.
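
The load-then-execute path described in the excerpt can be sketched in standard-library Rust. This is an illustration under stated assumptions, not the project's code: the names `AlignedBuf`, `load_expert`, and `swiglu_ffn`, the 4096-byte page size, and the weight layout (gate, up, down matrices stored back to back as native-endian f32) are all hypothetical. The sketch uses a plain positional read through `FileExt::read_exact_at`, which wraps pread(2) on Unix. It leaves out the O_DIRECT flag, the tokio::task::block_in_place wrapper, and quantization, since those need external crates or details the excerpt does not give.

```rust
use std::alloc::{alloc_zeroed, dealloc, Layout};
use std::fs::File;
use std::io;
use std::os::unix::fs::FileExt; // read_exact_at issues pread(2) on Unix

const PAGE: usize = 4096; // assumed page size and buffer alignment

/// One pre-allocated, page-aligned buffer that an expert is read into.
struct AlignedBuf {
    ptr: *mut u8,
    layout: Layout,
}

impl AlignedBuf {
    fn new(len: usize) -> Self {
        // Round the capacity up to a whole number of pages.
        let size = (len.max(1) + PAGE - 1) / PAGE * PAGE;
        let layout = Layout::from_size_align(size, PAGE).expect("valid layout");
        let ptr = unsafe { alloc_zeroed(layout) };
        assert!(!ptr.is_null(), "allocation failed");
        AlignedBuf { ptr, layout }
    }

    fn bytes_mut(&mut self) -> &mut [u8] {
        unsafe { std::slice::from_raw_parts_mut(self.ptr, self.layout.size()) }
    }

    /// View the first `n` f32 values; the buffer is zero-initialized and
    /// page-aligned, so the reinterpretation is well-defined.
    fn floats(&self, n: usize) -> &[f32] {
        assert!(n * 4 <= self.layout.size());
        unsafe { std::slice::from_raw_parts(self.ptr as *const f32, n) }
    }
}

impl Drop for AlignedBuf {
    fn drop(&mut self) {
        unsafe { dealloc(self.ptr, self.layout) }
    }
}

/// Positional read of `len` bytes at `offset` into the buffer.
/// A real O_DIRECT read would also need `offset` and `len` block-aligned.
fn load_expert(file: &File, offset: u64, len: usize, buf: &mut AlignedBuf) -> io::Result<()> {
    file.read_exact_at(&mut buf.bytes_mut()[..len], offset)
}

fn silu(x: f32) -> f32 {
    x / (1.0 + (-x).exp())
}

fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b.iter()).map(|(p, q)| p * q).sum()
}

/// SwiGLU FFN: y = W_down * (silu(W_gate * x) .* (W_up * x)).
/// Assumed row-major layout in `w`: gate (h x d), up (h x d), down (d x h).
fn swiglu_ffn(w: &[f32], x: &[f32], h: usize) -> Vec<f32> {
    let d = x.len();
    assert_eq!(w.len(), 3 * h * d);
    let (gate, rest) = w.split_at(h * d);
    let (up, down) = rest.split_at(h * d);
    let hidden: Vec<f32> = (0..h)
        .map(|i| silu(dot(&gate[i * d..(i + 1) * d], x)) * dot(&up[i * d..(i + 1) * d], x))
        .collect();
    (0..d).map(|j| dot(&down[j * h..(j + 1) * h], &hidden)).collect()
}
```

A caller would allocate one `AlignedBuf` per pool slot up front, call `load_expert` with the expert's byte offset in the weights file, and pass `buf.floats(3 * h * d)` to `swiglu_ffn`. Reusing pre-allocated buffers keeps allocation off the per-token path, and page alignment is what would let the same buffers satisfy O_DIRECT's alignment requirements.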
