Show HN: Thaw – Git branch for a running LLM (fork agents, skip prefill)

⚡ TL;DR · AI summary

Thaw is a tool that lets an AI agent fork a running session into multiple branches that explore a problem from one shared memory. Because each branch resumes from the fork point, it skips the cold prefill stage, and the divergent branches run concurrently. The summary singles out reinforcement learning teams and coding-agent teams as the main beneficiaries, since both spend much of their time and compute re-running the same context across many exploratory branches.
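The fork idea in the summary can be illustrated with a toy sketch. This is not thaw's API: `prefill`, `decode`, and `fork` are hypothetical names, and the "KV cache" is just a tuple of tokens. It only shows the shape of the technique, assuming the expensive prompt pass runs once and every branch starts from the same immutable snapshot.

```python
from concurrent.futures import ThreadPoolExecutor

PREFILL_CALLS = 0  # counts how often the expensive prompt pass runs


def prefill(prompt_tokens):
    """Hypothetical stand-in for the cold prefill: builds an immutable snapshot."""
    global PREFILL_CALLS
    PREFILL_CALLS += 1
    return tuple((pos, tok) for pos, tok in enumerate(prompt_tokens))


def decode(snapshot, branch_id, steps):
    """Extend one branch from the snapshot without touching the shared prefix.

    A real engine would share the prefix blocks; this toy copies them
    into a private list so each branch can grow its own tail.
    """
    kv = list(snapshot)
    generated = []
    for step in range(steps):
        tok = f"b{branch_id}-t{step}"  # placeholder for a sampled token
        kv.append((len(kv), tok))
        generated.append(tok)
    return generated


def fork(snapshot, n_branches, steps):
    """Run n_branches divergent continuations concurrently from one snapshot."""
    with ThreadPoolExecutor(max_workers=n_branches) as pool:
        futures = [pool.submit(decode, snapshot, b, steps) for b in range(n_branches)]
        return [f.result() for f in futures]


prompt = "explore three ways to fix the failing test".split()
snapshot = prefill(prompt)                      # paid once
branches = fork(snapshot, n_branches=4, steps=3)  # no further prefill
print(PREFILL_CALLS, len(branches))
```

The point of the sketch is the call count: four branches are produced, but the prompt pass runs only once, and the snapshot itself is never mutated by any child.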

Opening excerpt (first ~120 words)

thaw

The fork primitive for AI agents. When your agent forks N ways to explore a problem, thaw skips the cold prefill and runs them in parallel from one shared memory.

Snapshot a running session (weights, KV cache, scheduler state, prefix-hash table) and hydrate N divergent children at the fork point. git branch for live AI agents.

    pip install thaw-vllm

The receipt: ForkPool, 2026-04-20

Pre-warmed subprocess pool holds the engine once; each fork_completions() call snapshots KV only. Llama-3.1-8B on H100 80 GB PCIe, 5 rounds × 4 branches × 64 tokens:

Stage                                                   Time
init_pool (one-time; workers boot with real weights)    22.3s
First fork round                                        1.16s
Median fork round                                       0.88s

Per-round cost: ~340s cold-boot → sub-second (≈400× amortized). All rounds 4/4 non-empty and divergent.
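As a rough check on the quoted numbers, the arithmetic can be worked through in a few lines. The timings are the ones in the excerpt; the assumptions are that rounds two through five each take the median time and that, without the pool, every round would pay the full ~340s cold boot. Variable names are illustrative only.

```python
# Timings quoted in the excerpt (seconds)
cold_boot_s = 340.0
init_pool_s = 22.3
first_round_s = 1.16
median_round_s = 0.88
rounds = 5

# Steady-state ratio: one cold boot versus one median fork round
steady_state_speedup = cold_boot_s / median_round_s

# Assumption: rounds 2..5 each take the median time
pooled_total_s = init_pool_s + first_round_s + (rounds - 1) * median_round_s

# Assumption: without the pool, every round pays the full cold boot
cold_total_s = rounds * cold_boot_s
five_round_ratio = cold_total_s / pooled_total_s

print(f"steady-state speedup: {steady_state_speedup:.0f}x")
print(f"pooled total for {rounds} rounds: {pooled_total_s:.2f}s")
print(f"ratio including one-time init: {five_round_ratio:.0f}x")
```

Under these assumptions the steady-state ratio is about 386×, in line with the quoted ≈400×. Folding the one-time init_pool cost into only five rounds gives a smaller ratio, which grows toward the steady-state figure as more rounds reuse the same pool.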

Excerpt limited to ~120 words for fair-use compliance. The full article is at GitHub.
