Show HN: Thaw – Git branch for a running LLM (fork agents, skip prefill)
Thaw is a fork primitive for AI agents. When an agent forks N ways to explore a problem, the branches start from one shared snapshot of the running session instead of each repeating the cold prefill, and they run in parallel. It targets reinforcement-learning and coding-agent teams; the project's own benchmark reports sub-second fork rounds where a cold boot takes about 340 seconds.
- Thaw lets AI agents fork multiple branches from one shared snapshot, so the branches do not each repeat the prefill.
- In the project's benchmark (Llama-3.1-8B on an H100), the per-round cost drops from about 340 seconds for a cold boot to a median of 0.88 seconds per fork round, after a one-time 22.3-second pool start-up.
- Thaw is open source and compatible with existing frameworks like vLLM and SGLang.
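The fork idea in the points above can be illustrated with a toy model. This is only a sketch of the concept, not thaw's API: `ToySession`, `ToyEngine`, `prefill`, and `fork` are hypothetical names invented for this example, and the "prefix" is a plain tuple standing in for real KV-cache state.

```python
from dataclasses import dataclass, field


@dataclass
class ToySession:
    """Toy stand-in for a running session. Hypothetical; not thaw's API."""
    prefix: tuple                                # shared, read-only prompt state
    suffix: list = field(default_factory=list)   # tokens private to this branch


class ToyEngine:
    """Counts how many times the expensive prefill step runs."""

    def __init__(self):
        self.prefill_calls = 0

    def prefill(self, prompt_tokens):
        # Stand-in for the costly pass over the whole prompt.
        self.prefill_calls += 1
        return ToySession(prefix=tuple(prompt_tokens))

    def fork(self, parent, n):
        # Children reuse the parent's prefix object and copy only the small
        # per-branch suffix, so forking triggers no additional prefill.
        return [ToySession(parent.prefix, list(parent.suffix)) for _ in range(n)]


engine = ToyEngine()
parent = engine.prefill(["solve", "this", "problem"])
children = engine.fork(parent, 4)
for i, child in enumerate(children):
    child.suffix.append(f"idea-{i}")   # each branch diverges on its own

print(engine.prefill_calls)            # prefill ran once for four branches
print([c.suffix for c in children])
```

In this toy, four branches share one prefix and only their suffixes differ, which mirrors the "one shared memory, N divergent children" description. A real system also has to snapshot scheduler state and GPU memory, which the sketch does not attempt.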
Opening excerpt (first ~120 words)
thaw

The fork primitive for AI agents. When your agent forks N ways to explore a problem, thaw skips the cold prefill and runs them in parallel from one shared memory. Snapshot a running session (weights, KV cache, scheduler state, prefix-hash table) and hydrate N divergent children at the fork point. git branch for live AI agents.

pip install thaw-vllm

The receipt: ForkPool, 2026-04-20

Pre-warmed subprocess pool holds the engine once; each fork_completions() call snapshots KV only. Llama-3.1-8B on H100 80 GB PCIe, 5 rounds × 4 branches × 64 tokens:

| Stage | Time |
| --- | --- |
| init_pool (one-time; workers boot with real weights) | 22.3s |
| First fork round | 1.16s |
| Median fork round | 0.88s |

Per-round cost: ~340s cold-boot → sub-second (≈400× amortized). All rounds 4/4 non-empty and divergent.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at GitHub.
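The speedup figure quoted in the excerpt can be sanity-checked with a few lines of arithmetic. The sketch below uses only the numbers from the excerpt and makes two labeled assumptions: the cold-boot baseline costs about 340 s in every round, and rounds 2 through 5 each cost the reported median (the excerpt gives only the first and median round times).

```python
# Figures quoted in the excerpt (Llama-3.1-8B, H100 80 GB PCIe).
COLD_BOOT_S = 340.0     # approximate per-round cost without the pool
INIT_POOL_S = 22.3      # one-time pool start-up
FIRST_ROUND_S = 1.16
MEDIAN_ROUND_S = 0.88
ROUNDS = 5

# Per-round comparison, ignoring the one-time start-up cost.
per_round_speedup = COLD_BOOT_S / MEDIAN_ROUND_S

# End-to-end comparison over the 5 benchmark rounds.
# Assumption: rounds 2..5 each take the median time.
pooled_total_s = INIT_POOL_S + FIRST_ROUND_S + (ROUNDS - 1) * MEDIAN_ROUND_S
cold_total_s = ROUNDS * COLD_BOOT_S   # assumption: every round pays a cold boot
end_to_end_speedup = cold_total_s / pooled_total_s

print(f"per-round speedup:  {per_round_speedup:.0f}x")    # 386x
print(f"pooled total:       {pooled_total_s:.2f}s")       # 26.98s
print(f"end-to-end speedup: {end_to_end_speedup:.0f}x")   # 63x
```

Under these assumptions the per-round ratio is about 386×, consistent with the "≈400× amortized" claim, while a short 5-round run that includes the 22.3 s start-up comes out nearer 63×. The start-up cost matters less as the number of rounds grows.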