
Introducing AutoMuon, a one-line drop-in replacement for AdamW [P]


Hey everyone, I've been working on a small Python package called AutoMuon that makes the Muon optimizer usable as a drop-in replacement for AdamW in arbitrary PyTorch training pipelines. The core idea is simple: Muon is meant for the weight matrices of hidden layers (linear projections, conv layers), but you still need AdamW for embeddings, norms, biases, and so on. AutoMuon scans your model at init and automatically assigns each parameter to the right optimizer. I'm open to PRs.
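
To make the routing idea concrete, here is a minimal sketch of how parameters could be split between the two optimizers. It is not AutoMuon's actual code or API: the function name `route_parameters`, the `ADAMW_NAME_HINTS` list, and the name-based test for embeddings are all assumptions made for illustration. It works on plain (name, shape) pairs so it runs without PyTorch installed.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical name fragments for parameters that stay on AdamW even though
# they are 2D (an assumption for this sketch, not AutoMuon's real rule).
ADAMW_NAME_HINTS = ("embed", "lm_head")


def route_parameters(
    named_shapes: Iterable[Tuple[str, Tuple[int, ...]]],
) -> Dict[str, List[str]]:
    """Split parameter names into a Muon group and an AdamW group."""
    groups: Dict[str, List[str]] = {"muon": [], "adamw": []}
    for name, shape in named_shapes:
        # Linear weights are 2D; conv weights have more dimensions.
        is_matrix_like = len(shape) >= 2
        is_excluded = any(hint in name.lower() for hint in ADAMW_NAME_HINTS)
        if is_matrix_like and not is_excluded:
            groups["muon"].append(name)
        else:
            # Embeddings, norm scales, and biases land here.
            groups["adamw"].append(name)
    return groups


# Toy parameter list standing in for a real model's parameters.
example = [
    ("embed.weight", (1000, 64)),
    ("block.fc.weight", (256, 64)),
    ("block.fc.bias", (256,)),
    ("block.norm.weight", (64,)),
    ("conv.weight", (32, 16, 3, 3)),
]
groups = route_parameters(example)
```

With a real PyTorch model, the input could be built from `model.named_parameters()`, for example `((n, tuple(p.shape)) for n, p in model.named_parameters())`. Matching on parameter names is a crude heuristic; checking the owning module's type would likely be more robust.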

Original article
Reddit
