WeSearch

Trained a 3B Llama with GRPO on a chemistry RL environment. Drug-like molecules in 6 hours of GPU time. [P]

Original article
r/MachineLearning
