Trained a 3B Llama with GRPO on a chemistry RL environment. Drug-like molecules in 6 hours of GPU time. [P]
r/MachineLearning