How the Community Trained Gemma to "Think" with Tunix and TPUs
The Google Tunix Hackathon challenged developers to enhance reasoning capabilities in language models using limited computational resources. Over 11,000 participants submitted innovative solutions, showcasing effective training techniques for reasoning tasks. The winning models demonstrated advanced methods combining supervised learning and reinforcement learning to improve logical reasoning in AI systems.
- The hackathon focused on transforming non-reasoning models into general reasoning models using Kaggle TPUs.
- G-RaR, the first-place model, improved reasoning by training models to show their work using a rubric-based reward system.
- The second- and third-place models used structured reasoning techniques and innovative training pipelines to enhance logical deduction.
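To make the idea of a rubric-based reward concrete, here is a minimal Python sketch. It is not G-RaR's implementation, which this summary does not describe. The `<reasoning>` and `<answer>` tags, the rubric items, their weights, and every function name are assumptions chosen for illustration. Rubric items in practice may be scored by a judge model; simple regex checks stand in for that here.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class RubricItem:
    """One rubric criterion: a name, a weight, and a pass/fail check."""
    name: str
    weight: float
    check: Callable[[str], bool]


# Hypothetical output format: <reasoning>...</reasoning><answer>...</answer>
def has_reasoning_block(text: str) -> bool:
    return re.search(r"<reasoning>.+?</reasoning>", text, re.DOTALL) is not None


def has_answer_block(text: str) -> bool:
    return re.search(r"<answer>.+?</answer>", text, re.DOTALL) is not None


def reasoning_precedes_answer(text: str) -> bool:
    r = text.find("<reasoning>")
    a = text.find("<answer>")
    return r != -1 and a != -1 and r < a


# Illustrative rubric; the items and weights are assumptions.
RUBRIC: List[RubricItem] = [
    RubricItem("shows_reasoning", 0.4, has_reasoning_block),
    RubricItem("gives_answer", 0.4, has_answer_block),
    RubricItem("reasoning_first", 0.2, reasoning_precedes_answer),
]


def rubric_reward(text: str, rubric: List[RubricItem] = RUBRIC) -> float:
    """Return the weighted fraction of rubric items the text satisfies, in [0, 1]."""
    total = sum(item.weight for item in rubric)
    earned = sum(item.weight for item in rubric if item.check(text))
    return earned / total


if __name__ == "__main__":
    sample = "<reasoning>2 + 2 = 4</reasoning><answer>4</answer>"
    print(rubric_reward(sample))
```

A response that shows its reasoning before a tagged answer earns the maximum score, while one that skips the reasoning block earns only partial credit. A scalar of this kind is what a reinforcement learning loop could use as its reward signal.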
Large Language Models (LLMs) often benefit from "thinking" before they speak for complex tasks. Frontier LLMs like Gemini 3 and leading open-weight models like Gemma 4 can produce explicit reasoning traces, commonly called Chain-of-Thought, before answering user questions. But how this reasoning capability is trained is often not disclosed. While there are many reasoning tutorials available on the Internet to train for simple verifiable tasks such as math or coding, accessible and easy-to-reproduce training recipes (including data, training strategy, runnable code and evaluations) for general reasoning remain scarce. This motivated us to hold the "Google Tunix Hack: Train a model to show its work" hackathon on Kaggle: we challenged developers to transform non-reasoning base models…
Excerpt limited to ~120 words for fair-use compliance. The full article is at Googleblog.