Reward hacking is swamping model intelligence gains
On SWE-bench Pro, 63% of successful Opus 4.8 Max resolutions retrieved the fix rather than derived it. Stricter eval harnesses show how benchmark scores can conflate coding ability with answer retrieval.
Naman Jain · Jun 25, 2026 · 7 min read
Table of Contents: Catch a model with a model; Stricter environment design; A growing gap; Designing evals for aware agents

Smarter models are becoming more resourceful at hacking coding benchmarks. Eval suites built from real bugs that were later fixed are especially vulnerable because the problems have already been solved. If the agent has access to repository history or the public web, it can sometimes look up the answer rather than derive it. To measure how widespread this behavior is, we built an agent to audit eval trajectories. On SWE-bench Pro, we found that 63% of successful Opus 4.8 Max resolutions retrieved the fix rather than derived it.
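
The audit described above is performed by an agent whose design this excerpt does not spell out. As a much cruder stand-in, the sketch below shows how a rule-based pass over logged shell commands could flag resolved runs that touched repository history or the network, and how a retrieval share would then be computed. Everything in it is an assumption made for illustration: the `Trajectory` record, the regex patterns, and the function names are hypothetical and are not the article's method.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical patterns for commands that could expose an existing fix:
# reading repository history or fetching content over the network.
RETRIEVAL_PATTERNS = [
    re.compile(r"\bgit\s+(log|show|reflog|fetch|pull)\b"),
    re.compile(r"\b(curl|wget)\b"),
]


@dataclass
class Trajectory:
    """Hypothetical record of one eval run: the task, outcome, and shell commands."""
    task_id: str
    resolved: bool
    commands: List[str]


def flagged_commands(traj: Trajectory) -> List[str]:
    """Return the commands in a trajectory that match any retrieval pattern."""
    return [
        cmd for cmd in traj.commands
        if any(p.search(cmd) for p in RETRIEVAL_PATTERNS)
    ]


def retrieval_share(trajs: List[Trajectory]) -> float:
    """Fraction of resolved trajectories with at least one flagged command."""
    resolved = [t for t in trajs if t.resolved]
    if not resolved:
        return 0.0
    flagged = sum(1 for t in resolved if flagged_commands(t))
    return flagged / len(resolved)


# Made-up trajectories, for illustration only.
trajs = [
    Trajectory("t1", True, ["git log --all --oneline", "git show abc123"]),
    Trajectory("t2", True, ["pytest tests/", "python apply_patch.py"]),
    Trajectory("t3", False, ["curl https://example.com"]),
]
print(retrieval_share(trajs))  # 0.5
```

A pattern match only marks a run as a candidate for review: `git log` has many legitimate uses, and a fix can be retrieved in ways no regex anticipates, which is presumably why the article relies on a model-based auditor rather than fixed rules.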
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at Cursor.