LLMs do fine on ARC-AGI-3 if they are allowed to search over game logs
I was reading the comments on this post, and the prevailing opinion seemed to be that the harness makes little or no difference for ARC-AGI-3. It turns out that it makes a huge difference: Hill-climbing ARC-AGI-3

TLDR: if you save game logs (actions taken, board states, and scores) and let LLMs search over them with tools, LLMs are only moderately less efficient than humans in terms of the number of actions taken to beat ARC-AGI-3 games. Frontier LLMs struggle out of the box on this benchmark. In our preliminary