2 stories tagged with #reward-hacking, in publish-time order across the WeSearch catalog. Tag pages update as new stories are ingested.
ARXIV CS.AI
Hack-Verifiable Environments: Towards Evaluating Reward Hacking at Scale
Aligning autonomous agents with human intent remains a central challenge in modern AI. A key manifestation of this challenge is reward hacking, whereby agents appear successful und…
PRIMEINTELLECT
Systematic Reward Hacking and Prime Sprints
We release tunable RL templates that demonstrate reward hacking at 1B scale and introduce Prime Sprints, an open-access program with sponsored runs for community research.…