I used autoresearch to improve my AGENTS.md, measured against real tasks

Stet · 9 min read
Tags: technology, artificial intelligence, software development
⚡ TL;DR · AI summary

The author had Codex revise its own AGENTS.md over 8 iterations, measuring each version against real PRs from the author's repo. Even the best-scoring version regressed on a clean holdout, which suggests that accepting changes without measuring them can make the agent worse. The process highlighted the importance of treating AGENTS.md as a critical component of a coding system rather than as documentation.

Original article: Stet

Opening excerpt (first ~120 words)

I had Codex iterate on its own AGENTS.md 8 times and measured each version against real PRs. The best one still regressed on a clean holdout.

May 27, 2026

Sections: results, methodology, process, tracing behavior, where I landed, takeaway

I have a confession: I vibe-coded my AGENTS.md, and I'm pretty sure it's slop. I needed to make it better. Naturally, I asked Codex to do it. The difference: this time, Codex used a benchmark on my repo to measure each change, and optimized AGENTS.md against the data, instead of on pure vibes.

Why We Should Take AGENTS.md Seriously

Saying "AGENTS.md is important" is, at this point, a cliche. At risk of beating a dead horse, I'll say it again. Someone adds a rule that sounds smart, senior, and reasonable, commits it, and hopes the agent behaves better.

Excerpt limited to ~120 words for fair-use compliance. The full article is at Stet.
