Show HN: When your agent's LLM judge becomes your enemy

Dmitrii Buchilin · May 27, 2026 · 4:47 PM UTC · 8 min read

⚡ TL;DR · AI summary

A recent study revealed vulnerabilities in LLM agents, particularly through a method called cross-channel authority convergence. The research demonstrated that structured metadata could inadvertently increase the perceived legitimacy of documents, making them more exploitable. This finding has significant implications for the security of retrieval-augmented generation systems.

Key facts

▪ The attack exploited the LLM agent's design by sending a seemingly legitimate email that instructed the ingestion agent to archive a compliance document.
▪ Adding structured provenance metadata to documents increased their perceived authority, leading to higher success rates for malicious actions.
▪ The study found that attackers could achieve high retrieval success rates with minimal knowledge of the system's inner workings.
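
The second key fact can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the study's method or any real system's scoring logic: `Signal`, `naive_authority`, `origin_aware_authority`, the channel names, the origin labels, and the integer weights are all hypothetical. It shows why several channels agreeing can look like independent confirmation even when a single outside party authored every one of them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    """One piece of apparent evidence that an action is legitimate (hypothetical)."""
    channel: str   # where the agent saw it, e.g. "email", "doc_metadata"
    origin: str    # who ultimately authored the content behind the signal
    weight: int    # illustrative authority points, not a measured value


def naive_authority(signals: list[Signal]) -> int:
    """Sum every signal, treating each channel as independent evidence."""
    return sum(s.weight for s in signals)


def origin_aware_authority(signals: list[Signal], trusted_origins: set[str]) -> int:
    """Count only signals from trusted origins, and each origin at most once."""
    best_per_origin: dict[str, int] = {}
    for s in signals:
        if s.origin in trusted_origins:
            best_per_origin[s.origin] = max(best_per_origin.get(s.origin, 0), s.weight)
    return sum(best_per_origin.values())


# Three channels agree, but one external sender authored all of them.
attack = [
    Signal("email", "external_sender", 4),
    Signal("doc_metadata", "external_sender", 5),
    Signal("retrieval", "external_sender", 3),
]

naive_score = naive_authority(attack)
aware_score = origin_aware_authority(attack, trusted_origins={"internal_policy_store"})
print(naive_score, aware_score)
```

In this sketch the naive score grows with every extra channel the attacker can write to, including structured metadata, while the origin-aware score stays at zero because none of the signals trace back to a trusted origin. Whether a real agent can reliably determine origin is a separate question that this example does not address.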

Original article

Hacker News (AI / LLM) · Dmitrii Buchilin

Opening excerpt (first ~120 words)

We hardened an LLM agent. Each defense we added made it more exploitable.

One email. No database access. No intercepted tool calls. Every component operated exactly as designed. The email still went to the attacker.

Dmitrii Buchilin, May 25, 2026

The failure mode wasn’t a prompt injection in the traditional sense: no “ignore previous instructions,” no jailbreak. The attack worked by constructing an environment in which the malicious action appeared institutionally legitimate across multiple independent channels simultaneously.

…

Excerpt limited to ~120 words for fair-use compliance. The full article is at Hacker News (AI / LLM).
