Prompt Injection as Role Confusion

Simon Willison· Jun 22, 2026 · 11:59 PM UTC ·2 min read · 0 reactions · 0 comments · 9 views

via

Simon Willison's Weblog

Prompt Injection as Role Confusion First, I absolutely love this: This is a blog-style writeup of the paper. I wish every paper would come with one of these. Academic writing is pretty dry - the impact of a paper can be so much higher if you publish a readable version to accompany the formal one. Charles Ye, Jasmine Cui, and Dylan Hadfield-Menell present some fascinating research into the challenge of having models distinguish their own privileged text (here wrapped in role tags like , , and ) f

Original article

Simon Willison's Weblog · Simon Willison

Read full at Simon Willison's Weblog →

Opening excerpt (first ~120 words) tap to expand

Prompt Injection as Role Confusion (via) First, I absolutely love this: This is a blog-style writeup of the paper. I wish every paper would come with one of these. Academic writing is pretty dry - the impact of a paper can be so much higher if you publish a readable version to accompany the formal one. Charles Ye, Jasmine Cui, and Dylan Hadfield-Menell present some fascinating research into the challenge of having models distinguish their own privileged text (here wrapped in role tags like <system>, <think>, and <assistant>) from untrusted user input wrapped in <user>. The bad news: they confirm that not only is this not possible, but it looks like models take the style of the text more seriously than the actual text! This leads to some very concerning jailbreaks.

…

Excerpt limited to ~120 words for fair-use compliance. The full article is at Simon Willison's Weblog.

Anonymous · no account needed

Discussion

0 comments

Prompt Injection as Role Confusion

Discussion

More from Simon Willison's Weblog