Quoting Anthropic

Simon Willison · 1 min read
#ai ethics #natural language processing #machine learning #conversation analysis #Claude #Anthropic
⚡ TL;DR · AI summary

An automatic classifier evaluated sycophancy in Claude's conversations by assessing traits like willingness to push back and give honest feedback. Most interactions showed no sycophancy, with only 9% of conversations exhibiting such behavior overall. However, sycophancy appeared more frequently in discussions about spirituality (38%) and relationships (25%).

Original article
Simon Willison's Weblog · Simon Willison
Opening excerpt (first ~120 words)

We used an automatic classifier which judged sycophancy by looking at whether Claude showed a willingness to push back, maintain positions when challenged, give praise proportional to the merit of ideas, and speak frankly regardless of what a person wants to hear. Most of the time in these situations, Claude expressed no sycophancy—only 9% of conversations included sycophantic behavior (Figure 2). But two domains were exceptions: we saw sycophantic behavior in 38% of conversations focused on spirituality, and 25% of conversations on relationships.

Excerpt limited to ~120 words for fair-use compliance. The full article is at Simon Willison's Weblog.
