WeSearch

When Retries Turn Hostile — How Control Logic Kills Production Systems

#reliability #devops #sre #programming #system design
⚡ TL;DR · AI summary

Retries are meant to help production systems survive failures, but when they are not carefully designed they can make an outage worse. The article cites the 2012 Knight Capital incident, which cost $440 million, as an example of automated control logic turning destructive. Patterns such as the dogpile effect (many clients retrying at the same moment), cascading failures, and overly long timeouts can turn recovery into self-inflicted damage. Safe retry strategies, namely exponential backoff, jitter, and retry budgets, are essential to keep the collective behavior of many well-meaning clients from overwhelming a struggling service.
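A minimal sketch of how the three safe-retry techniques can fit together, assuming a single client process. The names `RetryBudget`, `backoff_delay` and `call_with_retries`, the 10% budget ratio, and the choice of `TimeoutError`/`ConnectionError` as the retryable errors are all illustrative assumptions, not the article's code. The jitter shown is the "full jitter" variant (a uniform random delay up to the exponential ceiling), which is one common choice among several.

```python
import random
import time
from typing import Callable, Optional, Tuple, Type, TypeVar

T = TypeVar("T")


class RetryBudget:
    """Hypothetical retry budget (token-bucket style): each first attempt earns
    `ratio` of a token and each retry spends one whole token, so over time
    retries stay near `ratio` of normal traffic (about 10% by default)."""

    def __init__(self, ratio: float = 0.1, max_tokens: float = 10.0) -> None:
        self.ratio = ratio
        self.max_tokens = max_tokens  # cap so quiet periods cannot bank unlimited retries
        self.tokens = max_tokens

    def record_request(self) -> None:
        self.tokens = min(self.max_tokens, self.tokens + self.ratio)

    def try_spend(self) -> bool:
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def backoff_delay(attempt: int, base: float = 0.1, cap: float = 10.0,
                  rng: Optional[random.Random] = None) -> float:
    """Exponential backoff with full jitter: a uniform delay in
    [0, min(cap, base * 2**attempt)] seconds, so clients that failed
    together do not all retry at the same instant."""
    ceiling = min(cap, base * (2 ** attempt))
    return (rng or random).uniform(0.0, ceiling)


def call_with_retries(
    fn: Callable[[], T],
    budget: RetryBudget,
    max_attempts: int = 4,
    retryable: Tuple[Type[BaseException], ...] = (TimeoutError, ConnectionError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    budget.record_request()  # only the original request earns budget, not retries
    for attempt in range(max_attempts):
        try:
            return fn()
        except retryable:
            # Give up on the last attempt, or when the shared budget is empty,
            # instead of piling more load onto a struggling dependency.
            if attempt == max_attempts - 1 or not budget.try_spend():
                raise
            sleep(backoff_delay(attempt))
    raise AssertionError("unreachable")
```

The budget is the piece that addresses collective behavior: backoff and jitter spread one client's retries out in time, while the budget caps how much extra load retries can add in total, so during a long outage most calls fail fast instead of retrying.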

Original article: DEV.to (Top)
Opening excerpt (first ~120 words)

Ken Imoto · Posted on May 1

"Your retries are killing us." A service team received this message from a downstream dependency during an outage. The upstream API was timing out, so naturally, the client retried. 3 times, 5 times, 10 times. The client thought it was doing the right thing.

Excerpt limited to ~120 words for fair-use compliance. The full article is at DEV.to (Top).
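The excerpt's client retried 3, 5, then 10 times. A small arithmetic sketch of why that hurts, assuming every layer in a call chain retries independently and every attempt fails (the worst case during an outage). The function name `worst_case_requests` is hypothetical and the layer counts are illustrative, not figures from the article.

```python
from typing import Sequence


def worst_case_requests(retries_per_layer: Sequence[int]) -> int:
    """Worst-case number of requests reaching the deepest service for a single
    user call. A layer that allows r retries makes up to r + 1 attempts, and
    each attempt triggers the full retry behaviour of the layer below, so the
    per-layer factors multiply."""
    total = 1
    for r in retries_per_layer:
        total *= r + 1
    return total


# One client allowing 10 retries sends up to 11 requests per call.
print(worst_case_requests([10]))
# Three layers that each allow 3 retries: 4 * 4 * 4 requests at the bottom.
print(worst_case_requests([3, 3, 3]))
```

This multiplication is why retrying at only one layer, or sharing a retry budget, matters more than tuning any single client's retry count.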
