That 0.8-Second P99 Latency Cliff in Production Wasn't Supposed to Happen

#webdev #programming #devops #kubernetes
⚡ TL;DR · AI summary

While scaling its matchmaking engine, a team ran into latency problems caused by its configuration layer, Veltrix. Traffic spikes triggered a cache stampede, resulting in excessive gRPC calls to a single Redis instance and significant outages. The team ultimately redesigned its configuration management system as ConfigEdge, which improved performance by eliminating the reliance on gRPC and Redis.
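
To make the cache stampede concrete, here is a minimal Python sketch of one common mitigation, request coalescing: concurrent misses on the same key share a single backend load. This is not the article's ConfigEdge design, which the summary does not describe in detail. The names `CoalescingCache`, `slow_loader`, and the key `"region-map"` are hypothetical and used only for illustration.

```python
import threading
import time


class CoalescingCache:
    """In-memory cache where concurrent misses on one key share a single load."""

    def __init__(self, loader):
        self._loader = loader      # hypothetical backend call (e.g. a config fetch)
        self._values = {}          # key -> cached value
        self._inflight = {}        # key -> Event set when the current load ends
        self._lock = threading.Lock()

    def get(self, key):
        with self._lock:
            if key in self._values:
                return self._values[key]
            event = self._inflight.get(key)
            leader = event is None
            if leader:
                # First caller to miss becomes the leader and performs the load.
                event = threading.Event()
                self._inflight[key] = event

        if leader:
            try:
                value = self._loader(key)
                with self._lock:
                    self._values[key] = value
                return value
            finally:
                # Always release waiters, even if the loader raised.
                with self._lock:
                    del self._inflight[key]
                event.set()

        # Followers wait for the leader, then re-check the cache.
        # If the leader failed, one follower becomes the next leader.
        event.wait()
        return self.get(key)


def demo(n_threads=50):
    """Simulate a traffic spike: many threads miss the same key at once."""
    calls = []
    results = []
    record_lock = threading.Lock()

    def slow_loader(key):
        with record_lock:
            calls.append(key)
        time.sleep(0.05)  # stand-in for a slow backend round trip
        return f"config-for-{key}"

    cache = CoalescingCache(slow_loader)

    def worker():
        value = cache.get("region-map")
        with record_lock:
            results.append(value)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(calls), results


if __name__ == "__main__":
    backend_calls, values = demo()
    print(f"backend calls: {backend_calls}, responses: {len(values)}")
```

In this sketch, all 50 callers receive the same value while the loader runs only once; without coalescing, every concurrent miss would hit the backend separately, which is the stampede pattern the summary describes.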

Original article
DEV.to (Top)
Opening excerpt (first ~120 words)

mary moloyi
Posted on May 27

The Problem We Were Actually Solving

We built the Treasure Hunt Engine to process millions of concurrent matchmaking rounds. Each round required sub-300 ms latency end-to-end: ingest a player request, resolve their region, queue them, and return an assignment.

Excerpt limited to ~120 words for fair-use compliance. The full article is at DEV.to (Top).
