The Prometheus label that blew our monitoring bill out 6x
Buildkite's Prometheus monitoring bill surged from $1,800 to over $11,000 in a single month, with traffic flat, because of one high-cardinality label. The label carried per-build IDs, so every build created new time series; the total ran into the millions, and the metrics backend charges by active series. The team now runs label rules that keep cardinality in check so the same thing does not happen again.
- Buildkite's monitoring bill increased sixfold due to a single Prometheus label.
- The label added unique build IDs, leading to millions of active time series.
- The team adjusted their monitoring practices to prevent high-cardinality labels from inflating costs.
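To make the mechanism in the points above concrete, here is a minimal sketch of why a per-build label inflates the active series count. It is an illustration, not Buildkite's setup: the metric name `build_duration_seconds`, the label names `pipeline`, `status`, and `build_id`, and all the numbers are assumptions. It relies only on the fact that Prometheus treats each distinct combination of metric name and label values as its own time series.

```python
def count_series(samples):
    """Count distinct (metric name, label set) pairs.

    Each distinct pair is a separate time series in Prometheus.
    """
    return len({(name, frozenset(labels.items())) for name, labels in samples})


def simulate(builds, include_build_id):
    """Generate one sample per build with hypothetical labels."""
    samples = []
    for build_id in range(builds):
        labels = {
            # Hypothetical bounded labels: 20 pipelines, 2 statuses.
            "pipeline": f"pipeline-{build_id % 20}",
            "status": "failed" if build_id % 3 == 0 else "passed",
        }
        if include_build_id:
            # Hypothetical unbounded label: a new value for every build.
            labels["build_id"] = str(build_id)
        samples.append(("build_duration_seconds", labels))
    return samples


bounded = count_series(simulate(10_000, include_build_id=False))
unbounded = count_series(simulate(10_000, include_build_id=True))

print(f"without build_id: {bounded} series")
print(f"with build_id:    {unbounded} series")
```

Without the per-build label the series count is capped by the product of the bounded label values (20 pipelines x 2 statuses here). With it, the count grows with the number of builds, which is what a backend that charges by active series ends up billing for.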
Opening excerpt (first ~120 words)
claire nguyen
Posted on May 29
#devops #infrastructure #sre

TL;DR: Our metrics bill went 6x in a single month. Traffic was flat. One Prometheus label carrying per-build IDs spawned millions of time series, and the backend charges by active series. Here's how we caught it and the label rules we run now so it doesn't happen again.

The bill, not the traffic

I'm on the infra team at Buildkite.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at DEV.to (Top).