60 stories tagged with #observability, in publish-time order across the WeSearch catalog. Tag pages update as new stories are ingested.
Letter: Starboard has taken a significant stake in observability company Dynatrace and is pushing for changes to boost the stock; DT jumps 7%+ after hours (Lauren Thomas/Wall Street Journal)
AgentSight: System-wide AI agent tracing and monitoring with eBPF
Zero-instrumentation system-level AI agent tracing in eBPF - eunomia-bpf/agentsight…
Rebuilding Postgres Metrics on Self-Hosted VictoriaMetrics with Zero Downtime
How we rebuilt Xata's PostgreSQL branch metrics on a self-hosted VictoriaMetrics stack in six weeks, with zero user-visible downtime.…
Building a High-Performance Real-Time Data Pipeline with Edge Inference and Observability
Observability in AI: Why Monitoring Systems Is No Longer Enough
Observability has always been one of the most important parts of building reliable software. In...…
The Four Signals of AI Observability
Treat your AI feature like software you can watch, not a model you hope works.…
Show HN: 500 years of Joseon court omens as an observability dashboard
Observability Telemetry and Predictive AIOps
The Non-Negotiable Imperative: Architecting Predictive AIOps for IBM ACE/MQ The era of...…
The missing layer between W&B and Datadog: observability for AI robots
A backend service falls over at 2am and you know the drill: open the dashboard, follow the trace,...…
From Kernel Scheduler to Python Source Line: Tracing a GPU Stall End to End
TL;DR A GPU that reports 97% utilization can still be the slowest part of a training step,...…
Sidemark: Active Telemetry Comments for C#
OpenTelemetry has quietly become table stakes. That's a good thing, but if you've instrumented a real...…
Decoding the Observability Pipeline: A Java Architect's Guide to Metrics, Logs, and Traces
A comprehensive guide to demystifying observability in the Java ecosystem. Learn the 4-phase pipeline and architectural patterns.…
Open House o11y announcements: MCP server, AI Notebooks, and ClickStack Cloud
At Open House 2026, we announced ClickStack Cloud in private preview, AI Notebooks in beta, and the ClickStack MCP server: three updates that make observability on ClickHouse faste…
I Turned on Agent Tracing for 30 Days. 4 Hidden Bottlenecks Were Eating 47% of My Tokens.
A production Claude agent had been quietly burning 5.2M tokens a month. I turned on per-call tracing for 30 days, found four bottlenecks no dashboard surfaced, and cut the bill in …
ClickStack Cloud: Serverless observability powered by ClickHouse
Introducing ClickStack Cloud: a fully managed, serverless observability platform built on ClickHouse where you send OpenTelemetry data to a managed endpoint and immediately explore…
Remetric: find waste in self-hosted Prometheus, Grafana, and Loki
Self-hosted Prometheus stacks degrade in predictable ways: a label explosion that quietly doubles...…
AI Observability: Stop Flying Blind in Production
You shipped your AI feature three months ago. Users love it. Usage is growing. But when someone asks...…
I built a simple pytest plugin for test observability (need your help 😅)
Guys, I need your help 😅 As I’ve noticed, many QA Engineers (also devs) do not measure how stable or...…
Koog 1.0 Is Out: Stable Core, Better Interop, and Multiplatform Observability
Koog 1.0 is out! JetBrains’ AI agent framework for Kotlin and Java now features a stable core with a 1-year API stability guarantee for production backends.…
Chronos vs Toto: Zero-Shot Forecasting Benchmark Results
Introduction Good forecasts help with capacity planning and quieter alerts. But one...…
AI SRE and AI DevOps: different problems, one reliability stack
Vendors and headlines often blur "AI for operations" into one bucket. In practice, two distinct...…
When Does Adaptive Guidance Help? Belief-Aware Privileged Distillation for Autonomous Driving Under Partial Observability
Guided Soft Actor-Critic (GSAC) distills knowledge from a privileged full-state teacher to a partial-observation student for autonomous driving, but uses a fixed distillation coeff…
I Built a Profiler to Audit My Own AI Tool Calls. Here's What I Learned About Observability
I built a profiler to audit my own tool calls. After loading 157 skills in 12 days, I realized I had...…
Topology-aware network observability and correlation
Taming the agentic influx: a blueprint for AI business observability
API analyst Kin Lane tells The New Stack why business observability, FinOps, and MCP governance are essential to managing runaway AI spend in the enterprise.…
Observability was built for humans. AI agents need something different
AI agents reshape observability, demanding data-first infrastructure…
Why is triaging such a hard problem for observability AI vendors?
what’s your stack for web analytics + observability?
Distributed Tracing for LLM Agents: When MCP Makes Tool Calls Observable
How application observability extends to stochastic agent loops — and why the tool boundary...…
Observability in 2026: Distributed Tracing Replaced Logs, and OpenTelemetry Won
Code review is not about catching bugs
My former Parse colleague Charity Majors – now CTO of Honeycomb and one of the strongest voices in the observability space – recently posted something that caught my attention. She…
OpenTelemetry Is Now a CNCF Graduate — and It's Coming for Your AI Stack
OpenTelemetry graduated as a CNCF project on May 21, 2026. That's not just a badge — it's the formal...…
I Built chanprobe Because My Go Queues Were Invisible
A small story about Go channels, hidden backpressure, and why I built an observable bounded queue for production services.…
Agent Observability and what I think
The 114KB Span Attribute That Hid Our LCP Data
A React Native WebView debugging story about LCP, data URLs, and trace attributes We...…
Per-user cost attribution for your AI APP
You ship your AI feature. It works. A week later your OpenAI bill is $400 and you have no idea which...…
End-to-End Observability for vLLM and TGI: from DCGM to Tokens
Running large language model inference servers in production exposes gaps that neither stock...…
l9gpu - open-source GPU observability with workload-level attribution [P]
A 3-step agent cost me $4.20. agenttrace showed me the O(n²) tool call hiding in plain sight.
I expected a 12 cent run. I paid $4.20. The trace showed one tool result getting re-issued on every subsequent step, with the full prior history attached each time. Here is the Rus…
Log Parsing with AI at Bronto
How Bronto combines curated Java parsers, Dissect/Grok matching, and AI-powered pattern generation to automatically parse any log format — structured or otherwise.…
Istio 1.30 Deep Dive — Agentgateway, Ambient Multicluster, TrafficExtension API, and 4 CVE Patches (JWKS RSA Leak, XDS Debug Auth)
On May 18, 2026, the Istio community shipped Istio 1.30.0 alongside backports 1.29.3 and 1.28.7. On the surface it's a regular quarterly release,…
Show HN: Sentience – governance that carries across AI agent sessions
Execution-boundary governance for AI agents — observe execution, surface divergence from intent, and understand where compute went.…
VictoriaTraces: Tracing, Observability, and OpenTelemetry
OpenTelemetry and Observability in theory, deploying VictoriaTraces on Kubernetes, integration with Grafana and VictoriaLogs, and VMAlert for trace metrics…
Building GBIM Observability: Business Metrics, Correlation ID, and k6 Smoke Test
Title: Building GBIM Observability: Business Metrics, Correlation ID, and k6 Smoke Test ...…
Put a Microscope on Hermes: Full Visibility into Agent Execution
Alibaba Cloud's OpenTelemetry-based observability plugin brings full visibility to Hermes AI agent...…
How I discovered a hidden 146W power draw on NVIDIA A100 GPUs (and built an open‑source fix)
How I discovered a hidden 146W power draw on NVIDIA A100 GPUs (and built an open‑source fix) TL;DR:...…
Mythos and observability: what happens after AI finds the vulnerability?
Agent Traces Need to Cross the MCP Boundary | Focused Labs
Agent observability breaks when traces stop at the MCP tool boundary. Pass W3C trace context through MCP to connect planner, tool, and service spans.…
Open Source OTEL Observability Platform
OpenTelemetry observability platform. Contribute to Makisuo/maple development by creating an account on GitHub.…
Launch HN: Superlog (YC P26) – Observability that installs itself and fixes bugs
Anyone else struggling with production error detection despite having tons of observability data?
Energy Grid Observability: What the Power Sector Can Learn from Google SRE
On August 14, 2003, a software bug silenced an alarm. The alarm was part of the state estimation...…
One Kernel, Zero Sidecars: Tracing AI Workloads Without an Agent on Every Host
Per-host overhead multiplied across N hosts, vs. one kernel-level instrumentation per host. The...…
LLM Tracing with MLflow AI Gateway
Use MLflow AI Gateway to automatically trace LLM calls from coding agents like Copilot CLI without changing your code.…
AIOps That Actually Helps: Start with Telemetry, Correlation, and Safe Automation
A practical guide to AIOps built on telemetry, signal correlation, and safe automation instead of hype.…
You're probably underusing middleware for HTTP response handling
Deciding what counts as a successful response in a `handle_response/1` after `Tesla.get`, instead of inside the Tesla middleware chain, makes 5xx and malformed payloads silently lo…
Towards approachable observability with wide events
A tragedy in four acts…
Distributed Tracing in NestJS: End-to-End Request Visibility with OpenTelemetry
In a monolithic application, debugging a slow or failing request is straightforward, you have one...…
After 5 years of Go services, here's the boilerplate I wish existed
An opinionated production-grade Go microservice template — MongoDB v2 driver, OpenTelemetry full-stack, Kafka + SQS with retry/DLQ, compile-time DI, distroless Docker, and CLAUDE.m…