18 stories tagged with #mlops, in publish-time order across the WeSearch catalog. Tag pages update as new stories ingest.
Handling Failure: The Most Important Part of AI Systems
Every AI system will fail. The question isn't whether it will happen. The question is: What...…
Virtual keys per tenant: ditching our custom LLM billing layer
TL;DR: We had 11,247 lines of Python middleware handling per-tenant LLM cost attribution, rate...…
AI Observability: Stop Flying Blind in Production
You shipped your AI feature three months ago. Users love it. Usage is growing. But when someone asks...…
LLM-as-judge variance broke our DPO training signal for 3 weeks
TL;DR: Our DPO pipeline used a single LLM as the preference judge. Training reward climbed every run....…
Capping VLM spend per CV researcher: hierarchical budgets in practice
TL;DR: Our 11-person CV team at Prophesee was burning through €3-4k weeks of VLM spend on dataset...…
Token-level eval harness for tool-calling agents: what we wired up
TL;DR: We replaced our "did the agent finish the task" pass/fail eval with a token-level harness that...…
Prefix caching in vLLM under multi-tenant agent traffic
TL;DR: We turned on vLLM's prefix cache for our agent workloads at Nexus Labs and watched TTFT drop...…
How to Detect GPU Waste in a Kubernetes Cluster
GPU waste in Kubernetes does not announce itself. Your cluster shows healthy utilization. Your...…
Why 91% of AI Agents Fail in Production (And What the 9% Do Differently)
Everyone is building AI agents right now. Autonomous systems that reason, plan, and act without...…
llm-nano-vm v0.8.0 — deterministic FSM runtime for LLM pipelines, now with output validation and per-step timeouts
PyPI: pip install llm-nano-vm GitHub: http://github.com/Ale007XD/nano_vm MCP gateway:...…
I Revived a Broken MLOps Platform — Now It's Self-Service, Policy-Guarded, and Operationally Credible
I abandoned this Kubernetes platform on April 4th. 48 days later I rebuilt it: CrashLoopBackOff everywhere → self-service GitOps, policy enforcement, and deterministic recovery. 21…
Inference Routing Is Becoming an Infrastructure Placement Problem
The request arrives. The model answers. For most teams, everything in between is invisible — a...…
Detecting Silent Model Failure: Drift Monitoring That Actually Works
TL;DR: Most drift monitoring setups alert on the wrong thing. Feature distribution drift is cheap to...…
When AI Meets Reality: Why “Hello World” Isn’t Enough for LLM Systems
Most AI tutorials stop at “Hello World.” You wire up a model, send a prompt, get a response, and feel...…
KubeCon Amsterdam 2026: The Industrialization of ML - A Deep Dive into Uber’s AI Platform Architecture.
This article serves as a technical follow-up to our KubeCon 2026 coverage, providing a comprehensive...…
The Agent Is 20% of the Work. The Platform Is the Other 80%.
A payroll agent hit 94% accuracy in testing and dropped to 70% in production. What closed the gap had nothing to do with the model. Here's what that means for every enterprise team…