PAVO-Bench – 50K voice turns and an 85K-param router for ASR→LLM→TTS
A 50K-turn voice pipeline benchmark and an 85K-param meta-controller that cuts P95 latency by 10.3% and energy by 71% versus a fixed cloud configuration. TMLR 2026. - vnmoorthy/pavo-bench
PAVO: Pipeline-Aware Voice Orchestration

Demand-conditioned inference routing for real-time ASR → LLM → TTS voice pipelines.

PAVO treats the voice-assistant pipeline as a jointly optimizable inference graph. An 85,041-parameter meta-controller, trained with multi-objective PPO in 106 seconds, decides per turn whether to route each ASR → LLM → TTS call to a cloud or edge configuration. The empirical contribution is a characterization of inter-stage coupling constraints: quality dependencies where upstream ASR choices bound what downstream LLMs can recover from.

Authors: NarasingaMoorthy VeiluKanthaPerumal (University of Pennsylvania) and Mohammed Imthathullah (Google).
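
The per-turn cloud-or-edge decision described above can be pictured with a small sketch. This is an illustration under stated assumptions, not the PAVO implementation. The feature names (`audio_seconds`, `network_rtt_ms`, `battery_fraction`), the hand-set linear weights standing in for the learned meta-controller, the single decision per turn, and the weighted-sum reward are all hypothetical. The excerpt does not say how the multi-objective PPO reward is actually combined.

```python
from dataclasses import dataclass
from enum import Enum


class Target(Enum):
    EDGE = "edge"
    CLOUD = "cloud"


@dataclass(frozen=True)
class TurnFeatures:
    """Hypothetical per-turn demand signals (not taken from the PAVO repo)."""
    audio_seconds: float     # length of the user's utterance
    network_rtt_ms: float    # current round-trip time to the cloud
    battery_fraction: float  # remaining device battery, 0.0 to 1.0


# Hypothetical hand-set weights standing in for a learned policy.
W_AUDIO = 0.4     # longer utterances favour the cloud
W_RTT = -0.01     # a slow network favours the edge
W_BATTERY = -1.0  # a full battery favours the edge
BIAS = 0.5


def cloud_score(f: TurnFeatures) -> float:
    """Linear score; positive values favour the cloud configuration."""
    return (
        W_AUDIO * f.audio_seconds
        + W_RTT * f.network_rtt_ms
        + W_BATTERY * f.battery_fraction
        + BIAS
    )


def route_turn(f: TurnFeatures) -> Target:
    """Pick one configuration for the whole ASR -> LLM -> TTS turn (an assumption)."""
    return Target.CLOUD if cloud_score(f) > 0 else Target.EDGE


def scalarized_reward(latency_ms: float, energy_j: float, quality: float,
                      w_latency: float = 0.001, w_energy: float = 0.01,
                      w_quality: float = 1.0) -> float:
    """Assumed weighted-sum reward: reward quality, penalize latency and energy."""
    return w_quality * quality - w_latency * latency_ms - w_energy * energy_j


if __name__ == "__main__":
    long_turn = TurnFeatures(audio_seconds=5.0, network_rtt_ms=40.0, battery_fraction=0.2)
    slow_link = TurnFeatures(audio_seconds=1.0, network_rtt_ms=300.0, battery_fraction=0.9)
    print(route_turn(long_turn).value)
    print(route_turn(slow_link).value)
```

In this sketch, a long utterance on a fast link with low battery goes to the cloud, while a short utterance on a slow link with a full battery stays on the edge. A trained policy would replace the fixed weights, and a reward like `scalarized_reward` is one common way to fold latency, energy, and quality into a single training signal.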
…