2 results for "low latency inference"
r/LocalLLaMA
We benchmarked gpt-oss-120b across 6 inference providers and found a 10x throughput spread
We ran a benchmark across 10+ LLM routers, providers, and inference backends to answer the questions that come up every time someone picks a provider. Key findings: Do LLM routers add latency? No, Ope…
arXiv cs.AI
RedParrot: Accelerating NL-to-DSL for Business Analytics via Query Semantic Caching
At Xiaohongshu, the rapid expansion of e-commerce and advertising has created demand for real-time business analytics with high accuracy and low latency. To meet this demand, systems typically rely on conv…