We benchmarked gpt-oss-120b across 6 inference providers and found a 10x throughput spread
We ran a benchmark across 10+ LLM routers, providers, and inference backends to answer the questions that come up every time someone picks a provider. Key findings:

Do LLM routers add latency? No. OpenRouter was actually 70 ms faster than OpenAI direct on time to first token (0.640 s vs 0.712 s), and Opper matched OpenAI direct within confidence intervals.

How much does AWS region selection matter? Geography dominates model choice: Tokyo was about 2x slower than Ireland (3.08 s vs 1.61 s), a bigger impact
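For readers who want to reproduce a time-to-first-token comparison, the core measurement is simple: start a timer, request a streaming completion, and record how long it takes for the first token to arrive. The sketch below simulates the provider with a local generator so it runs standalone; the original harness is not shown here, and a real benchmark would instead stream from each provider's endpoint (`measure_ttft` and `fake_provider` are hypothetical names for illustration).

```python
import time
from typing import Callable, Iterator

def measure_ttft(stream: Callable[[], Iterator[str]]) -> float:
    """Return seconds elapsed until the stream yields its first token."""
    start = time.perf_counter()
    for _token in stream():
        # Stop timing as soon as the first token arrives.
        return time.perf_counter() - start
    raise RuntimeError("stream produced no tokens")

def fake_provider(first_token_delay: float) -> Callable[[], Iterator[str]]:
    """Simulated provider: sleep to mimic queueing/prefill, then stream tokens."""
    def stream() -> Iterator[str]:
        time.sleep(first_token_delay)
        yield from ["Hello", ",", " world"]
    return stream

if __name__ == "__main__":
    ttft = measure_ttft(fake_provider(0.05))
    print(f"TTFT: {ttft:.3f}s")
```

Averaging many such runs per provider (and reporting confidence intervals, as the benchmark does) is what makes the 0.640 s vs 0.712 s comparison meaningful.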
Original article · LocalLlama