We benchmarked gpt-oss-120b across 6 inference providers and found a 10x throughput spread
We ran a benchmark across 10+ LLM routers, providers, and inference backends to answer the questions that come up every time someone picks a provider. Key findings:

Do LLM routers add latency? No. OpenRouter was actually 70 ms faster than OpenAI direct on time to first token (0.640 s vs 0.712 s), and Opper matched OpenAI direct within confidence intervals.

How much does AWS region selection matter? Geography dominates model choice: Tokyo was about 2x slower than Ireland (3.08 s vs 1.61 s), a bigger impact
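For readers who want to reproduce a time-to-first-token comparison, the core measurement is simple: start a timer, request a streaming completion, and record how long it takes for the first token to arrive. The sketch below simulates the provider with a local generator so it runs standalone; the original harness is not shown here, and a real benchmark would instead stream from each provider's endpoint (`measure_ttft` and `fake_provider` are hypothetical names for illustration).

```python
import time
from typing import Callable, Iterator

def measure_ttft(stream: Callable[[], Iterator[str]]) -> float:
    """Return seconds elapsed until the stream yields its first token."""
    start = time.perf_counter()
    for _token in stream():
        # Stop timing as soon as the first token arrives.
        return time.perf_counter() - start
    raise RuntimeError("stream produced no tokens")

def fake_provider(first_token_delay: float) -> Callable[[], Iterator[str]]:
    """Simulated provider: sleep to mimic queueing/prefill, then stream tokens."""
    def stream() -> Iterator[str]:
        time.sleep(first_token_delay)
        yield from ["Hello", ",", " world"]
    return stream

if __name__ == "__main__":
    ttft = measure_ttft(fake_provider(0.05))
    print(f"TTFT: {ttft:.3f}s")
```

Averaging many such runs per provider (and reporting confidence intervals, as the benchmark does) is what makes the 0.640 s vs 0.712 s comparison meaningful.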
Original article · LocalLlama