Run your own local LLM with rate limits via API keys
A new Ruby prototype lets users run a local LLM proxy that enforces rate limits per API key. The proxy uses a refillable token bucket and depends only on Ruby's standard library, with no gems. Users can test the proxy and manage token limits for individual clients.
- The proxy listens on 0.0.0.0:8899 by default and forwards requests to the upstream LLM API set in BASE_API_URL.
- Each bearer token has its own token bucket; requests without a bearer token are bucketed by remote IP address.
- Each bucket holds at most 10 tokens and refills at a rate of 2 tokens every 5 minutes.
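The bucket rules in the list above can be sketched in Ruby as follows. This is a minimal illustration, not code from llm_proxy.rb: the class names `TokenBucket` and `BucketRegistry` are hypothetical, and it assumes each proxied request costs one bucket token (the summary does not say whether tokens count requests or LLM tokens).

```ruby
# Hypothetical sketch, not code from llm_proxy.rb: a refillable token bucket
# per client key. Assumes each proxied request costs one bucket token.
class TokenBucket
  CAPACITY = 10            # maximum tokens a bucket can hold
  REFILL_AMOUNT = 2        # tokens added per interval
  REFILL_INTERVAL = 5 * 60 # interval length in seconds

  attr_reader :tokens

  def initialize(now)
    @tokens = CAPACITY
    @last_refill = now
  end

  # Credit REFILL_AMOUNT for each full interval since the last refill,
  # capped at CAPACITY; leftover partial time carries over.
  def refill(now)
    intervals = ((now - @last_refill) / REFILL_INTERVAL).floor
    return if intervals <= 0

    @tokens = [@tokens + intervals * REFILL_AMOUNT, CAPACITY].min
    @last_refill += intervals * REFILL_INTERVAL
  end

  # Spend `cost` tokens if enough are available; returns true on success.
  def take(now, cost = 1)
    refill(now)
    return false if @tokens < cost

    @tokens -= cost
    true
  end
end

# One bucket per bearer token; requests without one share a bucket per remote IP.
class BucketRegistry
  def initialize
    @buckets = {}
    @lock = Mutex.new # the proxy may serve clients from several threads
  end

  def self.key_for(authorization, remote_ip)
    match = authorization&.match(/\ABearer\s+(\S+)\z/i)
    match ? "token:#{match[1]}" : "ip:#{remote_ip}"
  end

  # Returns true if the request may proceed, false if it should be rejected
  # (a real proxy would typically answer with HTTP 429 here).
  def allow?(authorization, remote_ip, now = Process.clock_gettime(Process::CLOCK_MONOTONIC))
    key = self.class.key_for(authorization, remote_ip)
    @lock.synchronize do
      (@buckets[key] ||= TokenBucket.new(now)).take(now)
    end
  end
end
```

Reading time from a monotonic clock keeps wall-clock adjustments from adding or removing tokens. The `token:` and `ip:` prefixes keep a bearer token that happens to look like an IP address from sharing a bucket with that address.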
Opening excerpt (first ~120 words):
LLM token bucket proxy

Small Ruby prototype for an OpenAI-compatible LLM proxy with a refillable token bucket. It uses only Ruby standard libraries: no gems, no Rack, no WEBrick.

Run

```sh
BASE_API_URL=http://192.168.0.124:8888/v1 \
BASE_API_KEY=1mmer \
BASE_MODEL=gemma4 \
ruby llm_proxy.rb
```

The proxy listens on 0.0.0.0:8899 by default.

For your local LLM at 192.168.0.124:8888, run the saved local setup:

```sh
./run_local_proxy.sh
```

That starts the Ruby proxy at http://127.0.0.1:8899/v1 and forwards to http://192.168.0.124:8888/v1.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is available on GitHub.
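The excerpt says the proxy forwards to BASE_API_URL using only the standard library, without Rack or WEBrick. Below is a minimal sketch of how that could look: the helpers `read_request`, `build_upstream` and `forward` are hypothetical names, not the project's code. The sketch assumes the client's bearer token is swapped for BASE_API_KEY upstream. It also handles POST only and passes the JSON body through unchanged, because the excerpt does not show how BASE_MODEL is applied.

```ruby
require "net/http"
require "stringio"
require "uri"

# Hypothetical sketch, not code from llm_proxy.rb.
# A parsed client request; http_method avoids shadowing Object#method.
Request = Struct.new(:http_method, :path, :headers, :body)

# Read one HTTP/1.1 request from an IO, such as a socket returned by
# TCPServer#accept. Handles Content-Length bodies only (no chunked encoding).
def read_request(io)
  request_line = io.gets or return nil
  http_method, path, _version = request_line.split(" ", 3)
  headers = {}
  while (line = io.gets) && !line.strip.empty?
    name, value = line.split(":", 2)
    headers[name.strip.downcase] = value.to_s.strip
  end
  length = headers.fetch("content-length", "0").to_i
  body = length.positive? ? io.read(length) : ""
  Request.new(http_method, path, headers, body)
end

# Map a proxy path such as /v1/chat/completions onto BASE_API_URL and swap the
# client's bearer token for the upstream key (assumed behavior). POST only;
# the body is forwarded unchanged.
def build_upstream(req, base_url: ENV.fetch("BASE_API_URL"), api_key: ENV.fetch("BASE_API_KEY"))
  suffix = req.path.sub(%r{\A/v1}, "")
  uri = URI(base_url.chomp("/") + suffix)
  upstream = Net::HTTP::Post.new(uri)
  upstream["Content-Type"] = "application/json"
  upstream["Authorization"] = "Bearer #{api_key}"
  upstream.body = req.body
  [uri, upstream]
end

# Send the rewritten request and return the upstream Net::HTTPResponse.
def forward(uri, upstream)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    http.request(upstream)
  end
end
```

In a full server loop, each socket from `TCPServer.new("0.0.0.0", 8899).accept` would go through `read_request`, then a rate-limit check on the `authorization` header and the peer address, and only then through `build_upstream` and `forward`, with the response written back to the socket. Separating parsing from sending lets the request rewriting be checked with a `StringIO` and no running LLM.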