nomyo-router is a transparent proxy for Ollama with model deployment-aware routing. It sits between your frontend application and your Ollama backends and is transparent to both.
How It Works
nomyo-router accepts any Ollama or OpenAI API request on the configured port and checks the available backends for that specific request. When the request is an embedding, chat, or generate call, it is forwarded to the appropriate Ollama server, answered, and sent back through the router.
If another request for the same model configuration arrives, nomyo-router already knows which model runs on which server and routes it to a server where that model is deployed. If the maximum number of concurrent connections for that model is reached, the request is routed to the serving server with the fewest active connections for fastest completion.
This makes backend utilization significantly more efficient than simple weighted round-robin or least-connection approaches.
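The selection logic described above can be sketched as follows. This is an illustrative Python sketch, not the actual nomyo-router implementation; the function and parameter names (pick_endpoint, deployments, active) are hypothetical:

```python
def pick_endpoint(model, deployments, active, max_concurrent):
    """Deployment-aware endpoint selection (illustrative sketch).

    deployments: endpoint -> set of models currently loaded there
    active:      endpoint -> number of in-flight requests
    """
    # Prefer endpoints that already have the model loaded and still
    # have capacity, breaking ties by fewest active connections.
    warm = [e for e, models in deployments.items()
            if model in models and active[e] < max_concurrent]
    if warm:
        return min(warm, key=lambda e: active[e])
    # Every warm endpoint is saturated (or none exists): fall back to
    # the least-connections endpoint overall.
    return min(deployments, key=lambda e: active[e])
```

The key difference from plain least-connections is the first pass: a request only spills over to a cold endpoint once every endpoint that already holds the model is at its concurrency limit, so model reloads are avoided whenever possible.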
Key Features
Model Deployment-Aware Routing
nomyo-router tracks which models are loaded on which endpoints and routes requests to servers that already have the model deployed, avoiding unnecessary reloads.
Semantic LLM Cache
Repeated or semantically similar LLM requests are served from cache with no endpoint round-trip, no token cost, and <10ms response times.
- Exact match mode – identical requests served instantly
- Semantic matching – “What is Python?” matches “What’s Python?”
- Cache keys are scoped to model + system_prompt to prevent cross-tenant leakage
- MOE requests (moe-*) always bypass the cache
# config.yaml -- semantic cache
cache_enabled: true
cache_backend: sqlite
cache_similarity: 0.90
cache_ttl: 3600
cache_history_weight: 0.3
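A minimal sketch of how a semantic cache lookup could work, assuming prompts are embedded into vectors and compared by cosine similarity against the cache_similarity threshold from the config above. The names (cosine, cache_lookup) and the flat-list cache layout are illustrative, not nomyo-router's internals:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

def cache_lookup(query_vec, cache, threshold=0.90):
    """Return the cached response most similar to query_vec, or None.

    cache: list of (embedding_vector, cached_response) pairs.
    Only entries at or above the similarity threshold qualify,
    mirroring cache_similarity: 0.90 in config.yaml.
    """
    best, best_sim = None, threshold
    for vec, resp in cache:
        sim = cosine(query_vec, vec)
        if sim >= best_sim:
            best, best_sim = resp, sim
    return best
```

With a threshold of 0.90, near-identical phrasings ("What is Python?" vs. "What's Python?") embed close enough to hit, while unrelated prompts miss and fall through to the backend.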
Multi-Endpoint Support
Configure any combination of Ollama and OpenAI-compatible endpoints:
endpoints:
- http://ollama0:11434
- http://ollama1:11434
- https://api.openai.com/v1
llama_server_endpoints:
- http://192.168.0.33:8889/v1
API Format Translation
Seamlessly translates between Ollama API and OpenAI API v1 formats. Connect existing applications to any backend without code changes.
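As an illustration of what such a translation involves, the sketch below maps the common fields of an OpenAI /v1/chat/completions request onto Ollama's /api/chat format. It covers only a subset of fields and is not nomyo-router's actual translation layer; the function name is hypothetical:

```python
def openai_to_ollama(payload):
    """Translate an OpenAI-style chat request body into Ollama's
    /api/chat shape (illustrative subset of fields only)."""
    out = {
        "model": payload["model"],
        "messages": payload["messages"],
        # OpenAI streams only when stream=true; Ollama streams by
        # default, so the flag must be carried over explicitly.
        "stream": payload.get("stream", False),
    }
    # Sampling parameters live under "options" in the Ollama API,
    # and max_tokens is called num_predict there.
    options = {}
    if "temperature" in payload:
        options["temperature"] = payload["temperature"]
    if "max_tokens" in payload:
        options["num_predict"] = payload["max_tokens"]
    if options:
        out["options"] = options
    return out
```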
Concurrency Control
Set max_concurrent_connections per endpoint-model pair to match your OLLAMA_NUM_PARALLEL settings and prevent overload.
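One common way to enforce such a cap in an async proxy is a semaphore per (endpoint, model) pair. The sketch below shows that pattern under the assumption of a single fixed limit; the class name ConnectionGate is hypothetical and this is not nomyo-router's actual code:

```python
import asyncio
from collections import defaultdict

class ConnectionGate:
    """Illustrative per-(endpoint, model) concurrency cap.

    Each pair gets its own semaphore sized to
    max_concurrent_connections, so one busy model cannot starve
    others on the same endpoint.
    """
    def __init__(self, limit):
        self.limit = limit
        self._sems = defaultdict(lambda: asyncio.Semaphore(limit))

    def slot(self, endpoint, model):
        """Return the semaphore guarding this endpoint-model pair;
        use it as `async with gate.slot(ep, m): ...` around a request."""
        return self._sems[(endpoint, model)]
```

Matching the limit to OLLAMA_NUM_PARALLEL means the router never queues more requests onto a backend than the backend itself is configured to run in parallel.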
Optional Authentication
Lock down the router with an API key. Clients authenticate via Authorization: Bearer header or query parameter.
Configuration
# config.yaml
endpoints:
- http://ollama0:11434
- http://ollama1:11434
- http://ollama2:11434
- https://api.openai.com/v1
llama_server_endpoints:
- http://192.168.0.33:8889/v1
max_concurrent_connections: 2
nomyo-router-api-key: ""
api_keys:
"http://192.168.0.50:11434": "ollama"
"http://192.168.0.51:11434": "ollama"
"http://192.168.0.52:11434": "ollama"
"https://api.openai.com/v1": "${OPENAI_KEY}"
"http://192.168.0.33:8889/v1": "llama"
Installation
Docker (Recommended)
# Lean image (~300 MB)
docker pull bitfreedom.net/nomyo-ai/nomyo-router:latest
# Semantic cache image (~800 MB)
docker pull bitfreedom.net/nomyo-ai/nomyo-router:latest-semantic
Run with mounted config:
docker run -d \
--name nomyo-router \
-p 12434:12434 \
-v /absolute/path/to/config:/app/config/ \
-e NOMYO_ROUTER_CONFIG_PATH=/app/config/config.yaml \
bitfreedom.net/nomyo-ai/nomyo-router:latest
From Source
python3 -m venv .venv/router
source .venv/router/bin/activate
pip3 install -r requirements.txt
# Optional: set environment variables
export OPENAI_KEY=YOUR_SECRET_API_KEY
export NOMYO_ROUTER_API_KEY=YOUR_ROUTER_KEY
uvicorn router:app --host 127.0.0.1 --port 12434
For high-concurrency scenarios (>500 simultaneous requests):
uvicorn router:app --host 127.0.0.1 --port 12434 --loop uvloop
Cache Management
# View cache statistics (hit rate, counters, config)
curl http://localhost:12434/api/cache/stats
# Invalidate all cache entries
curl -X POST http://localhost:12434/api/cache/invalidate
Cached Routes
- /api/chat
- /api/generate
- /v1/chat/completions
- /v1/completions
Authenticate Requests
If nomyo-router-api-key is configured, include the key with every request:
# Header (recommended)
curl -H "Authorization: Bearer $NOMYO_ROUTER_API_KEY" http://localhost:12434/api/tags
# Query param (fallback)
curl "http://localhost:12434/api/tags?api_key=$NOMYO_ROUTER_API_KEY"
Source Code
Available at bitfreedom.net/nomyo-ai/nomyo-router
Contact Us
Email: ichi@nomyo.ai
Phone: +1 (415) 289-9022
Address: 2810 N Church St, PMB 947236, Wilmington, DE 19802-4447, USA