Hybrid local/cloud model routing.
Instead of sending every request to a paid API, OCP now scores each request for complexity and dispatches it to the right model tier automatically. Simple tasks stay local. Complex reasoning escalates to your paid provider.
New package: ocp-router
Install the core package plus whichever paid backend you use. ocp-server and ocp-client are unchanged at v0.1.1 — no breaking changes to existing deployments.
```shell
pip install ocp-router                # core — Ollama local backend included
pip install 'ocp-router[anthropic]'   # + Anthropic Claude paid backend
pip install 'ocp-router[openai]'      # + OpenAI paid backend
```

(The quotes keep shells like zsh from expanding the square brackets.)
Classify → dispatch → trace.
Three components work together: TaskClassifier scores complexity, OCPRouter dispatches to the right backend, and every call returns a full RouteResult trace.
1. TaskClassifier — instant complexity scoring
Every request is scored 0.0 → 1.0 using deterministic heuristics — no model required, runs in microseconds.
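A minimal sketch of this style of keyword heuristic — illustrative only, not the shipped TaskClassifier. The weights come from the scoring table in this section; the `0.3` base score and the naive substring matching are assumptions for demonstration:

```python
# Keyword weights as listed in this section; positive weights push a request
# toward the paid tier, negative weights keep it local.
KEYWORD_WEIGHTS = {
    "security": 0.55, "architect": 0.55, "migrate": 0.55,
    "deadlock": 0.40, "multi-file": 0.35, "refactor": 0.25,
    "explain": -0.10, "summarise": -0.10, "search": -0.10,
}

def score(request: str, base: float = 0.3) -> float:
    """Deterministic complexity score clamped to [0.0, 1.0].

    The base value and plain substring matching are assumptions,
    not the actual TaskClassifier internals.
    """
    text = request.lower()
    s = base + sum(w for kw, w in KEYWORD_WEIGHTS.items() if kw in text)
    return max(0.0, min(1.0, s))
```

No model call, no I/O — just dictionary lookups and arithmetic, which is why this kind of scoring runs in microseconds.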
Escalates to paid ↑
security +0.55 · architect +0.55 · migrate +0.55 · deadlock +0.40 · multi-file +0.35 · refactor +0.25

Keeps local ↓
explain −0.10 · summarise −0.10 · search −0.10

2. OCPRouter — one call, full trace
make_router() reads all config from env vars. Every call returns the answer and a full trace of the routing decision.
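A minimal sketch of the dispatch logic, assuming a request escalates when its score meets `OCP_ROUTE_THRESHOLD`. The `RouteResult` field names and the `route()` signature here are illustrative, not the library's actual API:

```python
import os
from dataclasses import dataclass
from typing import Callable

@dataclass
class RouteResult:
    """Illustrative trace of one routing decision (field names assumed)."""
    answer: str
    backend: str      # "local" or "paid"
    score: float
    escalated: bool

def route(request: str, score_fn: Callable[[str], float],
          local: Callable[[str], str], paid: Callable[[str], str]) -> RouteResult:
    # Threshold comes from the environment, matching the config table below.
    threshold = float(os.environ.get("OCP_ROUTE_THRESHOLD", "0.5"))
    s = score_fn(request)
    if s >= threshold:
        return RouteResult(paid(request), "paid", s, True)
    return RouteResult(local(request), "local", s, False)
```

Returning the score and the chosen backend alongside the answer is what makes every call auditable after the fact.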
3. OllamaBackend — verified locally
Integration-tested against llama3.2 running via ollama serve. Ollama uses Metal GPU on Apple Silicon and CUDA on NVIDIA — a dedicated GPU host is recommended for production.
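For reference, a minimal client for Ollama's public `/api/generate` REST endpoint — this is the upstream Ollama API, not ocp-router's `OllamaBackend` implementation; stdlib only:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # matches the OCP_OLLAMA_URL default

def build_generate_request(prompt: str, model: str = "llama3.2") -> urllib.request.Request:
    # Ollama's /api/generate; "stream": False returns a single JSON object
    # instead of a stream of chunks.
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    return urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )

def generate(prompt: str, model: str = "llama3.2") -> str:
    """Blocking call; requires `ollama serve` running locally."""
    with urllib.request.urlopen(build_generate_request(prompt, model)) as resp:
        return json.loads(resp.read())["response"]
```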
4. Vendor-neutral backend protocol
Both local and paid slots accept any object implementing the ModelBackend protocol — three methods, no base class required. AnthropicBackend and OpenAIBackend are convenience adapters.
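A sketch of how such a structural protocol works in Python — the three method names below are assumptions for illustration, not the actual `ModelBackend` signature:

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class ModelBackend(Protocol):
    """Structural protocol: any object with these methods qualifies.
    Method names here are illustrative, not the real interface."""
    def generate(self, prompt: str) -> str: ...
    def name(self) -> str: ...
    def health_check(self) -> bool: ...

class EchoBackend:
    """No base class, no registration — matching the shape is enough."""
    def generate(self, prompt: str) -> str:
        return prompt
    def name(self) -> str:
        return "echo"
    def health_check(self) -> bool:
        return True
```

Because the check is structural, you can wrap any vendor SDK — or a test stub like `EchoBackend` — without importing anything from ocp-router into your backend code.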
Three env vars. Nothing else changes.
Add three lines to your .mcp.json env block. Works with Claude Code, Cursor, Windsurf, and any MCP-compatible IDE.
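A hedged example of what that env block might look like — the server name, command, and the particular three variables shown are illustrative, not prescribed; adapt them to your deployment:

```json
{
  "mcpServers": {
    "ocp": {
      "command": "ocp-server",
      "env": {
        "OCP_PAID_BACKEND": "anthropic",
        "OCP_PAID_MODEL": "claude-sonnet-4-6",
        "OCP_ROUTE_THRESHOLD": "0.5"
      }
    }
  }
}
```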
All config via env vars.
Every option has a sensible default. Override only what you need.
| Variable | Default | Description |
|---|---|---|
| `OCP_LOCAL_MODEL` | `llama3.2` | Ollama model used for local inference |
| `OCP_OLLAMA_URL` | `http://localhost:11434` | Ollama base URL |
| `OCP_PAID_BACKEND` | `anthropic` | `anthropic` or `openai` |
| `OCP_PAID_MODEL` | `claude-sonnet-4-6` | Paid model identifier |
| `OCP_ROUTE_THRESHOLD` | `0.5` | Complexity score at or above which a request escalates to the paid backend. Set `1.0` to keep everything local, `0.0` to escalate everything to paid. |
Where routing fits in OCP.
Routing builds on OCP's existing context primitives. Before a complex request reaches your paid provider, OCP assembles an enriched prompt from three sources — so the model gets targeted, context-rich input, not just the raw request.
Simple tasks routed to the local model skip this assembly — they receive only the raw request, which is why they return in under 100ms.
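The branch described above can be sketched as follows — a hedged illustration only, with the context sources left as opaque callables since their identities aren't spelled out here:

```python
from typing import Callable, Sequence

def build_prompt(request: str, complexity: float, threshold: float = 0.5,
                 context_sources: Sequence[Callable[[], str]] = ()) -> str:
    # Simple tasks skip assembly entirely: the local model gets the raw
    # request, which is what keeps the fast path fast.
    if complexity < threshold:
        return request
    # Complex tasks get an enriched prompt assembled from context sources
    # before reaching the paid provider.
    sections = [fetch() for fetch in context_sources]
    return "\n\n".join([*sections, request])
```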