v0.2.0 · ocp-router · New package

Hybrid local/cloud model routing.

Instead of sending every request to a paid API, OCP now scores each request for complexity and dispatches it to the right model tier automatically. Simple tasks stay local. Complex reasoning escalates to your paid provider.

Zero IDE changes · Deterministic scoring · Vendor-neutral backends · 54 tests passing
Install

New package: ocp-router

Install the core package plus whichever paid backend you use. ocp-server and ocp-client are unchanged at v0.1.1 — no breaking changes to existing deployments.

pip install ocp-router                  Core — Ollama local backend included
pip install "ocp-router[anthropic]"     + Anthropic Claude paid backend
pip install "ocp-router[openai]"        + OpenAI paid backend
# Upgrade from earlier version
pip install --upgrade ocp-router

# ocp-server and ocp-client are unchanged
# No breaking changes to existing deployments
How it works

Classify → dispatch → trace.

Three components work together: TaskClassifier scores complexity, OCPRouter dispatches to the right backend, and every call returns a full RouteResult trace.

1. TaskClassifier — instant complexity scoring

Every request is scored 0.0 → 1.0 using deterministic heuristics — no model required, runs in microseconds.

Request                              Score   Destination
"explain this function"              0.00    Local
"find all usages of db.connect"      0.00    Local
"summarise the last session"         0.00    Local
"refactor the login function"        0.25    Local
── threshold 0.5 (configurable) ──────────────────
"review security vulnerabilities"    0.55    Paid
"design the payment architecture"    0.55    Paid
"debug this production deadlock"     0.60    Paid
"refactor auth across all files"     0.80    Paid

Escalates to paid ↑

security     +0.55
architect    +0.55
migrate      +0.55
deadlock     +0.40
multi-file   +0.35
refactor     +0.25
Token length, code block size, file path count   +varies

Keeps local ↓

explain                   −0.10
summarise                 −0.10
search                    −0.10
Short token count         −varies
No code blocks            −varies
No file path references   −varies
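
For intuition, the additive scoring can be sketched in a few lines of Python. This is an illustration reconstructed from the signal tables above, not the package's internals; the real classifier also weighs token length, code block size, and file path count (the "varies" rows), which the sketch omits, so its scores will not always match the table.

# Illustrative sketch only, reconstructed from the signal tables above.
ESCALATE   = {"security": 0.55, "architect": 0.55, "migrate": 0.55,
              "deadlock": 0.40, "multi-file": 0.35, "refactor": 0.25}
KEEP_LOCAL = {"explain": -0.10, "summarise": -0.10, "search": -0.10}

def score(request: str) -> float:
    # Sum every keyword weight that fires, then clamp to 0.0 → 1.0.
    text = request.lower()
    total = sum(w for kw, w in {**ESCALATE, **KEEP_LOCAL}.items() if kw in text)
    return max(0.0, min(1.0, total))

score("refactor the login function")       # 0.25, stays local
score("review security vulnerabilities")   # 0.55, escalates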

2. OCPRouter — one call, full trace

make_router() reads all config from env vars. Every call returns the answer and a full trace of the routing decision.

from ocp_router import make_router

router = await make_router() # reads env vars

result = await router.route("explain the auth middleware")
result.route_to                     "local"                    which tier
result.classify.complexity_score    0.00                       0.0 → 1.0
result.classify.signals             []                         heuristics fired
result.model                        "llama3.2"                 model used
result.prompt_tokens                34                         tokens in
result.completion_tokens            41                         tokens out
result.duration_ms                  82.0                       measured
result.text                         "The auth middleware..."   the answer
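
The same trace makes an escalation auditable. A sketch using the classifier table above; the exact contents of the signals list are an assumption:

result = await router.route("refactor auth across all files")

result.route_to                     # "paid": score 0.80 clears the 0.5 threshold
result.classify.complexity_score    # 0.80
result.classify.signals             # e.g. ["refactor", "multi-file"] (illustrative)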

3. OllamaBackend — verified locally

Integration-tested against llama3.2 running via ollama serve. Ollama uses Metal GPU acceleration on Apple Silicon and CUDA on NVIDIA hardware — a dedicated GPU host is recommended for production.

# Integration test result
test_ollama_live_generate PASSED (10.59s including model cold start)
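
The backend also works on its own, outside the router. A minimal sketch: the constructor arguments are assumed to mirror the OCP_LOCAL_MODEL and OCP_OLLAMA_URL env vars documented below, while is_available() and model come from the ModelBackend protocol in the next section.

from ocp_router import OllamaBackend

# Constructor arguments assumed to mirror OCP_LOCAL_MODEL / OCP_OLLAMA_URL.
backend = OllamaBackend(model="llama3.2", base_url="http://localhost:11434")

if await backend.is_available():      # ModelBackend protocol method
    print(f"{backend.model} is up")   # "llama3.2 is up"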

4. Vendor-neutral backend protocol

Both local and paid slots accept any object implementing the ModelBackend protocol — three methods, no base class required. AnthropicBackend and OpenAIBackend are convenience adapters.

# Names as used in this section; the import path is assumed.
from ocp_router import GenerateRequest, GenerateResponse, OCPRouter, TaskClassifier

class MyBackend:
    @property
    def model(self) -> str: ...

    async def is_available(self) -> bool: ...

    async def generate(self, request: GenerateRequest) -> GenerateResponse: ...

# Wire it in:
router = OCPRouter(
    local=MyBackend(),
    paid=MyBackend(),
    classifier=TaskClassifier(),
)
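
Because the protocol is structural, even a stub for tests qualifies. A sketch that assumes GenerateRequest carries a prompt field and GenerateResponse accepts text; the actual dataclass fields are not shown in this release note.

from ocp_router import GenerateRequest, GenerateResponse

class EchoBackend:
    """Satisfies ModelBackend with no model behind it; handy in tests."""

    @property
    def model(self) -> str:
        return "echo"

    async def is_available(self) -> bool:
        return True

    async def generate(self, request: GenerateRequest) -> GenerateResponse:
        # Field names are assumptions; adjust to the real dataclasses.
        return GenerateResponse(text=request.prompt)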
IDE integration

Three env vars. Nothing else changes.

Add three lines to your .mcp.json env block. Works with Claude Code, Cursor, Windsurf, and any MCP-compatible IDE.

{ "mcpServers": { "ocp": { "command": "uvx", "args": ["ocp-server"], "env": { "OCP_DB_PATH": "${workspaceFolder}/.ocp.db", "OCP_LOCAL_MODEL": "llama3.2", ← new "OCP_OLLAMA_URL": "http://localhost:11434", ← new "OCP_ROUTE_THRESHOLD": "0.5" ← new } } } }
Configuration

All config via env vars.

Every option has a sensible default. Override only what you need.

Variable              Default                  Description
OCP_LOCAL_MODEL       llama3.2                 Ollama model to use for local inference
OCP_OLLAMA_URL        http://localhost:11434   Ollama base URL
OCP_PAID_BACKEND      anthropic                anthropic or openai
OCP_PAID_MODEL        claude-sonnet-4-6        Paid model identifier
OCP_ROUTE_THRESHOLD   0.5                      Complexity score above which requests escalate to paid. Lower it toward 0.0 to escalate more requests; raise it toward 1.0 to keep everything local.
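
Since all configuration is environmental, switching the paid tier is a two-line override before make_router() runs. A sketch; the model identifier shown is illustrative.

import os
from ocp_router import make_router

os.environ["OCP_PAID_BACKEND"] = "openai"     # swap the paid tier
os.environ["OCP_PAID_MODEL"] = "gpt-4o"       # illustrative model id
os.environ["OCP_ROUTE_THRESHOLD"] = "0.7"     # escalate less often

router = await make_router()                  # picks up the overrides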
Full pipeline

Where routing fits in OCP.

Routing builds on OCP's existing context primitives. Before a complex request reaches your paid provider, OCP assembles an enriched prompt from three sources — so the model gets targeted, context-rich input, not just the raw request.

ocp.context.search    Retrieve the relevant chunks from the workspace index for this specific request
ocp.session.restore   Inject prior decisions, summaries, and state from earlier sessions
ocp.model.route       Score complexity → assemble enriched prompt → dispatch to local or paid — new in v0.2.0
  ↳ local model       Raw request → Ollama · fast, private, free
  ↳ paid provider     Enriched prompt → Claude / GPT-4 · context-rich input, better output
ocp.result.save       Persist the full RouteResult trace for the next session
What the paid model actually receives

Context   Retrieved workspace chunks relevant to this request — not the whole codebase, only what matters
Session   Summary of prior decisions, agreed conventions, and state from earlier agent sessions
Request   The original developer request, unchanged

Enriched prompt — targeted, context-rich, session-aware. The paid model reasons over the right information rather than starting from scratch every call.

Simple tasks routed to the local model skip this assembly — they receive only the raw request, which is why they return in under 100ms.
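
The shape of that assembly can be illustrated with a hypothetical template; OCP's actual prompt format and section markers are not documented in this release note.

def build_enriched_prompt(chunks: list[str], session: str, request: str) -> str:
    """Hypothetical template for the three-part prompt described above."""
    return "\n\n".join([
        "## Context\n" + "\n".join(chunks),   # from ocp.context.search
        "## Session\n" + session,             # from ocp.session.restore
        "## Request\n" + request,             # original request, unchanged
    ])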

What's next

ocp.prompt.prepare

Before a complex request reaches your paid provider, a local step compresses and optimises the prompt. Fewer tokens in, better answer out. Coming in a future release.