P09: Model Routing Benchmark

What You Do

Run the same 10 coding tasks three ways: frontier-only, static rule-based routing, and cascading escalation. For each configuration, compare pass rate, total tokens, total cost, and cost per solved task.
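The four comparison metrics can be computed from per-task records. The sketch below is a minimal illustration, not the lab's actual harness: `TaskResult`, its fields, and `summarize` are hypothetical names, and it assumes you have already collected a pass/fail flag, a token count, and a dollar cost for each task.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskResult:
    """One task outcome under one routing configuration (hypothetical record)."""
    task_id: str
    passed: bool
    tokens: int
    cost_usd: float


def summarize(results: list[TaskResult]) -> dict[str, Optional[float]]:
    """Aggregate pass rate, tokens, cost, and cost per solved task."""
    solved = sum(1 for r in results if r.passed)
    total_tokens = sum(r.tokens for r in results)
    total_cost = sum(r.cost_usd for r in results)
    return {
        "pass_rate": solved / len(results) if results else 0.0,
        "tokens": total_tokens,
        "cost_usd": total_cost,
        # Undefined when nothing was solved, so report None instead of dividing by zero.
        "cost_per_solved": total_cost / solved if solved else None,
    }


# Illustrative numbers only, not benchmark results.
example = [
    TaskResult("t1", True, 1000, 0.5),
    TaskResult("t2", False, 2000, 0.5),
    TaskResult("t3", True, 1500, 0.5),
    TaskResult("t4", False, 500, 0.5),
]
summary = summarize(example)
```

Cost per solved task divides total spend, including spend on failed tasks, by the number of tasks that passed. That makes it a fairer comparison than raw cost when a cheaper configuration also solves fewer tasks.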

Harness Mechanism

This project exercises RouterLLM, LLMProfileStore, SwitchLLMTool, SDK metrics, Agent Canvas model-switch events, and Laminar traces.

Keep

A routing benchmark table and escalation policy you can defend with traces and metrics.

The main lesson: use the cheapest model you trust, protect high-risk work with a risk floor, and escalate only when evidence says the current model is stuck.
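One way to write that lesson down as an escalation policy is sketched below. It is an assumption-laden illustration, not the harness's API: the tier names, the risk labels, the `max_failures` threshold, and both functions are hypothetical, and "stuck" is approximated here as a count of consecutive failed attempts.

```python
# Hypothetical model tiers, ordered cheapest first.
TIERS = ["small", "medium", "frontier"]

# Hypothetical risk floor: the lowest tier allowed for each risk level.
RISK_FLOOR = {"low": 0, "medium": 1, "high": 2}


def starting_tier(risk: str) -> int:
    """Start at the cheapest tier the task's risk level permits."""
    return RISK_FLOOR[risk]


def next_tier(current: int, risk_floor: int, failed_attempts: int,
              max_failures: int = 2) -> int:
    """Pick the tier for the next attempt.

    Never drop below the risk floor, and move up one tier only when the
    failure count says the current model is stuck.
    """
    tier = max(current, risk_floor)
    if failed_attempts >= max_failures and tier < len(TIERS) - 1:
        tier += 1
    return tier


# A low-risk task stays on the cheapest model until it fails twice.
tier = starting_tier("low")
tier_after_one_failure = next_tier(tier, RISK_FLOOR["low"], failed_attempts=1)
tier_after_two_failures = next_tier(tier, RISK_FLOOR["low"], failed_attempts=2)
```

Keeping the risk floor separate from the escalation rule means each can be defended on its own: the floor is justified by what the task touches, and each escalation is justified by the failure evidence in the traces.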

Built as a friendly front door for the runnable OpenHands harness lab.