The router is part of the organism-like system
A population of specialists is only useful if the router can select the right capability at the right cost. Routing experiments should evaluate the router and specialists together.
Baselines
Compare at least four conditions:
| Condition | Purpose |
|---|---|
| Single champion | simplest production baseline |
| Static rules | interpretable routing without learned policy |
| Learned router | adaptive selection based on task features |
| Cascade or coalition | multiple stages or multiple specialists |
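As a minimal sketch of how the first two conditions might share one interface so they can be replayed side by side, the Python below wraps each routing function in a small `Strategy` record. The names `Strategy`, `single_champion`, `static_rules`, the task feature keys `kind` and `length`, and the specialist names are all hypothetical and not taken from the source reports.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Strategy:
    """A named routing condition: maps task features to a specialist name."""
    name: str
    route: Callable[[dict], str]


def single_champion(task: dict) -> str:
    # Every task goes to one model, regardless of its features.
    return "champion"


def static_rules(task: dict) -> str:
    # Hand-written, inspectable rules; no learned policy involved.
    # The feature keys and the 2000 cutoff are illustrative assumptions.
    if task.get("kind") == "code":
        return "code_specialist"
    if task.get("length", 0) > 2000:
        return "long_context_specialist"
    return "champion"


STRATEGIES = [
    Strategy("single_champion", single_champion),
    Strategy("static_rules", static_rules),
]
```

A learned router or a cascade could plausibly be added as further `Strategy` entries with the same signature, which keeps the comparison loop indifferent to how each condition decides.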
Metrics
Measure task quality, abstention quality, p50 and p95 latency, cost per request, escalation rate, wrong-route rate, confidence calibration, and incident rate. Do not accept a router that raises accuracy slightly while making failures harder to explain.
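Several of these metrics can be computed from a flat log of per-request records. The following is a sketch only: the `RequestRecord` fields and the `summarize` function are assumed for illustration, the percentile uses the simple nearest-rank definition, and abstention quality, calibration, and incident rate are left out because they need extra labels.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestRecord:
    """One replayed request (hypothetical schema)."""
    correct: bool
    latency_ms: float
    cost: float
    escalated: bool
    wrong_route: bool


def percentile(values, q):
    """Nearest-rank percentile for q in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(records):
    """Aggregate per-request records into routing metrics."""
    n = len(records)
    if n == 0:
        raise ValueError("no records to summarize")
    latencies = [r.latency_ms for r in records]
    return {
        "accuracy": sum(r.correct for r in records) / n,
        "p50_latency_ms": percentile(latencies, 50),
        "p95_latency_ms": percentile(latencies, 95),
        "cost_per_request": sum(r.cost for r in records) / n,
        "escalation_rate": sum(r.escalated for r in records) / n,
        "wrong_route_rate": sum(r.wrong_route for r in records) / n,
    }
```

Reporting the metrics as a dictionary rather than one blended score keeps trade-offs visible, so a small accuracy gain cannot hide a rise in wrong-route rate.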
Experiment pseudocode
FUNCTION compare_routing_strategies(task_stream, strategies, policy)
    results <- []
    FOR strategy IN strategies
        replay <- REPLAY_TASK_STREAM(task_stream, strategy)
        score <- SCORE_REPLAY(replay, policy.metrics)
        results.APPEND({strategy: strategy.name, score: score})
    END FOR
    RETURN RANK_BY_NET_VIABILITY(results)
END FUNCTION
Wrong-route analysis
Wrong-route cases are especially valuable because they show where capability contracts in the ecology are ambiguous. Label each wrong route with exactly one cause: classification error, missing capability, insufficient evidence, overload fallback, or contract mismatch.
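One way to keep these labels consistent is a closed enumeration plus a tally. This is a sketch under assumptions: the `WrongRouteCause` enum mirrors the five causes named above, while `tally_causes` and the `(case_id, cause)` pair format are hypothetical.

```python
from collections import Counter
from enum import Enum


class WrongRouteCause(Enum):
    CLASSIFICATION_ERROR = "classification error"
    MISSING_CAPABILITY = "missing capability"
    INSUFFICIENT_EVIDENCE = "insufficient evidence"
    OVERLOAD_FALLBACK = "overload fallback"
    CONTRACT_MISMATCH = "contract mismatch"


def tally_causes(labeled_cases):
    """Count wrong-route cases per cause, most frequent first.

    labeled_cases is an iterable of (case_id, WrongRouteCause) pairs.
    """
    counts = Counter(cause for _, cause in labeled_cases)
    return counts.most_common()
```

A cause that dominates the tally suggests where to act first: many contract mismatches point at the capability descriptions, while many classification errors point at the router itself.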
Cascade design
A cascade should save cost on easy cases and escalate hard cases. It fails when early stages are overconfident. Always measure the quality of abstention and escalation, not only final answers.
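The escalation logic can be sketched as a loop over stages ordered from cheapest to most capable, where each stage either answers with enough confidence or declines. The `Stage` and `CascadeResult` types, the per-stage `threshold`, and the `(answer, confidence)` return convention are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Stage:
    name: str
    cost: float
    predict: Callable[[str], Tuple[str, float]]  # returns (answer, confidence)
    threshold: float  # minimum confidence required to answer at this stage


@dataclass(frozen=True)
class CascadeResult:
    answer: Optional[str]       # None means every stage declined
    answered_by: Optional[str]
    total_cost: float           # cost of all stages that were invoked
    declined_stages: int        # stages that ran but did not answer


def run_cascade(task: str, stages: Sequence[Stage]) -> CascadeResult:
    total_cost = 0.0
    for index, stage in enumerate(stages):
        total_cost += stage.cost
        answer, confidence = stage.predict(task)
        if confidence >= stage.threshold:
            return CascadeResult(answer, stage.name, total_cost, index)
    # No stage was confident enough: abstain rather than guess.
    return CascadeResult(None, None, total_cost, len(stages))
```

Recording `answered_by` and `declined_stages` per request makes it possible to check whether early stages answered cases they got wrong, which is the overconfidence failure described above.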
Coalition design
A coalition should be used when independent specialists add value. It fails when multiple models repeat the same error, when judge models are weak, or when the latency budget cannot support parallel inference.
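A simple coalition can be sketched as a vote with an agreement score. This example swaps in plain majority voting for a judge model; the function name `coalition_vote` and the `min_agreement` parameter are hypothetical.

```python
from collections import Counter
from typing import Dict, Optional, Tuple


def coalition_vote(
    answers: Dict[str, str], min_agreement: float = 0.5
) -> Tuple[Optional[str], float]:
    """Return (answer, agreement) from specialist answers keyed by name.

    The answer is None when the top answer's share of votes does not
    exceed min_agreement, so ties and split votes abstain.
    """
    if not answers:
        raise ValueError("coalition needs at least one specialist answer")
    top_answer, count = Counter(answers.values()).most_common(1)[0]
    agreement = count / len(answers)
    if agreement <= min_agreement:
        return None, agreement
    return top_answer, agreement
```

High agreement is only informative when the specialists err independently. If they share training data or failure modes, they can agree on the same wrong answer, so it is worth checking agreement against correctness on labeled replays.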
Source reports used for this guide
These reports are preserved verbatim in the site archive. The guide above is an editorial synthesis and may narrow, qualify, or reorganize claims from the source material.