Answer first
Hybrid routing is the pragmatic bridge. Use local specialists first for private, repeated, and latency-critical work. Use larger cloud or remote models when the task needs extra reasoning, broader world knowledge, or multimodal capability and the data boundary allows it.
Routing priorities
| Workload signal | Preferred path | Reason |
|---|---|---|
| Contains client, patient, student, voice, biometric, code, or trade-secret data | Local-first | Data stays close. |
| Repeated task with stable format | Local specialist | Lower latency and zero marginal API cost. |
| Tool loop with many model calls | Local cascade | Network round trips compound quickly. |
| Ambiguous strategic synthesis | Escalate with redaction or summary | Bigger model may repay the cost. |
| Public content drafting | Either local or cloud | Choose based on quality, speed, and price. |
| Offline or air-gapped mode | Local only | Continuity without network dependency. |
Router contract
STRUCT RouteDecision
selected_runtime
selected_model_or_stack
local_reason
redaction_required
evidence_required
fallback_target
route_created_at_utc
END STRUCT
PROCEDURE route_hybrid(task, registry, policy)
IF task.data_class IN policy.local_only_classes
RETURN LOCAL_SPECIALIST_OR_NO_OP(task, registry)
END IF
local <- BEST_LOCAL_CANDIDATE(task, registry)
IF local.exists AND local.fitness_margin >= task.minimum_margin
RETURN RouteDecision("local", local.id, "local fitness margin met", false, false, "none", NOW_UTC())
END IF
IF task.allows_escalation
redacted <- BUILD_MINIMUM_CONTEXT(task)
RETURN RouteDecision("cloud_escalation", policy.fallback_model, "local margin not enough", true, true, "local_summary", NOW_UTC())
END IF
RETURN RouteDecision("no_op", "none", "no acceptable route under policy", false, true, "human_review", NOW_UTC())
END PROCEDUREModelBreeder interpretation
Hybrid routing creates a steady stream of model-breeding opportunities. Every local fallback, slow path, weak answer, and repeated escalation becomes evidence for a new local descendant. Over time, cloud usage becomes more selective because local specialists absorb the repeated work.
Positive design rule
Do not ask whether local models can replace every cloud model. Ask which repeated workflows can become better, faster, cheaper, more private, and more reusable when they become local specialists.
Source reports used for this guide
These reports are preserved verbatim in the site archive. The guide above is an editorial synthesis and may narrow, qualify, or reorganize claims from the source material.