Begin with the failure signature
Do not begin by choosing an operator. Begin by classifying the failure: missing knowledge, weak representation, routing error, calibration error, runtime bottleneck, contract mismatch, policy gap, or temporary noise.
| Observed need | First option | Escalate to | Avoid initially |
|---|---|---|---|
| Stable task niche | Route to an existing specialist | Train an adapter or specialist | Weight merge without compatibility evidence |
| Missing current facts | Retrieval or deterministic data source | Fine-tuning for stable domain patterns | Encoding volatile facts into weights |
| Repeated narrow errors | Prompt/rule/schema fix | Adapter or targeted fine-tune | Full retraining |
| Excess latency | Caching, batching, route change | Quantization or distillation | Pruning without slice evaluation |
| Excess memory | On-demand loading | Quantization, adapter design, distillation | Loading more specialists permanently |
| Correlated model errors | Increase data/evaluator diversity | Independent ensemble | Majority vote among near-identical descendants |
| New device tier | Runtime-compatible quantization | Distilled device specialist | Silent quality downgrade |
| Two compatible complementary parents | Output ensemble baseline | Controlled merge experiment | Production merge by intuition |
| Redundant specialists | Router consolidation | Merge, distill, or retire | Keeping all descendants “just in case” |
| Unstable or sparse evidence | Collect data | Shadow experiment | Structural change |
Route before retraining
Routing preserves parent artifacts and isolates changes. If a suitable specialist already exists, changing the router is normally cheaper and more reversible than changing weights. However, route changes still need replay evaluation, because redistributing traffic changes the input distribution seen by every specialist that gains or loses requests.
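As a rough illustration of replay evaluation, the sketch below replays logged requests through the old and the proposed router and reports how much traffic each specialist would receive under each. The request shape, the router callables, and the specialist names are hypothetical, and a real replay would also score output quality rather than only count traffic.

```python
from collections import Counter
from typing import Callable, Iterable

def replay_route_shift(
    requests: Iterable[dict],
    old_router: Callable[[dict], str],
    new_router: Callable[[dict], str],
) -> dict[str, tuple[int, int]]:
    """Replay logged requests through both routers.

    Returns {specialist: (requests under old router, requests under new router)}.
    """
    requests = list(requests)
    old_counts = Counter(old_router(r) for r in requests)
    new_counts = Counter(new_router(r) for r in requests)
    specialists = sorted(set(old_counts) | set(new_counts))
    # Counter returns 0 for specialists that one of the routers never selects.
    return {name: (old_counts[name], new_counts[name]) for name in specialists}

# Hypothetical logged traffic and routers, for illustration only.
log = [{"topic": "sql"}, {"topic": "sql"}, {"topic": "legal"}, {"topic": "chat"}]

def old_router(request: dict) -> str:
    return "generalist"

def new_router(request: dict) -> str:
    return "sql_specialist" if request["topic"] == "sql" else "generalist"

shift = replay_route_shift(log, old_router, new_router)
for name, (before, after) in shift.items():
    print(f"{name}: {before} -> {after}")
```

Any specialist whose count changes has a new input mix, so its evaluation slices are worth re-running before the route change ships.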
Use adapters for related, bounded skills
Adapters fit when a shared base is acceptable and skills require relatively small parameter changes. They are less suitable when skills need different tokenizers, context behavior, safety regimes, or runtime families.
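One way to make the suitability check concrete is to compare the requirements of the shared base with those of the candidate skill and list every dimension where they differ. This is a minimal sketch: the `Requirements` record and its example values are assumptions chosen to mirror the four dimensions named above, not an established schema.

```python
from dataclasses import dataclass, fields

@dataclass(frozen=True)
class Requirements:
    # Hypothetical descriptors; a real catalog would define its own vocabulary.
    tokenizer: str
    context_behavior: str
    safety_regime: str
    runtime_family: str

def adapter_blockers(base: Requirements, skill: Requirements) -> list[str]:
    """Return the dimensions on which the skill cannot share the base."""
    return [
        f.name
        for f in fields(Requirements)
        if getattr(base, f.name) != getattr(skill, f.name)
    ]

base = Requirements("bpe-32k", "8k-window", "general", "gpu-fp16")
skill = Requirements("bpe-32k", "8k-window", "medical", "gpu-fp16")

blockers = adapter_blockers(base, skill)
print(blockers or "adapter on shared base looks feasible")
```

An empty list only means no declared requirement conflicts; it does not show that the skill fits within a small parameter change.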
Distill when the target contract is narrower
Distillation is strongest when the student serves a well-defined subset of teacher behavior. Evaluate whether teacher errors are copied, whether rare cases survive, and whether the student’s calibration changes.
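Two of these checks can be sketched with plain label comparisons: the share of student errors that reproduce the teacher's wrong answer, and accuracy on examples flagged as rare. The function and field names are illustrative assumptions, and calibration is not covered here because it needs predicted confidences rather than hard labels.

```python
def distillation_report(labels, teacher_preds, student_preds, rare_mask):
    """Summarize copied teacher errors and rare-case survival for a student."""
    n = len(labels)
    student_wrong = [i for i in range(n) if student_preds[i] != labels[i]]
    # A copied error: the student is wrong and gives the teacher's answer.
    copied = [i for i in student_wrong if student_preds[i] == teacher_preds[i]]
    rare = [i for i in range(n) if rare_mask[i]]
    rare_correct = [i for i in rare if student_preds[i] == labels[i]]
    return {
        "student_error_rate": len(student_wrong) / n,
        "copied_error_share": len(copied) / len(student_wrong) if student_wrong else 0.0,
        "rare_case_accuracy": len(rare_correct) / len(rare) if rare else None,
    }

# Hypothetical evaluation data.
labels        = ["a", "b", "a", "c", "b", "c"]
teacher_preds = ["a", "b", "b", "c", "b", "a"]
student_preds = ["a", "b", "b", "c", "a", "c"]
rare_mask     = [False, False, False, True, False, True]

report = distillation_report(labels, teacher_preds, student_preds, rare_mask)
print(report)
```

A high copied-error share suggests the student inherits teacher mistakes, while errors the teacher did not make point to capacity or data gaps in the student.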
Quantize for deployment, not as a free optimization
Quantization creates a new descendant. Measure it on the target hardware, and cover critical slices, long sequences, numerical edge cases, and concurrent load. Preserve the unquantized parent and the exact quantization recipe.
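The sketch below shows one possible shape for a recorded recipe and a per-slice comparison against the parent. The recipe fields, slice names, scores, and tolerance are all hypothetical; it assumes higher scores are better and that both models were scored on the same slices.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class QuantizationRecipe:
    # Hypothetical fields: enough to reproduce the descendant from its parent.
    parent_id: str
    method: str
    bits: int
    calibration_set: str
    target_hardware: str

def slice_regressions(parent_scores, quantized_scores, tolerance):
    """Return slices where the quantized model drops more than `tolerance`."""
    return {
        name: round(parent_scores[name] - quantized_scores[name], 4)
        for name in parent_scores
        if parent_scores[name] - quantized_scores[name] > tolerance
    }

recipe = QuantizationRecipe(
    parent_id="specialist-v3",
    method="post-training-int8",
    bits=8,
    calibration_set="calib-2k",
    target_hardware="edge-tier-a",
)

parent = {"overall": 0.91, "long_sequences": 0.88, "numeric_edge_cases": 0.85}
quantized = {"overall": 0.90, "long_sequences": 0.80, "numeric_edge_cases": 0.84}

regressions = slice_regressions(parent, quantized, tolerance=0.02)
print(recipe.parent_id, regressions)
```

In this made-up example the overall score barely moves while one slice drops sharply, which is the pattern an aggregate-only check would miss.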
Merge only with a baseline that can win
Always compare a weight merge with both parents, a router, and an output ensemble. A merge is valuable only when consolidation benefits exceed interference risk and evaluation cost.
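A minimal sketch of that comparison follows. It refuses to judge a merge unless all four baselines were scored, then checks whether the merge stays within an allowed quality gap of the best one. The `max_quality_loss` parameter is an assumption standing in for how much quality a team would trade for consolidation; the scores are invented.

```python
def compare_merge(scores: dict[str, float], max_quality_loss: float = 0.0):
    """Return (best baseline name, whether the merge is within the allowed gap)."""
    required = ("parent_a", "parent_b", "router", "output_ensemble", "merge")
    missing = [name for name in required if name not in scores]
    if missing:
        # No verdict without the full set of baselines.
        raise ValueError(f"missing scores: {missing}")
    baselines = required[:-1]
    best_baseline = max(baselines, key=lambda name: scores[name])
    gap = scores[best_baseline] - scores["merge"]
    return best_baseline, gap <= max_quality_loss

# Hypothetical evaluation scores (higher is better).
scores = {
    "parent_a": 0.81,
    "parent_b": 0.79,
    "router": 0.84,
    "output_ensemble": 0.86,
    "merge": 0.83,
}

result = compare_merge(scores)
print(result)
```

Here the merge beats both parents but loses to the router and the ensemble, so it would be accepted only if its consolidation savings justified a nonzero `max_quality_loss`.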
Split when one package contains conflicting niches
Evidence for splitting includes bimodal error patterns, incompatible latency tiers, separate data jurisdictions, or improvements in one niche causing regression in another. A split adds orchestration and population cost, so require stable clusters.
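One of these signals, an improvement in one niche coinciding with a regression in another, can be counted across an update history. The sketch below flags a niche pair only when the conflict recurs, as a crude proxy for stability. The niche names, score deltas, and `min_occurrences` threshold are assumptions for illustration.

```python
from collections import Counter
from itertools import combinations

def conflicting_niches(update_deltas, min_occurrences=3):
    """Find niche pairs whose scores repeatedly move in opposite directions.

    `update_deltas` is a list with one {niche: score_delta} dict per update.
    """
    counts = Counter()
    for deltas in update_deltas:
        for a, b in combinations(sorted(deltas), 2):
            # Opposite signs: one niche improved while the other regressed.
            if deltas[a] * deltas[b] < 0:
                counts[(a, b)] += 1
    return [pair for pair, seen in counts.items() if seen >= min_occurrences]

# Hypothetical per-update score changes for one packaged model.
history = [
    {"code": 0.03, "legal": -0.02, "chat": 0.00},
    {"code": 0.02, "legal": -0.01, "chat": 0.01},
    {"code": -0.01, "legal": 0.02, "chat": 0.00},
    {"code": 0.01, "legal": 0.01, "chat": 0.00},
]

conflicts = conflicting_niches(history)
print(conflicts)
```

A single opposing move is treated as noise; only a pair that conflicts in several updates becomes a split candidate.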
Retire when marginal value falls below maintenance cost
Retirement candidates are unused, redundant, unsupported, contaminated, consistently dominated, or no longer valid under policy. Remove routing eligibility first, drain active leases, preserve lineage, and retain required artifacts or tombstones.
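The ordering of those steps can be sketched as a small, repeatable procedure: routing eligibility is removed immediately, retirement completes only once leases reach zero, and a tombstone keeps the lineage. The `Specialist` record, the status strings, and the tombstone layout are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Specialist:
    name: str
    routable: bool = True
    active_leases: int = 0
    lineage: list = field(default_factory=list)
    status: str = "active"

def retire(spec: Specialist, tombstones: dict) -> str:
    """Advance retirement one step; safe to call again while draining."""
    # Step 1: stop new traffic before anything else.
    spec.routable = False
    # Step 2: wait for in-flight work to finish.
    if spec.active_leases > 0:
        spec.status = "draining"
        return spec.status
    # Step 3: record lineage in a tombstone, then mark as retired.
    tombstones[spec.name] = {"lineage": list(spec.lineage)}
    spec.status = "retired"
    return spec.status

tombstones = {}
spec = Specialist("sql-v1", active_leases=2, lineage=["base-7b", "sql-v0"])

first = retire(spec, tombstones)   # leases still active
spec.active_leases = 0             # leases drained elsewhere
second = retire(spec, tombstones)
print(first, second, tombstones)
```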
Choose no-op when evidence is weak
FUNCTION choose_operator(failure, evidence, catalog, policy)
    IF evidence.is_transient OR evidence.sample_size < policy.minimum_sample
        RETURN NO_OP_AND_COLLECT_MORE_EVIDENCE
    END IF
    IF failure.type == "routing" AND catalog.has_eligible_specialist
        RETURN ROUTER_CHANGE
    END IF
    IF failure.type == "volatile_knowledge"
        RETURN RETRIEVAL_OR_DATA_SOURCE_CHANGE
    END IF
    options <- GENERATE_OPERATOR_OPTIONS(failure, catalog)
    eligible <- FILTER_BY_COMPATIBILITY_SAFETY_AND_BUDGET(options)
    RETURN LOWEST_COMPLEXITY_OPTION_WITH_TESTABLE_HYPOTHESIS(eligible)
END FUNCTION
Source reports used for this guide
These reports are preserved verbatim in the site archive. The guide above is an editorial synthesis and may narrow, qualify, or reorganize claims from the source material.