Reference · Intermediate · 3 minute read · Updated 2026-06-26 UTC

Model evolution decision guide

Decide when to route, ensemble, fine-tune, use adapters, distill, quantize, prune, merge, split, retire, or choose no-op.

Research status: Engineering decision framework · Publication state: Published · Reviewed by: Michael Kappel · Source reports: 2

Begin with the failure signature

Do not begin by choosing an operator. Begin by classifying the failure: missing knowledge, weak representation, routing error, calibration error, runtime bottleneck, contract mismatch, policy gap, or temporary noise.

| Observed need | First option | Escalate to | Avoid initially |
| --- | --- | --- | --- |
| Stable task niche | Route to an existing specialist | Train an adapter or specialist | Weight merge without compatibility evidence |
| Missing current facts | Retrieval or deterministic data source | Fine-tuning for stable domain patterns | Encoding volatile facts into weights |
| Repeated narrow errors | Prompt/rule/schema fix | Adapter or targeted fine-tune | Full retraining |
| Excess latency | Caching, batching, route change | Quantization or distillation | Pruning without slice evaluation |
| Excess memory | On-demand loading | Quantization, adapter design, distillation | Loading more specialists permanently |
| Correlated model errors | Increase data/evaluator diversity | Independent ensemble | Majority vote among near-identical descendants |
| New device tier | Runtime-compatible quantization | Distilled device specialist | Silent quality downgrade |
| Two compatible complementary parents | Output ensemble baseline | Controlled merge experiment | Production merge by intuition |
| Redundant specialists | Router consolidation | Merge, distill, or retire | Keeping all descendants “just in case” |
| Unstable or sparse evidence | Collect data | Shadow experiment | Structural change |
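The table can be treated as a lookup keyed by the classified failure. The sketch below is one possible encoding, not a prescribed schema: the `Playbook` type, the dictionary keys, and `next_step` are hypothetical names, and only four rows are shown for brevity. The option strings are copied from the table.

```python
from typing import NamedTuple


class Playbook(NamedTuple):
    first_option: str
    escalate_to: str
    avoid_initially: str


# Keys are hypothetical identifiers; values are copied from the table rows.
PLAYBOOK = {
    "stable_task_niche": Playbook(
        "Route to an existing specialist",
        "Train an adapter or specialist",
        "Weight merge without compatibility evidence",
    ),
    "missing_current_facts": Playbook(
        "Retrieval or deterministic data source",
        "Fine-tuning for stable domain patterns",
        "Encoding volatile facts into weights",
    ),
    "excess_latency": Playbook(
        "Caching, batching, route change",
        "Quantization or distillation",
        "Pruning without slice evaluation",
    ),
    "unstable_or_sparse_evidence": Playbook(
        "Collect data",
        "Shadow experiment",
        "Structural change",
    ),
}


def next_step(need: str, first_option_exhausted: bool = False) -> str:
    """Return the first option, or the escalation once the first option was tried."""
    entry = PLAYBOOK[need]  # a KeyError means the failure is not classified yet
    return entry.escalate_to if first_option_exhausted else entry.first_option
```

Raising `KeyError` for an unknown need is a deliberate choice here: it forces classification before any operator is considered.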

Route before retraining

Routing preserves parent artifacts and isolates changes. If a suitable specialist already exists, changing the router is normally cheaper and more reversible than changing weights. However, route changes still need replay evaluation because traffic redistribution changes every specialist’s observed distribution.
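To make the redistribution effect concrete, the sketch below replays the same logged requests through an old and a new router and reports how each specialist's traffic share moves. It assumes a router is simply a callable that maps a request to a specialist name; `traffic_share` and `share_shift` are hypothetical helpers, not part of any particular serving stack.

```python
from collections import Counter
from typing import Callable, Iterable


def traffic_share(requests: Iterable[dict], router: Callable[[dict], str]) -> dict:
    """Fraction of replayed requests that the router sends to each specialist."""
    counts = Counter(router(request) for request in requests)
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()} if total else {}


def share_shift(requests: Iterable[dict],
                old_router: Callable[[dict], str],
                new_router: Callable[[dict], str]) -> dict:
    """Change in traffic share per specialist when moving from old to new router."""
    replay = list(requests)  # materialize once so both routers see identical traffic
    old = traffic_share(replay, old_router)
    new = traffic_share(replay, new_router)
    names = sorted(set(old) | set(new))
    return {name: new.get(name, 0.0) - old.get(name, 0.0) for name in names}
```

A specialist whose share moves substantially is seeing a different input distribution, so its earlier evaluation results may no longer apply and it is a candidate for re-evaluation on the replayed slice.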

Adapters fit when a shared base is acceptable and skills require relatively small parameter changes. They are less suitable when skills need different tokenizers, context behavior, safety regimes, or runtime families.
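The adapter criteria can be checked mechanically before any training is planned. The following is a minimal sketch, assuming each model or skill is described by a hypothetical `ServingProfile` record with one field per property named above; a non-empty result suggests a separate specialist rather than an adapter on the shared base.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ServingProfile:
    # Hypothetical fields mirroring the four properties listed in the text.
    tokenizer: str
    context_behavior: str
    safety_regime: str
    runtime_family: str


def adapter_blockers(base: ServingProfile, skill: ServingProfile) -> list:
    """Names of the properties where the skill cannot share the base model."""
    return [
        f.name
        for f in fields(ServingProfile)
        if getattr(base, f.name) != getattr(skill, f.name)
    ]
```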

Distill when the target contract is narrower

Distillation is strongest when the student serves a well-defined subset of teacher behavior. Evaluate whether teacher errors are copied, whether rare cases survive, and whether the student’s calibration changes.
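The three checks can be expressed as small metrics over a shared evaluation set. The sketch below is illustrative only: the function names are hypothetical, predictions are assumed to be discrete labels, and calibration is measured with a standard equal-width-bin expected calibration error, which is one of several reasonable choices.

```python
from typing import Sequence


def copied_error_rate(labels: Sequence, teacher: Sequence, student: Sequence) -> float:
    """Share of teacher errors where the student repeats the same wrong answer."""
    teacher_errors = [(t, s) for y, t, s in zip(labels, teacher, student) if t != y]
    if not teacher_errors:
        return 0.0
    return sum(t == s for t, s in teacher_errors) / len(teacher_errors)


def slice_accuracy(labels: Sequence, preds: Sequence, in_slice: Sequence[bool]) -> float:
    """Accuracy restricted to a slice, for example rare cases; NaN if the slice is empty."""
    pairs = [(y, p) for y, p, keep in zip(labels, preds, in_slice) if keep]
    return sum(y == p for y, p in pairs) / len(pairs) if pairs else float("nan")


def expected_calibration_error(confidences: Sequence[float],
                               correct: Sequence[bool],
                               n_bins: int = 10) -> float:
    """Weighted gap between mean confidence and accuracy over equal-width bins."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        index = min(int(conf * n_bins), n_bins - 1)  # confidence 1.0 goes in the last bin
        bins[index].append((conf, ok))
    total = len(confidences)
    ece = 0.0
    for bucket in bins:
        if bucket:
            mean_conf = sum(c for c, _ in bucket) / len(bucket)
            accuracy = sum(ok for _, ok in bucket) / len(bucket)
            ece += len(bucket) / total * abs(mean_conf - accuracy)
    return ece
```

Comparing `slice_accuracy` and `expected_calibration_error` between teacher and student on the same examples shows whether rare cases survived and whether calibration moved.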

Quantize for deployment, not as a free optimization

Quantization creates a new descendant. Measure it on the target hardware, covering critical slices, long sequences, numerical edge cases, and concurrency. Preserve the unquantized parent and the exact quantization recipe.
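One way to keep the recipe exact and the slice comparison explicit is sketched below. The `QuantizationRecipe` fields and the `regressed_slices` helper are assumptions made for illustration; a real recipe would likely record more (tool version, per-layer settings, calibration data hash).

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class QuantizationRecipe:
    # Hypothetical minimal fields; extend with whatever makes the run reproducible.
    parent_id: str
    method: str
    bits: int
    calibration_set: str

    def fingerprint(self) -> str:
        """Stable SHA-256 over the recipe, suitable for lineage records."""
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


def regressed_slices(parent_scores: dict, quantized_scores: dict, tolerance: float) -> list:
    """Slices where the quantized descendant drops by more than `tolerance`.

    A slice that was never measured on the descendant is reported as regressed.
    """
    return sorted(
        name
        for name, parent in parent_scores.items()
        if name not in quantized_scores or parent - quantized_scores[name] > tolerance
    )
```

Treating an unmeasured slice as a regression is a conservative design choice: it prevents a descendant from passing simply because a critical slice was skipped.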

Merge only with a baseline that can win

Always compare a weight merge with both parents, a router, and an output ensemble. A merge is valuable only when consolidation benefits exceed interference risk and evaluation cost.
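The comparison can be enforced as a gate that refuses to evaluate a merge unless all four baselines were scored on the same benchmark. This is a minimal sketch under stated assumptions: the score keys, `merge_verdict`, and the single `max_quality_loss` tolerance are hypothetical, and the tolerance stands in for whatever consolidation benefit a team is willing to trade against quality.

```python
BASELINES = ("parent_a", "parent_b", "router", "output_ensemble")


def merge_verdict(scores: dict, max_quality_loss: float = 0.0) -> tuple:
    """Return (accepted, best_baseline) for a candidate weight merge.

    `scores` maps each baseline name and "merge" to a higher-is-better metric.
    """
    missing = [name for name in (*BASELINES, "merge") if name not in scores]
    if missing:
        raise ValueError(f"missing evaluations: {missing}")
    best = max(BASELINES, key=lambda name: scores[name])
    accepted = scores["merge"] >= scores[best] - max_quality_loss
    return accepted, best
```

With `max_quality_loss=0.0` the merge must match or beat the strongest baseline outright; a positive value encodes how much quality may be given up for consolidation.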

Split when one package contains conflicting niches

Evidence for splitting includes bimodal error patterns, incompatible latency tiers, separate data jurisdictions, or improvements in one niche causing regression in another. A split adds orchestration and population cost, so require stable clusters.

Retire when marginal value falls below maintenance cost

Retirement candidates are unused, redundant, unsupported, contaminated, consistently dominated, or no longer valid under policy. Remove routing eligibility first, drain active leases, preserve lineage, and retain required artifacts or tombstones.
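The retirement sequence can be modeled as an ordered checklist. The text only states that routing eligibility is removed first; running the remaining steps in the listed order is an assumption of this sketch, and `RetirementStep` and `next_retirement_step` are hypothetical names.

```python
from enum import IntEnum
from typing import Iterable, Optional


class RetirementStep(IntEnum):
    REMOVE_ROUTING_ELIGIBILITY = 1
    DRAIN_ACTIVE_LEASES = 2
    PRESERVE_LINEAGE = 3
    RETAIN_ARTIFACTS_OR_TOMBSTONE = 4


def next_retirement_step(completed: Iterable[RetirementStep]) -> Optional[RetirementStep]:
    """Earliest step not yet completed, or None when retirement is finished."""
    done = set(completed)
    for step in RetirementStep:  # iterates in definition order
        if step not in done:
            return step
    return None
```

Because the function always returns the earliest incomplete step, a model whose leases were drained before it left the router is still sent back to remove routing eligibility.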

Choose no-op when evidence is weak

pseudocode
FUNCTION choose_operator(failure, evidence, catalog, policy)
    IF evidence.is_transient OR evidence.sample_size < policy.minimum_sample
        RETURN NO_OP_AND_COLLECT_MORE_EVIDENCE
    END IF

    IF failure.type == "routing" AND catalog.has_eligible_specialist
        RETURN ROUTER_CHANGE
    END IF

    IF failure.type == "volatile_knowledge"
        RETURN RETRIEVAL_OR_DATA_SOURCE_CHANGE
    END IF

    options <- GENERATE_OPERATOR_OPTIONS(failure, catalog)
    eligible <- FILTER_BY_COMPATIBILITY_SAFETY_AND_BUDGET(options, policy)
    IF eligible IS EMPTY
        RETURN NO_OP_AND_COLLECT_MORE_EVIDENCE
    END IF
    RETURN LOWEST_COMPLEXITY_OPTION_WITH_TESTABLE_HYPOTHESIS(eligible)
END FUNCTION
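For readers who want something executable, the pseudocode can be approximated in Python as follows. This is a sketch under several assumptions, not the guide's reference implementation: candidate options are passed in rather than generated from a catalog, the `Evidence` and `Option` records and their fields are hypothetical, and an empty eligible set falls back to no-op.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

NO_OP = "NO_OP_AND_COLLECT_MORE_EVIDENCE"


@dataclass(frozen=True)
class Evidence:
    is_transient: bool
    sample_size: int


@dataclass(frozen=True)
class Option:
    name: str
    complexity: int  # lower means simpler; the scale is hypothetical
    compatible: bool
    safe: bool
    cost: float
    hypothesis: Optional[str] = None  # what the experiment is expected to show


def choose_operator(failure_type: str,
                    evidence: Evidence,
                    has_eligible_specialist: bool,
                    options: Sequence[Option],
                    minimum_sample: int,
                    budget: float) -> str:
    # Weak or transient evidence: do nothing structural and keep collecting.
    if evidence.is_transient or evidence.sample_size < minimum_sample:
        return NO_OP
    if failure_type == "routing" and has_eligible_specialist:
        return "ROUTER_CHANGE"
    if failure_type == "volatile_knowledge":
        return "RETRIEVAL_OR_DATA_SOURCE_CHANGE"
    # Keep only options that are compatible, safe, affordable, and testable.
    eligible = [
        o for o in options
        if o.compatible and o.safe and o.cost <= budget and o.hypothesis
    ]
    if not eligible:
        return NO_OP  # assumption: with nothing eligible, fall back to no-op
    return min(eligible, key=lambda o: o.complexity).name
```

The evidence check comes first on purpose, so that even an apparently obvious routing fix is deferred when the sample is too small to trust.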

Source reports used for this guide

These reports are preserved verbatim in the site archive. The guide above is an editorial synthesis and may narrow, qualify, or reorganize claims from the source material.