Model evolution decision guide

Begin with the failure signature

Do not begin by choosing an operator. Begin by classifying the failure: missing knowledge, weak representation, routing error, calibration error, runtime bottleneck, contract mismatch, policy gap, or temporary noise.

Observed need	First option	Escalate to	Avoid initially
Stable task niche	Route to an existing specialist	Train an adapter or specialist	Weight merge without compatibility evidence
Missing current facts	Retrieval or deterministic data source	Fine-tuning for stable domain patterns	Encoding volatile facts into weights
Repeated narrow errors	Prompt/rule/schema fix	Adapter or targeted fine-tune	Full retraining
Excess latency	Caching, batching, route change	Quantization or distillation	Pruning without slice evaluation
Excess memory	On-demand loading	Quantization, adapter design, distillation	Loading more specialists permanently
Correlated model errors	Increase data/evaluator diversity	Independent ensemble	Majority vote among near-identical descendants
New device tier	Runtime-compatible quantization	Distilled device specialist	Silent quality downgrade
Two compatible complementary parents	Output ensemble baseline	Controlled merge experiment	Production merge by intuition
Redundant specialists	Router consolidation	Merge, distill, or retire	Keeping all descendants “just in case”
Unstable or sparse evidence	Collect data	Shadow experiment	Structural change

Route before retraining

Routing preserves parent artifacts and isolates changes. If a suitable specialist already exists, changing the router is normally cheaper and more reversible than changing weights. However, route changes still need replay evaluation because traffic redistribution changes every specialist’s observed distribution.
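One part of that replay can be sketched as follows: send the same logged requests through the old and the proposed router, then compare each specialist's traffic share. This is a minimal illustration under assumptions, not a full replay harness. The names `traffic_share` and `share_shift`, the request fields, and the specialist names are all hypothetical, and traffic share is only a coarse proxy for a change in observed distribution.

```python
from collections import Counter
from typing import Callable, Dict, Hashable, Sequence


def traffic_share(requests: Sequence[dict],
                  route: Callable[[dict], Hashable]) -> Dict[Hashable, float]:
    """Fraction of replayed requests that a router sends to each specialist."""
    if not requests:
        return {}
    counts = Counter(route(r) for r in requests)
    return {name: n / len(requests) for name, n in counts.items()}


def share_shift(requests, old_route, new_route):
    """Per-specialist change in traffic share (new minus old) on the same log."""
    before = traffic_share(requests, old_route)
    after = traffic_share(requests, new_route)
    names = sorted(set(before) | set(after))
    return {name: after.get(name, 0.0) - before.get(name, 0.0) for name in names}


# Hypothetical replay log and routers, for illustration only.
log = [{"topic": "sql"}, {"topic": "sql"}, {"topic": "legal"}, {"topic": "chat"}]
old_router = lambda r: "legal" if r["topic"] == "legal" else "general"
new_router = lambda r: {"sql": "code", "legal": "legal"}.get(r["topic"], "general")

shift = share_shift(log, old_router, new_router)
```

A specialist whose share moves sharply in either direction is a candidate for re-evaluation on the traffic it would actually receive, even though its weights did not change.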

Use adapters for related, bounded skills

Adapters fit when a shared base is acceptable and skills require relatively small parameter changes. They are less suitable when skills need different tokenizers, context behavior, safety regimes, or runtime families.
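The suitability check above can be expressed as a simple comparison of the properties an adapter must share with its base. The sketch below assumes each model or skill is described by a plain dictionary. The field names and the `adapter_blockers` helper are hypothetical and do not come from any real catalog schema.

```python
# Properties an adapter inherits from its base and cannot change on its own.
ADAPTER_SHARED_FIELDS = ("tokenizer", "context_behavior", "safety_regime", "runtime_family")


def adapter_blockers(base: dict, skill_requirements: dict) -> list:
    """Return the shared properties on which the skill disagrees with the base."""
    return [
        name for name in ADAPTER_SHARED_FIELDS
        if name in skill_requirements and skill_requirements[name] != base.get(name)
    ]


# Hypothetical descriptions, for illustration only.
base = {"tokenizer": "tok-a", "context_behavior": "8k", "safety_regime": "general",
        "runtime_family": "gpu"}
skill = {"tokenizer": "tok-a", "safety_regime": "medical"}

blockers = adapter_blockers(base, skill)
```

An empty result means nothing in this checklist rules out an adapter. A non-empty result points toward a separate specialist instead.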

Distill when the target contract is narrower

Distillation is strongest when the student serves a well-defined subset of teacher behavior. Evaluate whether teacher errors are copied, whether rare cases survive, and whether the student’s calibration changes.
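Two of these checks can be made concrete with small metrics. The sketch below is illustrative only. `copied_error_rate` measures how many teacher mistakes the student reproduces with the same wrong answer, and `expected_calibration_error` is a basic equal-width-bin estimate. Both function names and the input layout (parallel lists of labels, predictions, and confidences) are assumptions.

```python
def copied_error_rate(labels, teacher, student) -> float:
    """Share of teacher mistakes that the student reproduces with the same wrong answer."""
    teacher_errors = [(t, s) for y, t, s in zip(labels, teacher, student) if t != y]
    if not teacher_errors:
        return 0.0
    return sum(1 for t, s in teacher_errors if s == t) / len(teacher_errors)


def expected_calibration_error(confidences, correct, bins: int = 10) -> float:
    """Equal-width-bin ECE: weighted mean gap between confidence and accuracy."""
    buckets = [[] for _ in range(bins)]
    for conf, ok in zip(confidences, correct):
        index = min(int(conf * bins), bins - 1)  # confidence 1.0 falls in the last bin
        buckets[index].append((conf, ok))
    total = len(confidences)
    ece = 0.0
    for bucket in buckets:
        if bucket:
            mean_conf = sum(c for c, _ in bucket) / len(bucket)
            accuracy = sum(1 for _, ok in bucket if ok) / len(bucket)
            ece += (len(bucket) / total) * abs(mean_conf - accuracy)
    return ece
```

Running both functions on the rare-case slice alone, and comparing the student's calibration error with the teacher's on the same examples, covers the remaining two questions.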

Quantize for deployment, not as a free optimization

Quantization creates a new descendant. Measure target hardware, critical slices, long sequences, numerical edge cases, and concurrency. Preserve the unquantized parent and exact quantization recipe.
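A lineage record and a slice check might look like the sketch below. It assumes evaluation scores are already available per slice for both parent and child. The `QuantizationRecord` fields and the `regressed_slices` helper are hypothetical names, not part of any particular quantization toolkit.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class QuantizationRecord:
    """Lineage entry for a quantized descendant (field names are illustrative)."""
    parent_id: str
    recipe: dict          # e.g. bit width, calibration set id, tool version
    target_hardware: str

    def recipe_digest(self) -> str:
        """Stable hash of the recipe so the exact settings can be verified later."""
        canonical = json.dumps(self.recipe, sort_keys=True)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def regressed_slices(parent_scores: dict, child_scores: dict, tolerance: float) -> list:
    """Slices where the quantized child trails the parent by more than `tolerance`.

    A slice missing from `child_scores` counts as regressed, since it was not measured.
    """
    return sorted(
        name for name, parent in parent_scores.items()
        if parent - child_scores.get(name, float("-inf")) > tolerance
    )
```

Treating an unmeasured slice as a regression keeps a gap in the evaluation from passing silently.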

Merge only with a baseline that can win

Always compare a weight merge with both parents, a router, and an output ensemble. A merge is valuable only when consolidation benefits exceed interference risk and evaluation cost.
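The comparison can be sketched as a per-slice check of the merge against the best non-merge candidate. This is a simplified illustration: it assumes higher scores are better, that every candidate was scored on the same slices, and that the merged model is stored under the key `"merge"`. The function name `merge_losses` is hypothetical, and the sketch does not weigh consolidation benefit or evaluation cost.

```python
def merge_losses(scores: dict, tolerance: float = 0.0) -> dict:
    """Return slices where the merge trails the best non-merge baseline.

    `scores` maps candidate name -> {slice name -> score}; higher is better.
    The result maps each losing slice to the baseline that beat the merge.
    """
    merge = scores["merge"]
    baselines = {name: s for name, s in scores.items() if name != "merge"}
    losses = {}
    for slice_name, merge_score in merge.items():
        best_name, best_score = max(
            ((name, s[slice_name]) for name, s in baselines.items()),
            key=lambda pair: pair[1],
        )
        if merge_score < best_score - tolerance:
            losses[slice_name] = best_name
    return losses


# Hypothetical scores, for illustration only.
scores = {
    "parent_a": {"code": 0.90, "legal": 0.60},
    "parent_b": {"code": 0.60, "legal": 0.90},
    "router":   {"code": 0.90, "legal": 0.90},
    "ensemble": {"code": 0.88, "legal": 0.87},
    "merge":    {"code": 0.89, "legal": 0.80},
}
losses = merge_losses(scores, tolerance=0.02)
```

In this made-up example the merge keeps pace on `code` but loses `legal` to a parent, which is the interference pattern the baselines exist to expose.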

Split when one package contains conflicting niches

Evidence for splitting includes bimodal error patterns, incompatible latency tiers, separate data jurisdictions, or improvements in one niche causing regression in another. A split adds orchestration and population cost, so require stable clusters.
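One of these signals, improvement in one niche paired with regression in another, can be tracked across successive updates. The sketch below assumes per-update score deltas are recorded for two niches. The `tradeoff_rate` name and the `min_effect` noise floor are illustrative assumptions.

```python
def tradeoff_rate(deltas_a, deltas_b, min_effect: float = 0.0) -> float:
    """Fraction of updates where one niche improved while the other regressed.

    Changes with magnitude at or below `min_effect` are treated as noise.
    """
    pairs = list(zip(deltas_a, deltas_b))
    if not pairs:
        return 0.0
    conflicts = sum(
        1 for a, b in pairs
        if (a > min_effect and b < -min_effect) or (a < -min_effect and b > min_effect)
    )
    return conflicts / len(pairs)


# Hypothetical score changes over four updates, for illustration only.
rate = tradeoff_rate([0.03, 0.02, -0.01, 0.04], [-0.02, 0.01, 0.02, -0.03], min_effect=0.005)
```

A rate that stays high across many updates is closer to the stable evidence the paragraph asks for than a single conflicting update.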

Retire when marginal value falls below maintenance cost

Retirement candidates are unused, redundant, unsupported, contaminated, consistently dominated, or no longer valid under policy. Remove routing eligibility first, drain active leases, preserve lineage, and retain required artifacts or tombstones.
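The ordering of those steps can be sketched as a small function that is called repeatedly until it yields a tombstone. The `CatalogEntry` fields and the `retire_step` helper are hypothetical, and a real catalog would also handle artifact retention and policy checks that are omitted here.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CatalogEntry:
    name: str
    parents: List[str] = field(default_factory=list)
    routable: bool = True
    active_leases: int = 0
    status: str = "active"


def retire_step(entry: CatalogEntry) -> Optional[dict]:
    """Advance retirement by one step; return a tombstone once it is complete."""
    entry.routable = False                 # 1. remove routing eligibility first
    if entry.active_leases > 0:            # 2. wait for active leases to drain
        entry.status = "draining"
        return None
    entry.status = "retired"
    # 3. preserve lineage in the tombstone that outlives the artifact
    return {"name": entry.name, "parents": list(entry.parents), "status": "tombstone"}
```

Because routing eligibility is removed before anything else, no new lease can start while existing ones drain.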

Choose no-op when evidence is weak

pseudocode

FUNCTION choose_operator(failure, evidence, catalog, policy)
    IF evidence.is_transient OR evidence.sample_size < policy.minimum_sample
        RETURN NO_OP_AND_COLLECT_MORE_EVIDENCE
    END IF

    IF failure.type == "routing" AND catalog.has_eligible_specialist
        RETURN ROUTER_CHANGE
    END IF

    IF failure.type == "volatile_knowledge"
        RETURN RETRIEVAL_OR_DATA_SOURCE_CHANGE
    END IF

    options <- GENERATE_OPERATOR_OPTIONS(failure, catalog)
    eligible <- FILTER_BY_COMPATIBILITY_SAFETY_AND_BUDGET(options, policy)
    IF eligible IS EMPTY
        RETURN NO_OP_AND_COLLECT_MORE_EVIDENCE
    END IF
    RETURN LOWEST_COMPLEXITY_OPTION_WITH_TESTABLE_HYPOTHESIS(eligible)
END FUNCTION
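For readers who prefer something runnable, the same decision order can be sketched in Python. This is one possible rendering under assumptions, not a reference implementation. Candidate options are passed in rather than generated from a catalog, the `Evidence` and `Option` fields are hypothetical, and returning the no-op when no option is eligible is an assumption added for safety.

```python
from dataclasses import dataclass
from typing import List

NO_OP = "no_op_and_collect_more_evidence"


@dataclass
class Evidence:
    is_transient: bool
    sample_size: int


@dataclass
class Option:
    name: str
    complexity: int   # lower means simpler and easier to reverse
    compatible: bool
    safe: bool
    cost: float
    testable: bool    # has a hypothesis that evaluation could falsify


def choose_operator(failure_type: str, evidence: Evidence, has_eligible_specialist: bool,
                    options: List[Option], minimum_sample: int, budget: float) -> str:
    # Weak or transient evidence: change nothing and keep collecting.
    if evidence.is_transient or evidence.sample_size < minimum_sample:
        return NO_OP
    if failure_type == "routing" and has_eligible_specialist:
        return "router_change"
    if failure_type == "volatile_knowledge":
        return "retrieval_or_data_source_change"
    eligible = [o for o in options
                if o.compatible and o.safe and o.cost <= budget and o.testable]
    if not eligible:
        return NO_OP  # assumption: no eligible option means no structural change
    return min(eligible, key=lambda o: o.complexity).name
```

For example, with sufficient evidence and a failure type of `"calibration"`, the function returns the simplest option that passes every filter, regardless of the order in which options were supplied.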

Source reports used for this guide

These reports are preserved verbatim in the site archive. The guide above is an editorial synthesis and may narrow, qualify, or reorganize claims from the source material.

Core synthesis: The Four Fs of AI: Code Breeding, Model Breeding, and the Teleodynamic Convergence of Mutable Small-Model Ecologies (conceptual synthesis, 80.5 KB)
Core synthesis: Teleodynamic Evolution of AI Ecosystems (conceptual synthesis, 15.3 KB)
