Evolution Lab · Intermediate · 2 minute read · Updated 2026-06-26 UTC

Ablation studies

How to isolate the contribution of mutation operators, routers, archives, evaluators, and governance gates.

Research status: Experiment design pattern · Publication state: Published · Reviewed by: Michael Kappel · Source reports: 3

Why ablation matters

A model-breeding system has many moving parts. Without ablation, a successful run does not tell you what caused the improvement. Was it the mutation operator, the benchmark, the router, the archive, the release policy, or luck?

Components to ablate

Component | Turn it off by | Question answered
Mutation operator | Removing one operator family | Which operator produces useful descendants?
Quality-diversity archive | Selecting only on top aggregate score | Does diversity help after drift?
Router learning | Using static routing | Does learning improve selection?
Judge gate | Using direct specialist output | Does judging improve quality enough to repay its cost?
No-op threshold | Promoting every candidate with a positive score | Does conservative stasis reduce regressions?
Human review | Comparing automated-only releases to reviewed releases | Which decisions need human judgment?
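One way to keep each row of the table honest is to express every ablation as a single override on a baseline configuration, so that each run differs from the control in exactly one place. The sketch below assumes a flat dictionary config; the key names, values, and the `apply_ablation` helper are all hypothetical illustrations, not a real schema from the system.

```python
from copy import deepcopy
from typing import Any

# Hypothetical baseline; every key and value here is illustrative, not a real schema.
BASELINE: dict[str, Any] = {
    "mutation_operators": ["crossover", "layer_swap", "prune"],
    "selection": "quality_diversity",
    "routing": "learned",
    "judge_gate": True,
    "noop_threshold": 0.02,  # minimum score gain required before promotion
    "human_review": True,
}

# One entry per table row: the single override that switches that component off.
ABLATIONS: dict[str, dict[str, Any]] = {
    "drop_layer_swap": {"mutation_operators": ["crossover", "prune"]},
    "top_score_only": {"selection": "top_aggregate"},
    "static_router": {"routing": "static"},
    "no_judge_gate": {"judge_gate": False},
    "promote_all_positive": {"noop_threshold": 0.0},
    "automated_only": {"human_review": False},
}


def apply_ablation(config: dict[str, Any], overrides: dict[str, Any]) -> dict[str, Any]:
    """Return a modified deep copy of config; the baseline is left untouched."""
    modified = deepcopy(config)
    for key, value in overrides.items():
        if key not in modified:
            # Catch typos that would otherwise silently run the unablated system.
            raise KeyError(f"unknown config key: {key}")
        modified[key] = value
    return modified
```

Rejecting unknown keys matters more than it looks: a misspelled override would otherwise produce a run identical to the control and a spurious "no effect" result.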

Ablation runner

pseudocode
FUNCTION run_ablation(system_config, ablations, benchmark)
    control <- RUN_EXPERIMENT(system_config, benchmark)
    rows <- []

    FOR ablation IN ablations
        modified <- APPLY_ABLATION(system_config, ablation)
        result <- RUN_EXPERIMENT(modified, benchmark)
        rows.APPEND(COMPARE(control, result, ablation))
    END FOR

    RETURN rows
END FUNCTION
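The pseudocode above can be made runnable in Python roughly as follows. This is a sketch under stated assumptions: `run_experiment` is a hypothetical caller-supplied function that returns one aggregate score, configs are flat dictionaries, and `AblationRow` stands in for whatever `COMPARE` produces in a real system.

```python
from dataclasses import dataclass
from typing import Any, Callable

Config = dict[str, Any]


@dataclass
class AblationRow:
    ablation: str
    control_score: float
    ablated_score: float

    @property
    def delta(self) -> float:
        # Negative delta means removing the component made the system worse.
        return self.ablated_score - self.control_score


def run_ablation(
    system_config: Config,
    ablations: dict[str, Config],
    benchmark: Any,
    run_experiment: Callable[[Config, Any], float],
) -> list[AblationRow]:
    # Run the unmodified system once as the shared control.
    control = run_experiment(system_config, benchmark)
    rows = []
    for name, overrides in ablations.items():
        # Shallow merge: each ablation replaces whole top-level keys only.
        modified = {**system_config, **overrides}
        result = run_experiment(modified, benchmark)
        rows.append(AblationRow(name, control, result))
    return rows
```

Returning a single scalar per run keeps the sketch short; for paired evaluation, `run_experiment` would instead return per-task scores so the control and each ablation can be compared task by task.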

Use paired evaluation

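Use identical task streams, hardware, seeds, and policy thresholds for the control and each ablated run wherever possible. If the task stream changes between runs, an apparent improvement may reflect the different tasks or ordinary run-to-run variance rather than the removed component.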
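With an identical task stream, each task yields one control score and one ablated score, and the comparison can be made on their per-task differences. The sketch below uses a paired bootstrap confidence interval on the mean difference; the bootstrap is one common choice, not necessarily the method the source reports use, and the function name and defaults are hypothetical.

```python
import random
from statistics import mean


def paired_bootstrap_ci(control, ablated, n_resamples=2000, alpha=0.05, seed=0):
    """Mean of (ablated - control) over shared tasks, with a bootstrap interval."""
    if len(control) != len(ablated):
        raise ValueError("paired evaluation needs the same task stream for both runs")
    if not control:
        raise ValueError("no tasks to compare")
    # Pairing: difference each task against itself, never across tasks.
    diffs = [a - c for c, a in zip(control, ablated)]
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(diffs)
    # Resample tasks with replacement and record the mean difference each time.
    boot = sorted(mean(rng.choice(diffs) for _ in range(n)) for _ in range(n_resamples))
    lo = boot[int((alpha / 2) * n_resamples)]
    hi = boot[int((1 - alpha / 2) * n_resamples) - 1]
    return mean(diffs), (lo, hi)
```

If the interval contains zero, the ablation has not shown a reliable effect on this task stream; that is itself a result worth recording rather than a reason to rerun until something appears.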

Negative results are valuable

Ablations often show that a fashionable mechanism does not repay its cost. Preserve those results. They protect the ecology from repeating expensive experiments and help tune the viability weights.

Reporting standard

Each ablation report should include the removed component, expected effect, observed effect, confidence, cost delta, failure slices, and whether the result changes release policy.
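The reporting standard maps naturally onto a fixed record type, which makes missing fields visible before a report is filed. This is a minimal sketch; the field names mirror the list above, while the types, the comments on units, and the `resolved` helper are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class AblationReport:
    removed_component: str
    expected_effect: str
    observed_effect: float            # e.g. mean paired score delta (ablated minus control)
    confidence: tuple[float, float]   # e.g. bootstrap interval on observed_effect
    cost_delta: float                 # e.g. change in compute cost (ablated minus control)
    failure_slices: list[str] = field(default_factory=list)
    changes_release_policy: bool = False

    def __post_init__(self) -> None:
        lo, hi = self.confidence
        if lo > hi:
            raise ValueError("confidence bounds are reversed")

    @property
    def resolved(self) -> bool:
        # True when the interval excludes zero, i.e. the sign of the effect is settled.
        lo, hi = self.confidence
        return lo > 0 or hi < 0
```

An unresolved report is still a report: keeping it alongside resolved ones is what lets negative results inform later experiments.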

Source reports used for this guide

These reports are preserved verbatim in the site archive. The guide above is an editorial synthesis and may narrow, qualify, or reorganize claims from the source material.