Evolution Lab | Advanced | 2 minute read | Updated 2026-06-26 UTC

Surrogate evaluation

Using cheaper predictors to prioritize expensive candidate evaluations without allowing the surrogate to become an unverified fitness oracle.

Research status: Established surrogate-assisted optimization concepts; Publication state: Published; Reviewed by: Michael Kappel; Source reports: 2

Why use a surrogate

Training or evaluating every architecture or model variant can dominate the experiment budget. A surrogate predicts likely performance or failure so the system can allocate expensive evaluation to promising or uncertain candidates.

Safe role

The surrogate ranks or filters; it does not grant production approval. Finalists still pass the full independent suite. Keep random and uncertainty-driven sampling so the surrogate can be corrected.
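
The separation of roles can be sketched in a few lines of Python. This is a minimal illustration, not the guide's implementation: the `Candidate` type, `shortlist`, `approve`, and the `full_suite` callback are all hypothetical names. The point is only that the surrogate's prediction decides who gets evaluated, while approval depends solely on the full suite. Random and uncertainty-driven sampling is left out here to keep the sketch short.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Candidate:
    name: str
    predicted_score: float  # surrogate output, used for ranking only


def shortlist(candidates: Sequence[Candidate], k: int) -> list[Candidate]:
    """Rank by surrogate prediction and keep the top k for expensive evaluation."""
    ranked = sorted(candidates, key=lambda c: c.predicted_score, reverse=True)
    return ranked[:k]


def approve(
    candidates: Sequence[Candidate],
    k: int,
    full_suite: Callable[[Candidate], bool],  # hypothetical stand-in for the independent suite
) -> list[Candidate]:
    """Approval is decided by the full suite alone; the surrogate never approves."""
    return [c for c in shortlist(candidates, k) if full_suite(c)]
```

A candidate with a high predicted score that fails `full_suite` is rejected, and a candidate outside the shortlist is simply not evaluated in this round rather than judged.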

Surrogate loop

pseudocode
PROCEDURE surrogate_assisted_search(proposals, surrogate, budget)
    predictions <- surrogate.PREDICT_WITH_UNCERTAINTY(proposals)
    selected <- ACQUISITION_FUNCTION(
        predictions,
        criteria = [expected_improvement, uncertainty, diversity],
        limit = budget.expensive_evaluations
    )

    actual_results <- FULL_EVALUATE(selected)
    UPDATE_SURROGATE(surrogate, selected, actual_results)

    audit_sample <- RANDOM_SAMPLE(proposals - selected, budget.audit_evaluations)
    audit_results <- FULL_EVALUATE(audit_sample)
    MEASURE_SURROGATE_BIAS(predictions[audit_sample], audit_results)

    RETURN actual_results + audit_results
END PROCEDURE
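
The procedure above can be made concrete with a small runnable Python sketch. Everything here is an assumption made for illustration: the toy `NearestNeighborSurrogate`, scalar proposals, the `full_evaluate` callback, and the parameter names. It also swaps in a simple upper-confidence-bound score (mean plus `kappa` times uncertainty) for the acquisition function, and it feeds the audit results back into the surrogate, which the pseudocode does not do.

```python
import random
from statistics import mean


class NearestNeighborSurrogate:
    """Toy surrogate: predicts the score of the closest evaluated point.
    Uncertainty is the distance to that point."""

    def __init__(self, prior_mean=0.0, prior_std=1.0):
        self.observations = []  # list of (x, score) pairs
        self.prior_mean = prior_mean
        self.prior_std = prior_std

    def predict(self, x):
        if not self.observations:
            return self.prior_mean, self.prior_std
        nearest_x, nearest_y = min(self.observations, key=lambda o: abs(o[0] - x))
        return nearest_y, abs(nearest_x - x)

    def update(self, xs, ys):
        self.observations.extend(zip(xs, ys))


def surrogate_assisted_search(proposals, surrogate, full_evaluate,
                              n_expensive, n_audit, kappa=1.0, rng=None):
    """One round of the loop. Proposals are assumed to be unique scalars."""
    rng = rng or random.Random(0)
    predictions = {x: surrogate.predict(x) for x in proposals}

    # Acquisition: upper confidence bound, a simple stand-in for a mixed criterion.
    def ucb(x):
        mu, sigma = predictions[x]
        return mu + kappa * sigma

    ranked = sorted(proposals, key=ucb, reverse=True)
    selected = ranked[:n_expensive]
    actual = {x: full_evaluate(x) for x in selected}

    # Audit: random sample of candidates the acquisition did not select.
    rest = ranked[n_expensive:]
    audit_sample = rng.sample(rest, min(n_audit, len(rest)))
    audit = {x: full_evaluate(x) for x in audit_sample}

    # Bias uses the predictions made before any update (predicted minus actual).
    bias = mean(predictions[x][0] - audit[x] for x in audit_sample) if audit_sample else 0.0

    results = {**actual, **audit}
    surrogate.update(list(results.keys()), list(results.values()))
    return results, bias
```

A negative `bias` means the surrogate underpredicted the audited candidates, which is the signal that unselected regions may be better than the surrogate believes.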

Candidate features

Use features that are available before the expensive evaluation runs: architecture descriptors, parameter count, operator configuration, parent scores, early training curves, resource profiles, lineage depth, and scores on cheap proxy tasks. Avoid any feature that leaks hidden holdout labels.
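
As a rough illustration, a feature record might look like the Python sketch below. The field names, the log scaling of parameter count, and the `FORBIDDEN_FEATURES` set are assumptions chosen for the example, not a schema from the source reports. The leakage check is a simple name-based guard and would not catch a holdout signal hidden under another name.

```python
import math
from dataclasses import asdict, dataclass

# Hypothetical names for fields that must never reach the surrogate.
FORBIDDEN_FEATURES = {"holdout_score", "holdout_labels"}


@dataclass(frozen=True)
class CandidateFeatures:
    parameter_count: int
    lineage_depth: int
    parent_score: float
    proxy_task_score: float
    peak_memory_gb: float

    def to_vector(self) -> list[float]:
        """Flatten to numbers; parameter count is log-scaled to tame its range."""
        return [
            math.log10(self.parameter_count),
            float(self.lineage_depth),
            self.parent_score,
            self.proxy_task_score,
            self.peak_memory_gb,
        ]


def assert_no_leakage(feature_names) -> None:
    """Raise if any feature name is on the forbidden list."""
    leaked = FORBIDDEN_FEATURES & set(feature_names)
    if leaked:
        raise ValueError(f"features leak holdout information: {sorted(leaked)}")
```

Calling `assert_no_leakage(asdict(features).keys())` before training keeps the check next to the data rather than in reviewer memory.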

Acquisition strategies

Expected improvement favors likely winners. Uncertainty sampling improves the surrogate. Diversity sampling prevents one region of the search space from dominating. A mixed acquisition function is usually safer than greedy exploitation.
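
One way to combine the three criteria is a weighted score with greedy batch selection, sketched below in Python. The assumptions are significant: the surrogate is taken to return a Gaussian mean and standard deviation per candidate, each candidate is described by a single scalar `x` so that distance is easy to compute, and the weights `w_ei`, `w_unc`, and `w_div` are arbitrary illustrative values. The expected-improvement formula is the standard closed form for a Gaussian prediction under maximization.

```python
import math


def expected_improvement(mu: float, sigma: float, best: float) -> float:
    """Closed-form expected improvement over `best` for a Gaussian prediction."""
    if sigma <= 0.0:
        return max(mu - best, 0.0)
    z = (mu - best) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (mu - best) * cdf + sigma * pdf


def select_batch(candidates, best, limit, w_ei=1.0, w_unc=0.5, w_div=0.5):
    """Greedily pick up to `limit` candidates.

    candidates: list of (x, mu, sigma) tuples with a scalar descriptor x.
    Diversity is the distance to the closest already-chosen candidate.
    """
    chosen = []
    remaining = list(candidates)
    while remaining and len(chosen) < limit:
        def score(c):
            x, mu, sigma = c
            diversity = min((abs(x - cx) for cx, _, _ in chosen), default=0.0)
            return (w_ei * expected_improvement(mu, sigma, best)
                    + w_unc * sigma
                    + w_div * diversity)

        pick = max(remaining, key=score)
        chosen.append(pick)
        remaining.remove(pick)
    return chosen
```

Setting `w_unc` and `w_div` to zero reduces this to greedy exploitation of expected improvement, which makes the difference between the two behaviors easy to compare on the same proposals.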

Bias and drift

The surrogate is trained on prior candidates and may underpredict novel operator families or niches. Track error by lineage, architecture, and descriptor region. Reset or retrain when the search distribution changes.
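
Tracking error by group can be as simple as the Python sketch below. The record layout `(group, predicted, actual)`, the function names, and the thresholds `max_abs_bias` and `min_samples` are assumptions for illustration; a group could be a lineage, an architecture family, or a descriptor region.

```python
from collections import defaultdict
from statistics import mean


def error_by_group(records):
    """Summarize surrogate error per group.

    records: iterable of (group, predicted, actual).
    Bias is mean(predicted - actual), so negative bias means underprediction.
    """
    errors = defaultdict(list)
    for group, predicted, actual in records:
        errors[group].append(predicted - actual)
    return {
        group: {
            "n": len(errs),
            "bias": mean(errs),
            "mae": mean(abs(e) for e in errs),
        }
        for group, errs in errors.items()
    }


def groups_needing_retrain(report, max_abs_bias=0.05, min_samples=3):
    """Flag groups with enough samples whose bias exceeds the threshold."""
    return sorted(
        group
        for group, stats in report.items()
        if stats["n"] >= min_samples and abs(stats["bias"]) > max_abs_bias
    )
```

A group that shows a large negative bias is one the surrogate is systematically underrating, which is the case the paragraph above warns about for novel operator families.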

Hard filters

Cheap deterministic checks—invalid package, license conflict, memory over ceiling—should run before the surrogate. Do not waste model capacity predicting rules that can be enforced exactly.
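
A sketch of the ordering in Python: deterministic checks run first and only the survivors reach the surrogate. The `Proposal` fields, the allowed-license set, and the 16 GB ceiling are hypothetical values chosen for the example.

```python
from dataclasses import dataclass

# Hypothetical policy values for illustration.
ALLOWED_LICENSES = {"MIT", "Apache-2.0", "BSD-3-Clause"}
MEMORY_CEILING_GB = 16.0


@dataclass(frozen=True)
class Proposal:
    name: str
    package_valid: bool
    license: str
    peak_memory_gb: float


def hard_filter_failures(p: Proposal) -> list[str]:
    """Return the exact rules a proposal violates; an empty list means it passes."""
    failures = []
    if not p.package_valid:
        failures.append("invalid package")
    if p.license not in ALLOWED_LICENSES:
        failures.append("license conflict")
    if p.peak_memory_gb > MEMORY_CEILING_GB:
        failures.append("memory over ceiling")
    return failures


def split_by_hard_filters(proposals):
    """Separate proposals into those the surrogate should see and those rejected outright."""
    passed, rejected = [], {}
    for p in proposals:
        failures = hard_filter_failures(p)
        if failures:
            rejected[p.name] = failures
        else:
            passed.append(p)
    return passed, rejected
```

Recording the violated rule for each rejected proposal keeps rejections explainable, which a learned predictor could not guarantee.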

Governance

Version the surrogate, training set, acquisition function, and thresholds. A change in surrogate behavior changes the evolutionary pressure and should be treated as a code-breeding event.
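
One lightweight way to make such changes visible is to derive a single fingerprint from the four versioned items and store it with every evaluation result. The Python sketch below assumes the inputs are plain strings and a flat dictionary of thresholds; the function and field names are hypothetical.

```python
import hashlib
import json


def search_fingerprint(surrogate_version: str,
                       training_set_hash: str,
                       acquisition: str,
                       thresholds: dict) -> str:
    """Hash the configuration that shapes evolutionary pressure.

    Keys are sorted so that dictionary ordering does not change the result.
    """
    payload = json.dumps(
        {
            "surrogate_version": surrogate_version,
            "training_set_hash": training_set_hash,
            "acquisition": acquisition,
            "thresholds": thresholds,
        },
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Two runs with different fingerprints were selected under different pressure and should not be compared as if they came from the same search.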

Source reports used for this guide

These reports are preserved verbatim in the site archive. The guide above is an editorial synthesis and may narrow, qualify, or reorganize claims from the source material.