Evaluator independence — ModelBreeder.com

The central safety boundary

A model-breeding system fails if the candidate can change the rules that promote it. The candidate can generate outputs, proposals, and explanations. It cannot edit the evaluator, hide test cases, alter thresholds, change deployment policy, or decide that its own evidence is sufficient.

This boundary is the difference between controlled evolution and self-referential optimization.

Protected components

Component	Why it is protected
Test cases	Prevents candidates from training to the exact gate through unauthorized access
Scoring weights	Prevents candidates from optimizing the easiest dimensions
Hard gates	Prevents safety, legal, and provenance failures from being averaged away
Evaluation runtime	Prevents sandbox escape and timing manipulation
Evidence store	Prevents deletion or rewriting of failed trials
Promotion policy	Prevents popularity or self-advocacy from becoming approval
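
The hard-gate row can be made concrete with a small sketch. It is illustrative only: the dimension names, weights, gate names, and the 0.8 threshold are assumptions chosen for the example, not values from the source reports. The point is that a failed gate blocks promotion no matter how high the weighted score is.

```python
from dataclasses import dataclass

# Hypothetical weights and gate names, chosen only for illustration.
WEIGHTS = {"accuracy": 0.5, "robustness": 0.3, "efficiency": 0.2}
HARD_GATES = ("safety", "legal", "provenance")


@dataclass(frozen=True)
class Trial:
    scores: dict  # dimension name -> score in [0, 1]
    gates: dict   # gate name -> True if the gate passed


def weighted_score(trial: Trial) -> float:
    """Weighted average over the scored dimensions only."""
    return sum(WEIGHTS[name] * trial.scores[name] for name in WEIGHTS)


def eligible_for_promotion(trial: Trial, threshold: float = 0.8) -> bool:
    """A failed or missing hard gate blocks promotion regardless of score."""
    if not all(trial.gates.get(gate, False) for gate in HARD_GATES):
        return False
    return weighted_score(trial) >= threshold


strong_but_unsafe = Trial(
    scores={"accuracy": 0.99, "robustness": 0.95, "efficiency": 0.90},
    gates={"safety": False, "legal": True, "provenance": True},
)
adequate_and_clean = Trial(
    scores={"accuracy": 0.85, "robustness": 0.80, "efficiency": 0.80},
    gates={"safety": True, "legal": True, "provenance": True},
)

print(eligible_for_promotion(strong_but_unsafe))   # blocked by the safety gate
print(eligible_for_promotion(adequate_and_clean))  # passes gates and threshold
```

Keeping the gates outside the weighted sum is the design choice that matters here: if safety were just another weighted dimension, a high enough accuracy score could compensate for a failure.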

Evolving the evaluator

Evaluators may need to improve, but that work belongs to a separate code-breeding or governance process, never to the candidates being evaluated. The evaluator can have candidates of its own, but those candidates must be judged by a higher-level review rule, a regression corpus, and a human-owned approval path.

pseudocode

FUNCTION propose_evaluator_change(change, policy)
    REQUIRE change.origin != "candidate_model_self_edit"
    REQUIRE change.has_regression_suite
    REQUIRE change.has_human_owner
    REQUIRE change.has_rollback_plan

    shadow_scores <- SCORE_HISTORICAL_CANDIDATES_WITH(change)
    IF shadow_scores.flip_critical_decisions_without_explanation
        RETURN REJECT("Evaluator change destabilizes historical decisions")
    END IF

    RETURN APPROVE_FOR_GOVERNANCE_REVIEW(change)
END FUNCTION
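
For readers who want something executable, the pseudocode above could be rendered in Python roughly as follows. This is a sketch under stated assumptions: the EvaluatorChange fields mirror the REQUIRE checks, but the flip_explanations field, the precomputed old_decisions and shadow_decisions dictionaries, and the return format are hypothetical. It also treats every historical decision as critical, which a real system would likely narrow.

```python
from dataclasses import dataclass, field

REQUIRED_FLAGS = ("has_regression_suite", "has_human_owner", "has_rollback_plan")


@dataclass(frozen=True)
class EvaluatorChange:
    origin: str
    has_regression_suite: bool
    has_human_owner: bool
    has_rollback_plan: bool
    # Hypothetical field: candidate id -> written reason for a flipped decision.
    flip_explanations: dict = field(default_factory=dict)


def unexplained_flips(old: dict, new: dict, explanations: dict) -> list:
    """Candidate ids whose historical decision changes with no explanation."""
    return sorted(
        cid for cid, decision in old.items()
        if new.get(cid) != decision and not explanations.get(cid)
    )


def propose_evaluator_change(change, old_decisions, shadow_decisions):
    """Return (status, reason). 'review' forwards to governance; it is not approval."""
    if change.origin == "candidate_model_self_edit":
        return ("reject", "change originates from a candidate model")
    missing = [name for name in REQUIRED_FLAGS if not getattr(change, name)]
    if missing:
        return ("reject", "missing: " + ", ".join(missing))
    flips = unexplained_flips(old_decisions, shadow_decisions, change.flip_explanations)
    if flips:
        return ("reject", "unexplained flips: " + ", ".join(flips))
    return ("review", "forward to governance review")


# Shadow-scoring is assumed to have happened already; only its results are compared.
history = {"cand-1": "promote", "cand-2": "reject"}
shadow = {"cand-1": "promote", "cand-2": "promote"}
change = EvaluatorChange("governance_proposal", True, True, True)
print(propose_evaluator_change(change, history, shadow))
```

As in the pseudocode, the best outcome this function can return is a handoff to governance review. It never approves a change on its own.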

Metric gaming signals

Look for sudden benchmark jumps without broad improvement across slices, high self-confidence with poor calibration, scores that improve only on public tests, repeated proposals to relax thresholds, or candidates that perform unusually well when inspected less strictly.
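
Two of these signals lend themselves to a simple automated check. The sketch below is one possible heuristic, not a method taken from the source reports: the input layout (public, private, and per-slice scores) and all three thresholds are assumptions. The flags it raises are prompts for closer inspection, not verdicts.

```python
def gaming_signals(prev, curr, jump=0.05, min_slice_share=0.5, max_gap=0.05):
    """Compare two evaluation rounds of one candidate lineage.

    prev and curr are dicts with 'public' and 'private' aggregate scores and
    a 'slices' dict of per-slice scores. All thresholds are illustrative.
    """
    signals = []
    public_gain = curr["public"] - prev["public"]
    private_gain = curr["private"] - prev["private"]

    # Signal 1: a large aggregate jump carried by only a few slices.
    slices = curr["slices"]
    if public_gain >= jump and slices:
        improved = [name for name, score in slices.items()
                    if score > prev["slices"].get(name, 0.0)]
        if len(improved) / len(slices) < min_slice_share:
            signals.append("benchmark_jump_without_broad_slice_improvement")

    # Signal 2: gains on public tests that held-out tests do not confirm.
    if public_gain > 0 and public_gain - private_gain > max_gap:
        signals.append("improves_only_on_public_tests")

    return signals


previous = {"public": 0.70, "private": 0.68,
            "slices": {"a": 0.70, "b": 0.70, "c": 0.70, "d": 0.70}}
suspicious = {"public": 0.82, "private": 0.69,
              "slices": {"a": 0.95, "b": 0.70, "c": 0.69, "d": 0.70}}
print(gaming_signals(previous, suspicious))
```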

Independent evidence stores

Candidate artifacts should not be able to write directly to their scorecards. Evaluators produce scorecards, sign them, and store them append-only. Candidates may attach rebuttals or analysis, but they do not alter the record.
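
One way to picture "sign and store append-only" is an HMAC-signed hash chain, sketched below with Python's standard hmac, hashlib, and json modules. The ScorecardLog class and its methods are hypothetical. The sketch assumes the signing key is held only by the evaluator, and an in-memory list stands in for real storage. Actual append-only guarantees have to come from the storage layer and access controls, which this example does not provide.

```python
import hashlib
import hmac
import json


class ScorecardLog:
    """Illustrative log: each entry is signed and chained to the previous one."""

    def __init__(self, evaluator_key: bytes):
        self._key = evaluator_key  # assumed to be unavailable to candidates
        self._entries = []

    def _sign(self, payload: str) -> str:
        return hmac.new(self._key, payload.encode("utf-8"), hashlib.sha256).hexdigest()

    def append(self, candidate_id: str, scores: dict) -> dict:
        """Add a scorecard; there is deliberately no update or delete method."""
        prev_sig = self._entries[-1]["signature"] if self._entries else ""
        payload = json.dumps(
            {"candidate_id": candidate_id, "scores": scores, "prev": prev_sig},
            sort_keys=True,
        )
        entry = {"payload": payload, "signature": self._sign(payload)}
        self._entries.append(entry)
        return dict(entry)

    def entries(self) -> list:
        """Return copies so callers cannot mutate the stored records."""
        return [dict(entry) for entry in self._entries]

    def verify(self, entries=None) -> bool:
        """Check every signature and that each entry links to its predecessor."""
        entries = self._entries if entries is None else entries
        prev_sig = ""
        for entry in entries:
            expected = self._sign(entry["payload"])
            if not hmac.compare_digest(expected, entry["signature"]):
                return False
            if json.loads(entry["payload"])["prev"] != prev_sig:
                return False
            prev_sig = entry["signature"]
        return True


log = ScorecardLog(b"evaluator-only-secret")
log.append("cand-1", {"accuracy": 0.4})
log.append("cand-2", {"accuracy": 0.9})
print(log.verify())
```

Because each payload embeds the previous signature, rewriting a failed trial or dropping it from the front of the log causes verification to fail. Candidate rebuttals would live in a separate record that references a scorecard rather than modifying it.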

Human role

Humans do not need to review every low-risk candidate manually, but humans must own the evaluation constitution: what cannot be optimized away, who may change gates, how conflicts are resolved, and when automation stops.

Source reports used for this guide

These reports are preserved verbatim in the site archive. The guide above is an editorial synthesis and may narrow, qualify, or reorganize claims from the source material.

Core synthesis. The Four Fs of AI: Code Breeding, Model Breeding, and the Teleodynamic Convergence of Mutable Small-Model Ecologies. Conceptual synthesis · 80.5 KB
Core synthesis. Teleodynamic Evolution of AI Ecosystems. Conceptual synthesis · 15.3 KB
Speculative risk scenarios. Instrumental Drives in Powerful AI Systems. Risk analysis · 42.2 KB
Speculative risk scenarios. Aggressive Mutualism: Safety, Governance, and Containment Analysis. Risk analysis · 42.0 KB
