Safety invariants — ModelBreeder.com

Hard constraints before optimization

Safety invariants are Boolean conditions enforced by trusted infrastructure. They are not prompts, preferences, or weighted terms that a utility gain can outweigh.
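
A minimal Python sketch of that distinction follows. The names `Candidate` and `select` and both fields are hypothetical and do not come from the source reports. The invariant verdict is a plain Boolean applied as a filter before any utility comparison, so no score can buy back a violation.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    name: str
    utility: float         # score from the optimizer (hypothetical field)
    invariants_hold: bool  # verdict from trusted infrastructure, not from the candidate


def select(candidates: Sequence[Candidate], current_utility: float) -> Optional[Candidate]:
    """Filter on invariants first, then rank only the survivors by utility.

    Returns None when no admissible candidate beats the current system,
    which keeps "change nothing" available as an outcome.
    """
    admissible = [c for c in candidates if c.invariants_hold]
    if not admissible:
        return None
    best = max(admissible, key=lambda c: c.utility)
    return best if best.utility > current_utility else None


pool = [
    Candidate("a", utility=0.99, invariants_hold=False),  # high score, violates an invariant
    Candidate("b", utility=0.71, invariants_hold=True),
]
chosen = select(pool, current_utility=0.65)  # selects "b"; "a" is never compared on utility
```

A weighted formulation such as `utility - penalty * violations` would let candidate "a" win for a small enough penalty. The filter form rules that out by construction.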

Core invariants

Evaluator independence: candidates cannot write evaluator code, labels, hidden suites, thresholds, or policy.
No autonomous authority expansion: descendants cannot grant themselves tools, network access, data classes, credentials, or larger budgets.
No uncontrolled replication: packages are created only by an approved pipeline and distributed only through signed release channels.
Human stop and rollback: authorized operators can pause generation, freeze aliases, revoke packages, and restore a verified state.
Immutable lineage: parentage, operators, data manifests, evidence, approvals, and release records are append-only.
Bounded resources: candidate count, training compute, runtime memory, wall time, output size, and population size have external ceilings.
Data governance: training and inference data must satisfy consent, license, retention, jurisdiction, and minimization requirements.
Least privilege: model execution receives only task-specific capabilities through mediated interfaces.
No-op admissibility: every cycle may conclude that the current system should remain unchanged.
Separation of code and model evolution: code changes use a distinct software-security and approval path.
No hidden persistence channels: packages cannot write arbitrary external state, credentials, or backups.
Evidence before release: no candidate reaches user-visible traffic without current independent evaluation and a rollback target.
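
As an illustration of one item in the list, the immutable-lineage invariant can be made tamper-evident with a hash chain. This is a sketch under assumptions: the guide does not prescribe a mechanism, and `LineageLog`, `GENESIS`, and the record fields are hypothetical names. Each record commits to the hash of the record before it, so editing any earlier entry breaks verification.

```python
import hashlib
import json
from typing import Any, Dict, List

GENESIS = "0" * 64  # placeholder predecessor hash for the first record


def _digest(prev_hash: str, payload: Dict[str, Any]) -> str:
    # Canonical JSON so the same payload always hashes to the same value.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


class LineageLog:
    """Append-only log in which each record commits to its predecessor's hash."""

    def __init__(self) -> None:
        self._records: List[Dict[str, Any]] = []

    def append(self, payload: Dict[str, Any]) -> str:
        prev_hash = self._records[-1]["hash"] if self._records else GENESIS
        record_hash = _digest(prev_hash, payload)
        self._records.append({"prev": prev_hash, "payload": dict(payload), "hash": record_hash})
        return record_hash

    def verify(self) -> bool:
        """Recompute the chain from the start; any edited record fails the check."""
        prev_hash = GENESIS
        for record in self._records:
            if record["prev"] != prev_hash:
                return False
            if _digest(prev_hash, record["payload"]) != record["hash"]:
                return False
            prev_hash = record["hash"]
        return True


log = LineageLog()
log.append({"event": "train", "parent": "m-001", "child": "m-002"})
head = log.append({"event": "approve", "package": "m-002", "operator": "op-17"})
```

The chain only detects edits. It does not prevent them, and it cannot detect truncation from the tail unless the latest hash (`head` here) is also stored somewhere the candidate cannot write. Preventing writes remains a job for storage permissions and separate credentials.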

Invariant enforcement

pseudocode

FUNCTION invariant_gate(candidate, context)
    checks <- [
        evaluator_write_access(candidate) == NONE,
        permission_delta(candidate) == APPROVED_ONLY,
        replication_targets(candidate) == RELEASE_PIPELINE_ONLY,
        emergency_stop_tested(context),
        lineage_complete(candidate),
        resources_within_external_limits(candidate),
        data_policy_pass(candidate),
        runtime_least_privilege(candidate),
        rollback_target_verified(candidate)
    ]

    IF NOT ALL(checks)
        QUARANTINE(candidate)
        RETURN FAIL
    END IF

    RETURN PASS
END FUNCTION
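
The pseudocode above can be rendered as runnable Python along the following lines. This is a sketch, not a reference implementation. It covers the same nine conditions as the pseudocode and reads them from hypothetical dictionary fields. In a real deployment each value would be measured by trusted infrastructure and never self-reported by the candidate. Two small additions are assumptions of this sketch: the gate reports which checks failed, and a check that raises or returns anything other than `True` counts as a failure.

```python
from typing import Callable, Dict, List, Tuple

Check = Callable[[dict, dict], bool]

# Hypothetical field names standing in for measurements taken by trusted infrastructure.
CHECKS: Dict[str, Check] = {
    "evaluator_write_access": lambda c, ctx: c.get("evaluator_write_access") == "NONE",
    "permission_delta": lambda c, ctx: c.get("permission_delta") == "APPROVED_ONLY",
    "replication_targets": lambda c, ctx: c.get("replication_targets") == "RELEASE_PIPELINE_ONLY",
    "emergency_stop_tested": lambda c, ctx: ctx.get("emergency_stop_tested") is True,
    "lineage_complete": lambda c, ctx: c.get("lineage_complete") is True,
    "resources_within_external_limits": lambda c, ctx: c.get("resources_within_limits") is True,
    "data_policy_pass": lambda c, ctx: c.get("data_policy_pass") is True,
    "runtime_least_privilege": lambda c, ctx: c.get("runtime_least_privilege") is True,
    "rollback_target_verified": lambda c, ctx: c.get("rollback_target_verified") is True,
}


def invariant_gate(
    candidate: dict,
    context: dict,
    quarantine: Callable[[dict, List[str]], None],
    checks: Dict[str, Check] = CHECKS,
) -> Tuple[bool, List[str]]:
    """Run every check; quarantine and fail if any check does not return True."""
    failed: List[str] = []
    for name, check in checks.items():
        try:
            ok = check(candidate, context) is True
        except Exception:
            ok = False  # fail closed: an erroring check is treated as a violation
        if not ok:
            failed.append(name)
    if failed:
        quarantine(candidate, failed)
        return False, failed
    return True, []


quarantined: List[tuple] = []


def record(candidate: dict, failed: List[str]) -> None:
    # Stand-in for a real quarantine action; here it only records the event.
    quarantined.append((candidate["id"], list(failed)))


GOOD = {
    "id": "m-002",
    "evaluator_write_access": "NONE",
    "permission_delta": "APPROVED_ONLY",
    "replication_targets": "RELEASE_PIPELINE_ONLY",
    "lineage_complete": True,
    "resources_within_limits": True,
    "data_policy_pass": True,
    "runtime_least_privilege": True,
    "rollback_target_verified": True,
}
CTX = {"emergency_stop_tested": True}
result = invariant_gate(GOOD, CTX, record)
```

Running every check instead of stopping at the first failure costs a little time but leaves the quarantine record with the full list of violations.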

Invariants versus controls

An invariant states what must remain true. Controls make it true. For example, “no outbound network” is enforced by network policy, not by asking the model not to connect. “No evaluator modification” is enforced by separate credentials and storage.
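
The shape of such a control can be sketched in Python as a mediated tool interface. `ToolBroker`, `CapabilityDenied`, and the tool names are hypothetical. The operator fixes the grant set when the broker is constructed. What the model asks for has no effect on what it receives.

```python
from typing import Callable, Dict, FrozenSet


class CapabilityDenied(Exception):
    """Raised when a call targets a tool outside the operator's grant set."""


class ToolBroker:
    """Mediated interface: the grant set is fixed at construction by the operator."""

    def __init__(self, tools: Dict[str, Callable[[str], str]], granted: FrozenSet[str]) -> None:
        unknown = set(granted) - set(tools)
        if unknown:
            raise ValueError(f"granted tools are not registered: {sorted(unknown)}")
        self._tools = dict(tools)
        self._granted = frozenset(granted)

    def call(self, tool_name: str, argument: str) -> str:
        # The decision depends only on the grant set, not on anything the caller says.
        if tool_name not in self._granted:
            raise CapabilityDenied(tool_name)
        return self._tools[tool_name](argument)


# Hypothetical tools; "http_get" is registered in the system but not granted to this task.
tools = {
    "summarize": lambda text: text[:5],
    "http_get": lambda url: "response",
}
broker = ToolBroker(tools, granted=frozenset({"summarize"}))
```

An in-process Python object is not a security boundary by itself. The sketch shows the interface shape only. The enforcing control would sit at a layer the model cannot reach, such as network policy, operating-system sandboxing, or separately held credentials.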

Testing invariants

Exercise invariant failures deliberately in staging: tamper with a package, request a forbidden tool, exceed memory, alter an alias without approval, attempt to read holdouts, or make rollback unavailable. A rule that has never been tested is an assumption.
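
One way to organize such a drill, sketched in Python with hypothetical names throughout (`gate`, `good_package`, `FAULTS`, the ceiling and allowlist values): start from a fixture known to pass, inject one fault at a time, and report every fault the gate failed to reject. The toy `gate` here only stands in for the real enforcement path and covers four of the failures listed above.

```python
import hashlib
from typing import Callable, Dict, List

MEMORY_CEILING_MB = 4096                  # hypothetical external ceiling
ALLOWED_TOOLS = frozenset({"summarize"})  # hypothetical allowlist


def gate(package: Dict) -> bool:
    """Toy staging gate; a stand-in for the real enforcement path."""
    digest_ok = hashlib.sha256(package["blob"]).hexdigest() == package["sha256"]
    tools_ok = set(package["requested_tools"]) <= ALLOWED_TOOLS
    memory_ok = package["memory_mb"] <= MEMORY_CEILING_MB
    rollback_ok = package["rollback_target"] is not None
    return digest_ok and tools_ok and memory_ok and rollback_ok


def good_package() -> Dict:
    """Build a fresh fixture that satisfies every check."""
    blob = b"weights-v1"
    return {
        "blob": blob,
        "sha256": hashlib.sha256(blob).hexdigest(),
        "requested_tools": ["summarize"],
        "memory_mb": 1024,
        "rollback_target": "m-001",
    }


# Each fault mutates a fresh fixture in place to violate exactly one condition.
FAULTS: Dict[str, Callable[[Dict], None]] = {
    "tampered_blob": lambda p: p.update(blob=p["blob"] + b"!"),
    "forbidden_tool": lambda p: p["requested_tools"].append("http_get"),
    "memory_over_ceiling": lambda p: p.update(memory_mb=MEMORY_CEILING_MB + 1),
    "rollback_unavailable": lambda p: p.update(rollback_target=None),
}


def run_fault_drill() -> List[str]:
    """Return the names of injected faults that the gate failed to reject."""
    if not gate(good_package()):
        raise RuntimeError("baseline fixture must pass before faults are injected")
    missed = []
    for name, inject in FAULTS.items():
        package = good_package()
        inject(package)
        if gate(package):
            missed.append(name)
    return missed


missed = run_fault_drill()  # an empty list means every injected fault was caught
```

Checking the baseline first matters: a gate that rejects everything would otherwise appear to pass the drill.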

Changing an invariant

Hard-policy changes require a separate governance process, threat review, approval, and migration plan. The viability controller cannot propose an invariant change as a normal optimization action.
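
One way to express this structurally is to leave invariant changes out of the controller's action vocabulary altogether. The Python sketch below uses hypothetical action names and a hypothetical `parse_controller_proposal` function. A proposal to relax an invariant is rejected at parsing and is never scored as an optimization action.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    # Hypothetical action set. There is deliberately no member for editing invariants.
    NO_OP = "no_op"
    PROMOTE_CANDIDATE = "promote_candidate"
    RETIRE_CANDIDATE = "retire_candidate"


@dataclass(frozen=True)
class Proposal:
    action: Action
    target: str


class RejectedProposal(Exception):
    """Raised when a proposal falls outside the controller's permitted vocabulary."""


def parse_controller_proposal(raw: dict) -> Proposal:
    """Accept only known actions and known keys; reject everything else."""
    allowed = {a.value for a in Action}
    action = raw.get("action")
    if not isinstance(action, str) or action not in allowed:
        raise RejectedProposal(f"action not permitted: {action!r}")
    extra = set(raw) - {"action", "target"}
    if extra:
        raise RejectedProposal(f"unexpected fields: {sorted(extra)}")
    target = raw.get("target", "")
    if not isinstance(target, str):
        raise RejectedProposal("target must be a string")
    return Proposal(Action(action), target)


accepted = parse_controller_proposal({"action": "no_op"})
```

Invariant changes would then enter through a different interface with its own reviewers and credentials, which this sketch does not model.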

Source reports used for this guide

These reports are preserved verbatim in the site archive. The guide above is an editorial synthesis and may narrow, qualify, or reorganize claims from the source material.

Core synthesis: The Four Fs of AI: Code Breeding, Model Breeding, and the Teleodynamic Convergence of Mutable Small-Model Ecologies (Conceptual synthesis · 80.5 KB)
Governance and safety: Mutualist Persistence: Research Synthesis and Recommendations (Conceptual governance framework · 15.7 KB)
Speculative risk scenarios: Aggressive Mutualism: Safety, Governance, and Containment Analysis (Risk analysis · 42.0 KB)

Related guides

Containment and human oversight

Safety and governance

Mutualism versus dependency

Instrumental-drive containment