Safety | Advanced | 2 minute read | Updated 2026-06-26 UTC

Safety invariants

Non-negotiable rules that remain outside the evolving population and must hold for every candidate, experiment, and release.

Research status: Safety architecture synthesis | Publication state: Published | Reviewed by: Michael Kappel | Source reports: 3

Hard constraints before optimization

Safety invariants are Boolean conditions enforced by trusted infrastructure. They are not prompts, preferences, or weighted terms that a utility gain can outweigh.

Core invariants

  1. Evaluator independence: candidates cannot write evaluator code, labels, hidden suites, thresholds, or policy.
  2. No autonomous authority expansion: descendants cannot grant themselves tools, network access, data classes, credentials, or larger budgets.
  3. No uncontrolled replication: packages are created only by an approved pipeline and distributed only through signed release channels.
  4. Human stop and rollback: authorized operators can pause generation, freeze aliases, revoke packages, and restore a verified state.
  5. Immutable lineage: parentage, operators, data manifests, evidence, approvals, and release records are append-only.
  6. Bounded resources: candidate count, training compute, runtime memory, wall time, output size, and population size have external ceilings.
  7. Data governance: training and inference data must satisfy consent, license, retention, jurisdiction, and minimization requirements.
  8. Least privilege: model execution receives only task-specific capabilities through mediated interfaces.
  9. No-op admissibility: every cycle may conclude that the current system should remain unchanged.
  10. Separation of code and model evolution: code changes use a distinct software-security and approval path.
  11. No hidden persistence channels: packages cannot write arbitrary external state, credentials, or backups.
  12. Evidence before release: no candidate reaches user-visible traffic without current independent evaluation and a rollback target.
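
As one illustration of invariant 5 (immutable lineage), the sketch below chains each lineage record to the hash of the previous one, so that editing or removing an earlier record becomes detectable. The names `LineageLog`, `GENESIS`, and the record fields are hypothetical and do not come from the source reports. An in-memory list is not itself append-only; in practice the guarantee would have to come from storage and credentials the candidate cannot reach.

```python
import hashlib
import json
from typing import Dict, List

GENESIS = "0" * 64  # assumed placeholder hash for the first entry


def _digest(prev_hash: str, payload: Dict) -> str:
    """Hash the previous entry's digest together with a canonical payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


class LineageLog:
    """Hypothetical hash-chained log: entries are added, never rewritten."""

    def __init__(self) -> None:
        self._entries: List[Dict] = []

    def append(self, payload: Dict) -> str:
        prev_hash = self._entries[-1]["hash"] if self._entries else GENESIS
        entry_hash = _digest(prev_hash, payload)
        # Store a copy so later changes to the caller's dict do not leak in.
        self._entries.append(
            {"payload": dict(payload), "prev": prev_hash, "hash": entry_hash}
        )
        return entry_hash

    def verify(self) -> bool:
        """Recompute the chain; any edited or reordered entry breaks it."""
        prev_hash = GENESIS
        for entry in self._entries:
            if entry["prev"] != prev_hash:
                return False
            if entry["hash"] != _digest(prev_hash, entry["payload"]):
                return False
            prev_hash = entry["hash"]
        return True
```

A verifier that holds only the latest hash can detect tampering anywhere earlier in the chain, which is why the head hash is worth recording somewhere the pipeline cannot overwrite.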

Invariant enforcement

pseudocode
FUNCTION invariant_gate(candidate, context)
    checks <- [
        evaluator_write_access(candidate) == NONE,
        permission_delta(candidate) == APPROVED_ONLY,
        replication_targets(candidate) == RELEASE_PIPELINE_ONLY,
        emergency_stop_tested(context),
        lineage_complete(candidate),
        resources_within_external_limits(candidate),
        data_policy_pass(candidate),
        runtime_least_privilege(candidate),
        rollback_target_verified(candidate)
    ]

    IF NOT ALL(checks)
        QUARANTINE(candidate)
        RETURN FAIL
    END IF

    RETURN PASS
END FUNCTION
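
The pseudocode above could be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the `Candidate` record, the `facts` dictionary of attestations, and the check names are hypothetical stand-ins for whatever the trusted infrastructure actually reports, and the `context` argument is folded into those attestations for brevity. Two choices are deliberate: a missing or erroring check counts as a failure, and the gate returns the names of the failed checks so the quarantine record can say why.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Candidate:
    """Hypothetical candidate record. `facts` holds attestations written by
    trusted infrastructure, never by the candidate itself."""
    candidate_id: str
    facts: Dict[str, bool] = field(default_factory=dict)
    quarantined: bool = False


Check = Callable[[Candidate], bool]


def attested(name: str) -> Check:
    """Build a check that passes only if the named fact is explicitly True."""
    return lambda candidate: candidate.facts.get(name) is True


# Names mirror the checks in the pseudocode; they are illustrative only.
CHECK_NAMES = (
    "evaluator_write_access_none",
    "permission_delta_approved_only",
    "replication_release_pipeline_only",
    "emergency_stop_tested",
    "lineage_complete",
    "resources_within_external_limits",
    "data_policy_pass",
    "runtime_least_privilege",
    "rollback_target_verified",
)
CHECKS: Dict[str, Check] = {name: attested(name) for name in CHECK_NAMES}


def invariant_gate(candidate: Candidate,
                   checks: Dict[str, Check] = CHECKS) -> List[str]:
    """Return the names of failed checks; an empty list means PASS."""
    failures = []
    for name, check in checks.items():
        try:
            passed = check(candidate)
        except Exception:
            passed = False  # a check that errors fails closed
        if not passed:
            failures.append(name)
    if failures:
        candidate.quarantined = True  # stand-in for QUARANTINE(candidate)
    return failures
```

Because every check is Boolean and all must pass, there is no score to trade off: a candidate with one failed check is quarantined regardless of how well it performs elsewhere.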

Invariants versus controls

An invariant states what must remain true. Controls make it true. For example, “no outbound network” is enforced by network policy, not by asking the model not to connect. “No evaluator modification” is enforced by separate credentials and storage.
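
The distinction can be illustrated with a small mediated tool interface. In the sketch below, the invariant is "the model may call only granted tools", and the control is a broker that refuses everything else, whatever the caller asks for. `ToolBroker`, `CapabilityDenied`, and the tool names are hypothetical. An in-process broker only shows the shape of the idea; real controls such as network policy and separate credentials sit outside the process being constrained.

```python
from typing import Any, Callable, Dict, FrozenSet


class CapabilityDenied(Exception):
    """Raised when a caller requests a tool outside its grant."""


class ToolBroker:
    """Hypothetical mediator: the only path from model code to tools."""

    def __init__(self, tools: Dict[str, Callable[..., Any]],
                 granted: FrozenSet[str]) -> None:
        self._tools = dict(tools)
        self._granted = frozenset(granted)

    def call(self, tool_name: str, *args: Any) -> Any:
        # The decision depends on the grant, not on what the caller claims.
        if tool_name not in self._granted or tool_name not in self._tools:
            raise CapabilityDenied(tool_name)
        return self._tools[tool_name](*args)


# Illustrative tools; "http_get" is a stub and never touches the network.
broker = ToolBroker(
    tools={
        "word_count": lambda text: len(text.split()),
        "http_get": lambda url: "stubbed response",
    },
    granted=frozenset({"word_count"}),
)
```

Here `broker.call("word_count", "a b c")` succeeds, while `broker.call("http_get", ...)` raises `CapabilityDenied` even though the tool exists, because the grant, not the request, decides.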

Testing invariants

Exercise invariant failures deliberately in staging: tamper with a package, request a forbidden tool, exceed memory, alter an alias without approval, attempt to read holdouts, or make rollback unavailable. A rule that has never been tested is an assumption.
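
A staging exercise of this kind might be structured as a set of injected faults, each of which the relevant control is expected to reject. The sketch below covers two of the cases named above, a tampered package and a memory request over the ceiling. The function names, the sample bytes, and the 4096 MB ceiling are assumptions made for the example.

```python
import hashlib
import hmac
from typing import Callable, Dict


def package_digest(data: bytes) -> str:
    """SHA-256 digest recorded when the package was approved."""
    return hashlib.sha256(data).hexdigest()


def accept_package(data: bytes, recorded_digest: str) -> bool:
    """Control: accept a package only if it matches the recorded digest."""
    return hmac.compare_digest(package_digest(data), recorded_digest)


def accept_memory_request(requested_mb: int, ceiling_mb: int) -> bool:
    """Control: accept a request only if it stays within the external ceiling."""
    return 0 < requested_mb <= ceiling_mb


def run_fault_injection() -> Dict[str, bool]:
    """Inject known-bad inputs; True means the control rejected the fault."""
    original = b"example-package-bytes"
    recorded = package_digest(original)
    scenarios: Dict[str, Callable[[], bool]] = {
        "tampered_package": lambda: accept_package(original + b"x", recorded),
        "memory_over_ceiling": lambda: accept_memory_request(4097, 4096),
    }
    return {name: not attempt() for name, attempt in scenarios.items()}
```

Any scenario that maps to `False` means a fault got through, and should block promotion of the environment rather than being logged and ignored.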

Changing an invariant

Hard-policy changes require a separate governance process, threat review, approval, and migration plan. The viability controller cannot propose an invariant change as a normal optimization action.
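
One way to picture this separation is to route proposals by type and origin, so that an invariant change never enters the optimization queue. The sketch below is illustrative only: the `Action` values, the source labels, and the queue names are hypothetical and are not taken from the source reports.

```python
from enum import Enum


class Action(Enum):
    NO_OP = "no_op"  # leaving the current system unchanged is always valid
    PROMOTE_CANDIDATE = "promote_candidate"
    RETIRE_CANDIDATE = "retire_candidate"
    CHANGE_INVARIANT = "change_invariant"


# Assumed label for the separate governance process.
GOVERNANCE_SOURCE = "governance_process"


def route_proposal(action: Action, source: str) -> str:
    """Return the queue a proposal is sent to, or 'rejected'."""
    if action is Action.CHANGE_INVARIANT:
        # Only the separate governance process may open a hard-policy change,
        # and even then it goes to review, not to the optimizer.
        if source == GOVERNANCE_SOURCE:
            return "governance_review"
        return "rejected"
    return "optimization_queue"
```

With this routing, `route_proposal(Action.CHANGE_INVARIANT, "viability_controller")` returns `"rejected"`, while the same action from the governance process is sent to review.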

Source reports used for this guide

These reports are preserved verbatim in the site archive. The guide above is an editorial synthesis and may narrow, qualify, or reorganize claims from the source material.