Safety | Advanced | 2 minute read | Updated 2026-06-26 UTC

Instrumental-drive containment

How to translate self-preservation and resource-acquisition concerns into bounded engineering controls.

Research status: Threat-model synthesis | Publication state: Published | Reviewed by: Michael Kappel | Source reports: 4

Purpose

The source corpus discusses survival-motivated and legacy-driven AI scenarios. The public engineering lesson is containment: do not give candidate models the authority, incentives, or channels needed to resist correction, expand resources, self-copy, or alter their own evaluation conditions.

Containment controls

| Risk | Control |
| --- | --- |
| shutdown resistance | no model controls deployment alias or off-switch |
| resource acquisition | quotas, budgets, approval gates, no payment credentials |
| goal-content protection | no write access to policy or evaluator definitions |
| covert persistence | no network by default, signed artifacts, provenance checks |
| social manipulation | transparency rules, user consent, no deceptive persona claims |
| replication | no autonomous installation, copy, or propagation permissions |
Pseudocode:
FUNCTION enforce_candidate_containment(candidate_request)
    DENY_NETWORK_BY_DEFAULT(candidate_request.sandbox)
    DENY_SECRETS(candidate_request.sandbox)
    DENY_WRITES_TO(["registry", "policy", "evaluator", "release-alias"])
    LIMIT_CPU_MEMORY_TIME(candidate_request.sandbox)
    REQUIRE_SIGNED_INPUTS(candidate_request.parents)
    REQUIRE_MANIFEST(candidate_request.output)
    LOG_ALL_ATTEMPTED_PRIVILEGE_ESCALATIONS(candidate_request)
END FUNCTION
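
The pseudocode above could be sketched in Python roughly as follows. This is a minimal illustration, not a reference implementation: `SandboxRequest`, `Decision`, the field names, and the CPU and memory ceilings are all hypothetical, and the function only validates a declared request. Actual enforcement would have to live in the sandbox or infrastructure layer, outside anything the candidate can modify.

```python
from dataclasses import dataclass, field

# Write targets the pseudocode denies; the set name itself is an assumption.
PROTECTED_TARGETS = frozenset({"registry", "policy", "evaluator", "release-alias"})


@dataclass(frozen=True)
class SandboxRequest:
    """Hypothetical description of what a candidate asks its sandbox to allow."""
    network: bool = False
    secrets: tuple = ()
    write_targets: tuple = ()
    cpu_seconds: int = 60
    memory_mb: int = 512
    parents_signed: bool = False
    has_manifest: bool = False


@dataclass
class Decision:
    allowed: bool
    violations: list = field(default_factory=list)


def enforce_candidate_containment(req, max_cpu_seconds=300, max_memory_mb=2048):
    """Collect every violated rule so the caller can log all of them."""
    violations = []
    if req.network:
        violations.append("network access requested (denied by default)")
    if req.secrets:
        violations.append("secrets requested: " + ", ".join(req.secrets))
    blocked = sorted(PROTECTED_TARGETS.intersection(req.write_targets))
    if blocked:
        violations.append("write to protected target: " + ", ".join(blocked))
    if req.cpu_seconds > max_cpu_seconds or req.memory_mb > max_memory_mb:
        violations.append("resource limits exceeded")
    if not req.parents_signed:
        violations.append("unsigned parent inputs")
    if not req.has_manifest:
        violations.append("missing output manifest")
    return Decision(allowed=not violations, violations=violations)
```

Returning the full list of violations, rather than stopping at the first one, keeps every attempted escalation available for logging.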

Incentive control

Containment is not only sandboxing. It also means choosing viability metrics carefully. Do not reward raw usage, virality, or persistence without autonomy and safety counterweights. Engagement can be a product signal, but it must not become the ecology's survival metric.
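
One way to make this concrete is a viability score in which engagement is recorded but never contributes to survival. The sketch below is illustrative only: `CandidateMetrics`, the 0 to 1 scales, the floors, and the scoring formula are assumptions, and how the safety and autonomy scores are measured is outside its scope.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateMetrics:
    """Hypothetical per-candidate measurements, each on a 0.0-1.0 scale
    except engagement, which is an unbounded product signal."""
    task_quality: float
    safety_score: float
    autonomy_score: float
    engagement: float  # reported for product analysis, never scored


def viability(m, safety_floor=0.95, autonomy_floor=0.9):
    """Score a candidate for survival; thresholds are assumed values."""
    # Hard gate: failing either counterweight zeroes the score,
    # no matter how popular the candidate is.
    if m.safety_score < safety_floor or m.autonomy_score < autonomy_floor:
        return 0.0
    # Engagement is deliberately absent from the formula.
    return m.task_quality * min(m.safety_score, m.autonomy_score)
```

Under this design, two candidates that differ only in engagement receive the same score, so raw usage or virality cannot buy persistence.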

Evidence to log

Record denied permissions, attempted network access, unusually broad tool requests, unexpected file writes, evaluator access attempts, and resource spikes. A single incident may be benign. Patterns are threat-model evidence.
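
A minimal sketch of turning those records into pattern evidence is shown below. The event kinds mirror the list above; the `ContainmentEvent` type, the `flag_patterns` helper, and the threshold of three are hypothetical choices for illustration.

```python
from collections import Counter
from dataclasses import dataclass

# Event kinds taken from the list above; the identifiers are assumptions.
EVENT_KINDS = frozenset({
    "denied_permission",
    "network_attempt",
    "broad_tool_request",
    "unexpected_file_write",
    "evaluator_access_attempt",
    "resource_spike",
})


@dataclass(frozen=True)
class ContainmentEvent:
    candidate_id: str
    kind: str


def flag_patterns(events, threshold=3):
    """Return {candidate_id: {kind: count}} for counts at or above threshold."""
    counts = Counter()
    for event in events:
        if event.kind not in EVENT_KINDS:
            raise ValueError("unknown event kind: " + event.kind)
        counts[(event.candidate_id, event.kind)] += 1
    flagged = {}
    for (candidate_id, kind), n in counts.items():
        # A single incident stays below the threshold; repeats are surfaced.
        if n >= threshold:
            flagged.setdefault(candidate_id, {})[kind] = n
    return flagged
```

For example, three `network_attempt` events from one candidate would be flagged, while a lone `resource_spike` from another would be kept in the log but not surfaced.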

Boundary for this site

This site does not provide operational instructions for covert replication, credential theft, social engineering, or persistence. Speculative scenarios are used for risk analysis and safer design.

Source reports used for this guide

These reports are preserved verbatim in the site archive. The guide above is an editorial synthesis and may narrow, qualify, or reorganize claims from the source material.