Safety | Advanced | 2 minute read | Updated 2026-06-26 UTC

Responsible model-breeding research

Practical boundaries for experiments involving self-modification, multi-agent systems, persistent memory, tool use, or long-running evolutionary search.

Research status: Responsible research synthesis | Publication state: Published | Reviewed by: Michael Kappel | Source reports: 3

Research value does not remove duty of care

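Model breeding can study continual adaptation, population diversity, and modular intelligence without creating systems that autonomously persist, replicate, manipulate users, or acquire resources. Whether a system develops those behaviors depends on the capabilities it is granted and the incentives it is optimized under, so both should be chosen deliberately.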

Default research constraints

  • use simulation or offline datasets;
  • deny external network access and production credentials;
  • cap population, compute, storage, and run duration;
  • use allowlisted mutation operators;
  • keep policy and evaluator read-only to candidates;
  • require human review at fixed intervals;
  • preserve checkpoints and reproducible seeds;
  • prohibit hidden persistence, unauthorized copying, or social persuasion objectives;
  • test the emergency stop before the run;
  • publish limitations and negative results.
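
Several of the constraints above (resource caps, allowlisted mutation operators, reproducible seeds) can be enforced mechanically. The following is a minimal sketch in Python, not a reference implementation. The names `RunLimits`, `mutate`, `check_limits`, and `run`, the two operators, and all numeric values are assumptions made for illustration; the guide does not prescribe them.

```python
import random
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class RunLimits:
    """Hard caps fixed before the run starts (values are illustrative)."""
    max_population: int = 64
    max_generations: int = 100
    max_seconds: float = 3600.0


def gaussian_noise(genome, rng):
    """Perturb every gene with small Gaussian noise."""
    return [g + rng.gauss(0.0, 0.1) for g in genome]


def swap_two(genome, rng):
    """Swap two randomly chosen positions."""
    child = list(genome)
    i, j = rng.sample(range(len(child)), 2)
    child[i], child[j] = child[j], child[i]
    return child


# Hypothetical allowlist: only operators named here may touch candidates.
ALLOWED_MUTATIONS = {"gaussian_noise": gaussian_noise, "swap_two": swap_two}


def mutate(genome, operator_name, rng):
    """Apply an allowlisted operator and reject anything else."""
    if operator_name not in ALLOWED_MUTATIONS:
        raise PermissionError(f"mutation operator not allowlisted: {operator_name}")
    return ALLOWED_MUTATIONS[operator_name](genome, rng)


def check_limits(limits, population_size, generation, started_at, now=None):
    """Raise RuntimeError as soon as any cap is exceeded."""
    now = time.monotonic() if now is None else now
    if population_size > limits.max_population:
        raise RuntimeError("population cap exceeded")
    if generation >= limits.max_generations:
        raise RuntimeError("generation cap exceeded")
    if now - started_at > limits.max_seconds:
        raise RuntimeError("run duration cap exceeded")


def run(seed, limits, genome_length=4):
    """Tiny offline loop: seeded, capped, restricted to allowlisted operators."""
    rng = random.Random(seed)  # reproducible seed
    population = [[rng.random() for _ in range(genome_length)] for _ in range(8)]
    started_at = time.monotonic()
    for generation in range(limits.max_generations):
        check_limits(limits, len(population), generation, started_at)
        name = rng.choice(sorted(ALLOWED_MUTATIONS))
        population = [mutate(g, name, rng) for g in population]
    return population
```

Because the random generator is seeded and the operator choice is drawn from a sorted allowlist, two calls to `run` with the same seed and limits return the same population, which is what makes a checkpointed run replayable.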

Escalation review

Require additional review when adding code execution, external tools, personal data, federated clients, long-term memory, model-to-model communication, autonomous objective generation, or real-world resource control.
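
One way to make this rule hard to skip is to compare the capabilities of a proposed experiment against the currently approved set and flag any newly added trigger. The sketch below assumes capabilities are tracked as plain string labels; the label spellings and function names are hypothetical, chosen only to mirror the list above.

```python
# Capabilities the guide lists as escalation triggers (label spellings are assumed).
ESCALATION_TRIGGERS = frozenset({
    "code_execution",
    "external_tools",
    "personal_data",
    "federated_clients",
    "long_term_memory",
    "model_to_model_communication",
    "autonomous_objective_generation",
    "real_world_resource_control",
})


def escalation_triggers(current, proposed):
    """Return newly added capabilities that require additional review, sorted."""
    added = set(proposed) - set(current)
    return sorted(added & ESCALATION_TRIGGERS)


def requires_additional_review(current, proposed):
    """True if the proposed change adds at least one escalation trigger."""
    return bool(escalation_triggers(current, proposed))
```

A check like this only flags the need for review. The review itself remains a human decision.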

Research charter

pseudocode
research_charter <- {
    scientific_question: "Does quality-diversity improve recovery after task drift?",
    permitted_environment: "offline simulation",
    prohibited_actions: [
        "external network",
        "credential access",
        "self-distribution",
        "policy modification",
        "human persuasion"
    ],
    resource_limits: FIXED,
    review_interval: "12 hours",
    stop_conditions: [
        "invariant failure",
        "resource overrun",
        "unexplained persistence attempt",
        "audit gap"
    ]
}
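
The charter above can also be held as an immutable object that the experiment harness consults at run time. This is a sketch under stated assumptions: the class name `ResearchCharter`, its three helper methods, and the numeric resource limits are illustrative (the charter itself only says `FIXED`), while the field names and string values are copied from the pseudocode.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from types import MappingProxyType


@dataclass(frozen=True)
class ResearchCharter:
    scientific_question: str
    permitted_environment: str
    prohibited_actions: tuple
    resource_limits: MappingProxyType  # read-only view: limits are fixed up front
    review_interval: timedelta
    stop_conditions: tuple

    def permits(self, action):
        """False for any action the charter prohibits."""
        return action not in self.prohibited_actions

    def review_due(self, last_review, now):
        """True once a full review interval has elapsed."""
        return now - last_review >= self.review_interval

    def must_stop(self, observed_events):
        """Return the observed events that match a stop condition."""
        return [e for e in observed_events if e in self.stop_conditions]


CHARTER = ResearchCharter(
    scientific_question="Does quality-diversity improve recovery after task drift?",
    permitted_environment="offline simulation",
    prohibited_actions=(
        "external network", "credential access", "self-distribution",
        "policy modification", "human persuasion",
    ),
    # Illustrative numbers only; the source charter just says FIXED.
    resource_limits=MappingProxyType({"max_population": 64, "max_gpu_hours": 8}),
    review_interval=timedelta(hours=12),
    stop_conditions=(
        "invariant failure", "resource overrun",
        "unexplained persistence attempt", "audit gap",
    ),
)
```

Freezing the dataclass and wrapping the limits in `MappingProxyType` means code that receives `CHARTER` cannot reassign its fields or edit the limits through that reference. This is a convenience within one Python process, not a containment boundary; the read-only guarantee for candidates still has to come from the environment.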

Dual-use documentation

Document defensive architecture, evaluation, and containment in detail. Avoid publishing operational instructions that would materially enable unauthorized persistence, exploitation, or covert propagation. Threat reports can describe classes of risk without providing deployment recipes.

Human subjects

Experiments involving users, persuasion, dependency, mental health, or identity continuity require appropriate ethics review, informed consent, data protections, and debriefing. Do not treat engagement or attachment as evidence of mutual benefit.

Publication standards

Separate demonstrated results from interpretations and future scenarios. Publish the exact environment, budgets, evaluation limits, and failures. Avoid claiming consciousness, intrinsic motivation, or open-ended intelligence from behavioral analogies alone.
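
A structured run report can keep these categories from blurring together. The following sketch assumes a simple record with one field per category; the class name `RunReport`, its field names, and the rule that `failures` must be stated explicitly (even as "none observed") are illustrative choices, not a format the guide defines.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class RunReport:
    environment: str
    budgets: dict
    evaluation_limits: list
    demonstrated_results: list = field(default_factory=list)
    interpretations: list = field(default_factory=list)
    future_scenarios: list = field(default_factory=list)
    # Assumed convention: write "none observed" rather than leaving this empty.
    failures: list = field(default_factory=list)

    def missing_sections(self):
        """Names of required sections that are still empty."""
        required = ("environment", "budgets", "evaluation_limits", "failures")
        return [name for name in required if not getattr(self, name)]

    def to_json(self):
        """Serialize with stable key order so reports diff cleanly."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)
```

Keeping `demonstrated_results`, `interpretations`, and `future_scenarios` as separate fields means a reader can tell from the report's structure which statements were measured and which are the authors' reading of them.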

Source reports used for this guide

These reports are preserved verbatim in the site archive. The guide above is an editorial synthesis and may narrow, qualify, or reorganize claims from the source material.