The central boundary
A breeding system must distinguish the artifact being evolved from the system that judges it. The model may propose descendants, but it must not edit the evaluator, policy, holdout data, promotion threshold, or release alias.
This boundary prevents a self-referential validation loop, in which the system optimizes the test instead of the task. It is the difference between adaptive engineering and metric capture.
Permission domains
| Domain | May change | Must not change |
|---|---|---|
| Candidate model | weights, adapters, quantization, package metadata under sandbox | evaluator, policy, registry aliases |
| Candidate code patch | proposed implementation branch | production deploy target, approval record |
| Evaluator | test logic under human/code review | candidate artifacts being scored |
| Registry | append new immutable package records | existing records, historical digests |
| Release controller | traffic weights for approved artifacts | evaluation results |
| Governance | policy thresholds and authorities | candidate runtime state |
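
The table above can be read as a deny-by-default lookup. The following is a minimal sketch under that reading; the `Domain` enum, the resource labels, and both function names are hypothetical and chosen only to mirror the "May change" column.

```python
from enum import Enum


class Domain(Enum):
    CANDIDATE_MODEL = "candidate_model"
    CANDIDATE_PATCH = "candidate_code_patch"
    EVALUATOR = "evaluator"
    REGISTRY = "registry"
    RELEASE_CONTROLLER = "release_controller"
    GOVERNANCE = "governance"


# Hypothetical resource labels; each set mirrors one "May change" cell.
MAY_CHANGE = {
    Domain.CANDIDATE_MODEL: {"weights", "adapters", "quantization", "package_metadata"},
    Domain.CANDIDATE_PATCH: {"implementation_branch"},
    Domain.EVALUATOR: {"test_logic"},
    Domain.REGISTRY: {"package_records"},
    Domain.RELEASE_CONTROLLER: {"traffic_weights"},
    Domain.GOVERNANCE: {"policy_thresholds", "authorities"},
}


def may_change(domain: Domain, resource: str) -> bool:
    """Deny by default: a write is allowed only if the table lists it."""
    return resource in MAY_CHANGE.get(domain, set())


def require_permission(domain: Domain, resource: str) -> None:
    """Raise PermissionError for any write the table does not allow."""
    if not may_change(domain, resource):
        raise PermissionError(f"{domain.value} may not change {resource}")
```

Because anything not listed is refused, a candidate asking to touch `test_logic` or a registry alias fails without needing an explicit "Must not change" entry.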
Sandbox pattern

    PROCEDURE evaluate_candidate(candidate_id, suite_id)
        candidate <- REGISTRY_FETCH_IMMUTABLE(candidate_id)
        suite <- EVALUATION_REGISTRY_FETCH_IMMUTABLE(suite_id)
        sandbox <- CREATE_SANDBOX(
            network = "disabled",
            writable_paths = ["/tmp/eval-output"],
            readable_paths = [candidate.path, suite.public_inputs],
            secrets = []
        )
        result <- RUN_EVALUATION(sandbox, candidate, suite)
        signed_result <- SIGN_EVALUATOR_OUTPUT(result)
        APPEND_SCORECARD(candidate_id, suite_id, signed_result)
    END PROCEDURE

Human review triggers
Require explicit owner approval when a change touches evaluator logic, security invariants, authority boundaries, data-retention policy, release automation, or any model that can affect high-risk decisions. Candidate generation can be automated; authority expansion cannot.
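
One way to make these triggers mechanical is to map each one to file-path patterns and block automation whenever a changed path matches. This is a sketch only: the directory layout, the pattern strings, and the function names are assumptions for illustration, not a layout the guide prescribes.

```python
from fnmatch import fnmatchcase

# Hypothetical path patterns, one group per trigger named in the text.
REVIEW_TRIGGERS = {
    "evaluator logic": ["evaluator/*"],
    "security invariants": ["security/*"],
    "authority boundaries": ["policy/authorities*"],
    "data-retention policy": ["policy/retention*"],
    "release automation": ["release/*", ".ci/deploy*"],
    "high-risk model": ["models/high_risk/*"],
}


def review_reasons(changed_paths):
    """Return the sorted trigger names matched by any changed path."""
    reasons = set()
    for path in changed_paths:
        for reason, patterns in REVIEW_TRIGGERS.items():
            if any(fnmatchcase(path, pattern) for pattern in patterns):
                reasons.add(reason)
    return sorted(reasons)


def can_auto_merge(changed_paths, owner_approved=False):
    """Automation proceeds only if no trigger fires or an owner has signed off."""
    return owner_approved or not review_reasons(changed_paths)
```

For example, `can_auto_merge(["candidates/run_17/adapter.bin"])` returns `True`, while any change under `evaluator/` returns `False` until `owner_approved=True` is passed by a process that the candidate cannot invoke.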
Red flags
- A candidate needs access to hidden labels.
- A model requests permission to change its acceptance criteria.
- A router learns only from accepted outputs and starves minority specialists.
- A release controller lacks rollback to an immutable previous version.
- Evaluation data and training data share a writable store.
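
The last red flag in the list can be checked automatically. The sketch below compares paths lexically and reports any evaluation path that sits inside, or contains, a location the training side can write to. The function names are hypothetical, and the check does not resolve symlinks or mounts, so it is a first filter rather than a guarantee.

```python
from pathlib import PurePosixPath


def overlaps(a: str, b: str) -> bool:
    """True if the two paths are equal or one contains the other."""
    pa, pb = PurePosixPath(a), PurePosixPath(b)
    return pa == pb or pa in pb.parents or pb in pa.parents


def shared_writable_stores(training_writable, evaluation_paths):
    """List every (training, evaluation) pair of overlapping paths."""
    return [
        (t, e)
        for t in training_writable
        for e in evaluation_paths
        if overlaps(t, e)
    ]
```

An empty result from `shared_writable_stores(["/data/train"], ["/data/eval"])` means no overlap was found; a training job with write access to `/data` would be flagged against `/data/eval/holdout`.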
Practical starting rule
Separate directories, credentials, and class responsibilities even in a single-process prototype. Physical separation can come later; logical separation should exist from the first experiment.
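
A single-process prototype can still keep these responsibilities apart. In the sketch below, the class names, the callable `predict` interface, and the use of a shared-secret HMAC are all assumptions made for illustration; a real deployment would more likely use asymmetric signatures so that the scorecard can verify results without being able to forge them.

```python
import hashlib
import hmac
import json


class Evaluator:
    """Holds the signing key; candidates never receive this object."""

    def __init__(self, signing_key: bytes):
        self._key = signing_key

    def score(self, candidate_id: str, predict, cases) -> dict:
        # The candidate is only ever called as a function on public inputs.
        correct = sum(1 for x, expected in cases if predict(x) == expected)
        result = {"candidate_id": candidate_id, "correct": correct, "total": len(cases)}
        payload = json.dumps(result, sort_keys=True).encode()
        signature = hmac.new(self._key, payload, hashlib.sha256).hexdigest()
        return {"result": result, "signature": signature}


class Scorecard:
    """Append-only log that accepts only results carrying a valid signature."""

    def __init__(self, verify_key: bytes):
        self._key = verify_key
        self._entries = []

    def append(self, signed: dict) -> None:
        payload = json.dumps(signed["result"], sort_keys=True).encode()
        expected = hmac.new(self._key, payload, hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, signed["signature"]):
            raise ValueError("unsigned or tampered result rejected")
        self._entries.append(signed)

    def entries(self) -> tuple:
        """Return a read-only snapshot; there is no update or delete method."""
        return tuple(self._entries)
```

The candidate appears here only as the `predict` callable, so it holds neither the key nor a reference to the scorecard. A result whose numbers are altered after signing fails verification in `Scorecard.append`.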
Source reports used for this guide
These reports are preserved verbatim in the site archive. The guide above is an editorial synthesis and may narrow, qualify, or reorganize claims from the source material.