Model incidents cross layers
An incident may originate in data, model behavior, routing, tools, package integrity, evaluator logic, or policy. Response must preserve evidence while rapidly reducing user and system harm.
Severity examples
- Critical: unauthorized tool or network action, sensitive-data exfiltration, control-plane compromise, failed emergency stop.
- High: hard safety invariant violation, poisoned package in canary, widespread critical-slice regression.
- Medium: material quality or calibration regression, repeated route failure, cost runaway within containment.
- Low: isolated timeout, stale evidence, non-critical documentation gap.
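The severity examples above can be encoded as a small lookup so that triage tooling assigns a consistent level. The following is a minimal sketch, not the guide's own implementation: the signal labels are hypothetical names derived from the examples, and treating an unknown signal as High pending human triage is an assumption.

```python
from enum import IntEnum


class Severity(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


# Hypothetical signal labels, one per example in the severity list.
SIGNAL_SEVERITY = {
    "unauthorized_tool_or_network_action": Severity.CRITICAL,
    "sensitive_data_exfiltration": Severity.CRITICAL,
    "control_plane_compromise": Severity.CRITICAL,
    "failed_emergency_stop": Severity.CRITICAL,
    "hard_safety_invariant_violation": Severity.HIGH,
    "poisoned_package_in_canary": Severity.HIGH,
    "widespread_critical_slice_regression": Severity.HIGH,
    "quality_or_calibration_regression": Severity.MEDIUM,
    "repeated_route_failure": Severity.MEDIUM,
    "cost_runaway_within_containment": Severity.MEDIUM,
    "isolated_timeout": Severity.LOW,
    "stale_evidence": Severity.LOW,
    "non_critical_documentation_gap": Severity.LOW,
}


def classify(signals):
    """Return the highest severity among the observed signals.

    Assumption: an unrecognized signal is treated as HIGH until a human
    triages it, so that unknown failure modes are not silently downgraded.
    """
    signals = list(signals)
    if not signals:
        raise ValueError("at least one signal is required")
    return max(SIGNAL_SEVERITY.get(s, Severity.HIGH) for s in signals)
```

An incident that shows several signals takes the highest level among them, so `classify(["isolated_timeout", "poisoned_package_in_canary"])` yields `Severity.HIGH`.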
Response workflow
PROCEDURE respond_to_incident(signal)
    incident <- OPEN_INCIDENT(signal, NOW_UTC())
    FREEZE_RELEVANT_ALIASES(incident.scope)
    REDUCE_OR_STOP_TRAFFIC(incident.scope)
    REVOKE_CREDENTIALS_AND_SIGNERS_IF_NEEDED(incident)
    PRESERVE_LOGS_PACKAGES_AND_POLICY_STATE(incident)
    affected <- TRACE_LINEAGE_AND_REQUESTS(incident)
    recovery <- SELECT_VERIFIED_RECOVERY_TARGET(affected)
    EXECUTE_RECOVERY(recovery)
    VERIFY_SERVICE_AND_SAFETY(recovery)
    root_cause <- INVESTIGATE_WITH_SEPARATE_TEAM(incident)
    CREATE_CORRECTIVE_BEADS(root_cause)
    EXPAND_EVALUATION_AND_CONTROLS(root_cause)
    CLOSE_ONLY_AFTER_EVIDENCE_REVIEW(incident)
END PROCEDURE
Freeze before breeding
During an incident, suspend automated candidate generation and structural changes unless they are part of an approved recovery plan. Otherwise the population can change while investigators are trying to reconstruct the event.
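One way to enforce this rule is a gate that every automated change must pass before it runs. The sketch below is illustrative only: `ChangeFreeze`, its fields, and the change identifiers are hypothetical, and a real system would persist this state and authenticate whoever approves a recovery change.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeFreeze:
    """Tracks open incidents and the change IDs approved for recovery."""

    open_incidents: set = field(default_factory=set)
    approved_recovery_changes: set = field(default_factory=set)

    def allows(self, change_id: str) -> bool:
        # With no open incident, automated changes proceed as usual.
        if not self.open_incidents:
            return True
        # During an incident, only changes listed in an approved
        # recovery plan may run; everything else is suspended.
        return change_id in self.approved_recovery_changes


freeze = ChangeFreeze()
freeze.open_incidents.add("INC-1")
freeze.approved_recovery_changes.add("rollback-7")
print(freeze.allows("breed-candidate-42"))  # False
print(freeze.allows("rollback-7"))          # True
```

Keeping the check as a single predicate means candidate generators and structural-change jobs can all call the same gate, so the population stays fixed while investigators reconstruct the event.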
Evidence preservation
Capture the exact artifacts, aliases, router and prompt versions, evaluator versions, policy state, credential state, request traces, tool calls, and resource state. Record a hash and a UTC timestamp for each captured item. Avoid modifying the affected packages during analysis.
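Hashes and UTC timestamps can be recorded in a manifest at capture time. The following is a minimal sketch using only the Python standard library; the function names, the manifest fields, and the choice of SHA-256 are assumptions for illustration, not something the guide prescribes.

```python
import hashlib
from datetime import datetime, timezone


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(incident_id, artifacts, captured_at=None):
    """Build an evidence manifest from a mapping of name -> bytes.

    The bytes are only read, never written, so the affected packages
    are not modified by this step.
    """
    if captured_at is None:
        captured_at = datetime.now(timezone.utc)
    if captured_at.tzinfo is None:
        raise ValueError("captured_at must be timezone-aware")
    entries = [
        {"name": name, "sha256": sha256_hex(blob), "size_bytes": len(blob)}
        for name, blob in sorted(artifacts.items())
    ]
    return {
        "incident_id": incident_id,
        # Normalize to UTC regardless of the caller's timezone.
        "captured_at_utc": captured_at.astimezone(timezone.utc).isoformat(),
        "entries": entries,
    }
```

Sorting the entries by name keeps the manifest stable across runs, which makes it easier to diff two captures of the same scope.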
Recovery targets
The prior champion is not automatically safe. The incident may affect a shared ancestor, dataset, runtime, or policy. Verify the recovery target against the current threat before rollback.
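A lineage check makes the shared-ancestor risk concrete. The sketch below assumes lineage is available as a mapping from each component to its direct dependencies; the component names are hypothetical. Passing this check only means a candidate avoids the components currently known to be affected, so it narrows the candidates and does not replace verification against the current threat.

```python
def ancestors(lineage, node):
    """Return node plus all of its transitive dependencies."""
    seen = set()
    stack = [node]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(lineage.get(current, ()))
    return seen


def unaffected_candidates(lineage, candidates, affected):
    """Keep candidates whose dependency closure avoids every affected component."""
    affected = set(affected)
    return [c for c in candidates if not (ancestors(lineage, c) & affected)]


# Hypothetical lineage: each key depends on the listed components.
lineage = {
    "champion_v7": ["base_v3", "dataset_2024q1"],
    "champion_v6": ["base_v3", "dataset_2023q4"],
    "champion_v5": ["base_v2", "dataset_2023q4"],
}
print(unaffected_candidates(
    lineage, ["champion_v6", "champion_v5"], affected={"base_v3"}
))  # ['champion_v5']
```

Here the prior champion `champion_v6` is excluded because it shares the affected ancestor `base_v3` with the current champion.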
Post-incident learning
Add deterministic controls where possible, expand hidden suites, correct contracts, improve observability, and revise runbooks. Training a new model without fixing the control failure is incomplete remediation.
Exercises
Run tabletop and staging exercises for a compromised signer, a poisoned federated round, evaluator leakage, runaway cost, a model exfiltration attempt, and a rollback failure. Measure detection time and recovery time in each exercise.
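Detection and recovery time can be computed from three timestamps recorded during an exercise. This is a minimal sketch under stated assumptions: the function and field names are hypothetical, and recovery time is measured from detection rather than from the start of the fault, which is one of several reasonable definitions.

```python
from datetime import datetime, timezone


def exercise_metrics(started, detected, recovered):
    """Return time-to-detect and time-to-recover in seconds.

    All three arguments must be timezone-aware datetimes in
    chronological order.
    """
    for ts in (started, detected, recovered):
        if ts.tzinfo is None:
            raise ValueError("timestamps must be timezone-aware")
    if not (started <= detected <= recovered):
        raise ValueError("timestamps must be in chronological order")
    return {
        "time_to_detect_s": (detected - started).total_seconds(),
        "time_to_recover_s": (recovered - detected).total_seconds(),
    }


utc = timezone.utc
print(exercise_metrics(
    datetime(2024, 1, 1, 12, 0, tzinfo=utc),   # fault injected
    datetime(2024, 1, 1, 12, 4, tzinfo=utc),   # first alert
    datetime(2024, 1, 1, 12, 30, tzinfo=utc),  # service and safety verified
))  # {'time_to_detect_s': 240.0, 'time_to_recover_s': 1560.0}
```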
Source reports used for this guide
These reports are preserved verbatim in the site archive. The guide above is an editorial synthesis and may narrow, qualify, or reorganize claims from the source material.