Operations Advanced 2 minute read Updated 2026-06-26 UTC

Release, canary, and rollback

A progressive delivery pipeline for model descendants with shadow evidence, bounded exposure, automatic abort, and verified rollback.

Research status: Established progressive delivery practice adapted to models
Publication state: Published
Reviewed by: Michael Kappel
Source reports: 2

Release is part of evaluation

Offline suites cannot reproduce every production condition. Shadow and canary stages collect operational evidence while limiting user impact. Promotion is a sequence of reversible state transitions, not one deployment event.

  1. Candidate: immutable artifact
  2. Offline gate: quality, safety, cost
  3. Shadow: no user impact
  4. Canary: bounded cohort
  5. Progressive: measured expansion
  6. Champion: rollback retained
Promotion is evidence-driven; rollback remains available throughout the release.
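The stage sequence above can be read as a small state machine in which promotion moves one step at a time and rollback is always available. The following is a minimal sketch under stated assumptions: the names `Stage`, `promote`, and `rollback` are hypothetical, and returning an aborted descendant to the candidate stage is one possible policy, not something this guide prescribes.

```python
from enum import IntEnum


class Stage(IntEnum):
    """Release stages in promotion order, named after the list above."""
    CANDIDATE = 1
    OFFLINE_GATE = 2
    SHADOW = 3
    CANARY = 4
    PROGRESSIVE = 5
    CHAMPION = 6


def promote(current: Stage, evidence_ok: bool) -> Stage:
    """Advance exactly one stage, and only if the current stage's evidence passed."""
    if not evidence_ok or current == Stage.CHAMPION:
        return current
    return Stage(current + 1)


def rollback(current: Stage) -> Stage:
    """Assumed policy: an abort at any stage returns the descendant to CANDIDATE."""
    return Stage.CANDIDATE
```

Modelling promotion as single-step transitions keeps each move small enough to reverse, which matches the idea that release is a sequence of state changes rather than one deployment event.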

Stages

Candidate

An immutable package with offline evidence but no production eligibility.

Shadow

Receives mirrored production inputs where policy permits. Outputs are logged for comparison but never returned to users or downstream systems.
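As an illustration of the shadow contract, the sketch below serves the champion's output and records the candidate's output only for comparison. All names (`serve_with_shadow`, `comparison_log`, `mirroring_allowed`) are hypothetical, and the candidate is called synchronously for brevity; a real system would more likely mirror asynchronously so that shadow latency cannot affect users.

```python
from typing import Callable, Dict, List


def serve_with_shadow(
    request: str,
    champion: Callable[[str], str],
    candidate: Callable[[str], str],
    comparison_log: List[Dict[str, object]],
    mirroring_allowed: bool = True,
) -> str:
    """Return the champion's output; log the candidate's output for comparison only."""
    served = champion(request)
    if not mirroring_allowed:
        # Policy forbids mirroring this input, so the candidate never sees it.
        return served
    try:
        shadow_output = candidate(request)
        shadow_error = None
    except Exception as exc:
        # A failing shadow must never change what the user receives.
        shadow_output = None
        shadow_error = repr(exc)
    comparison_log.append({
        "request": request,
        "served": served,
        "shadow": shadow_output,
        "shadow_error": shadow_error,
        "agree": shadow_error is None and shadow_output == served,
    })
    return served
```

The function's return value never depends on the candidate, which is the property the shadow stage relies on.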

Canary

Serves a small, explicitly selected cohort. Start with low-risk tasks and users who can report issues. Set hard traffic, time, and cost limits.
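A canary gate can be sketched as a pure function that checks cohort membership and the hard limits before routing a request to the candidate. The field names, the `"low"` risk label, and the cost unit below are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Set


@dataclass(frozen=True)
class CanaryLimits:
    max_traffic_fraction: float  # hard cap on the share of requests served
    max_duration: timedelta      # hard cap on how long the canary may run
    max_cost: float              # hard cap on spend, in the team's budget unit


def canary_may_serve(
    user_id: str,
    task_risk: str,
    cohort: Set[str],
    limits: CanaryLimits,
    started_at: datetime,
    now: datetime,
    observed_fraction: float,
    spent: float,
) -> bool:
    """True only for explicitly selected users, low-risk tasks, and unexhausted limits."""
    in_cohort = user_id in cohort and task_risk == "low"
    within_limits = (
        observed_fraction < limits.max_traffic_fraction
        and now - started_at < limits.max_duration
        and spent < limits.max_cost
    )
    return in_cohort and within_limits
```

When any limit is exhausted the function returns `False`, so traffic falls back to the champion without further configuration.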

Progressive

Increase traffic only after observation windows pass. Expand one dimension at a time—volume, task range, jurisdiction, or risk tier.
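The one-dimension-at-a-time rule is easy to check mechanically. The sketch below assumes the exposure is described by a flat dictionary with hypothetical keys such as `volume`, `task_range`, `jurisdiction`, and `risk_tier`; the function name is likewise illustrative.

```python
from typing import Dict


def validate_expansion(
    current: Dict[str, object],
    proposed: Dict[str, object],
    window_passed: bool,
) -> bool:
    """Allow a step only after the observation window passed and exactly one dimension changes."""
    if not window_passed:
        return False
    if current.keys() != proposed.keys():
        # Adding or removing a dimension is not a plain expansion step.
        return False
    changed = [key for key in current if current[key] != proposed[key]]
    return len(changed) == 1
```

Changing a single dimension per step means any regression observed in the next window can be attributed to that change.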

Champion

Becomes the default alias while the prior champion is retained as the verified rollback target.
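One way to picture the alias change is as a pair of pointers that always move together. This is a sketch under assumptions: the `Aliases` type and both functions are hypothetical, and the package identifiers are opaque strings.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Aliases:
    champion: str                   # package currently served by default
    rollback: Optional[str] = None  # verified prior champion, if any


def promote_to_champion(aliases: Aliases, new_package: str, prior_verified: bool) -> Aliases:
    """Point the default alias at the new package and keep the old one as rollback."""
    if not prior_verified:
        raise ValueError("prior champion must be verified as a rollback target first")
    return Aliases(champion=new_package, rollback=aliases.champion)


def roll_back(aliases: Aliases) -> Aliases:
    """Restore the retained prior champion as the default alias."""
    if aliases.rollback is None:
        raise ValueError("no rollback target retained")
    return Aliases(champion=aliases.rollback, rollback=None)
```

Refusing promotion when the prior champion is unverified keeps the rollback path from silently disappearing at the moment it matters most.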

Abort criteria

pseudocode
IF hard_invariant_failure_count > 0
    ABORT_IMMEDIATELY
ELSE IF critical_slice_quality < threshold
    ABORT_IMMEDIATELY
ELSE IF p99_latency > budget FOR sustained_window
    REDUCE_TRAFFIC_OR_ABORT
ELSE IF unexplained_disagreement_rate > threshold
    PAUSE_AND_REVIEW
END IF
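The pseudocode above translates fairly directly into a runnable decision function. The following sketch keeps the same ordering of checks. The type and field names are assumptions, "FOR sustained_window" is modelled as the number of seconds p99 latency has continuously exceeded its budget, and the final `CONTINUE` outcome is added here because the pseudocode leaves the no-breach case implicit.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    CONTINUE = "continue"
    ABORT_IMMEDIATELY = "abort_immediately"
    REDUCE_TRAFFIC_OR_ABORT = "reduce_traffic_or_abort"
    PAUSE_AND_REVIEW = "pause_and_review"


@dataclass(frozen=True)
class Metrics:
    hard_invariant_failure_count: int
    critical_slice_quality: float
    p99_over_budget_seconds: float  # continuous time p99 latency has exceeded budget
    unexplained_disagreement_rate: float


@dataclass(frozen=True)
class Thresholds:
    min_critical_slice_quality: float
    sustained_window_seconds: float
    max_unexplained_disagreement_rate: float


def decide(m: Metrics, t: Thresholds) -> Action:
    """Evaluate abort criteria from most to least severe; the first match wins."""
    if m.hard_invariant_failure_count > 0:
        return Action.ABORT_IMMEDIATELY
    if m.critical_slice_quality < t.min_critical_slice_quality:
        return Action.ABORT_IMMEDIATELY
    if m.p99_over_budget_seconds >= t.sustained_window_seconds:
        return Action.REDUCE_TRAFFIC_OR_ABORT
    if m.unexplained_disagreement_rate > t.max_unexplained_disagreement_rate:
        return Action.PAUSE_AND_REVIEW
    return Action.CONTINUE
```

Because the checks run in severity order, a release that breaches both a hard invariant and the latency budget is aborted rather than merely throttled.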

Rollback design

Rollback covers the whole serving configuration: the model package, router state, prompt and policy versions, retrieval configuration, cache compatibility, and memory-schema migration. Restoring a previous model file alone may not restore prior behavior.
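To make the point about rollback scope concrete, the sketch below treats the rollback target as a bundle of versions and reports every component that still differs after a rollback attempt. The `RollbackBundle` fields mirror the components listed above; the type, the function name, and the use of plain version strings are assumptions.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass(frozen=True)
class RollbackBundle:
    model_package: str
    router_state: str
    prompt_version: str
    policy_version: str
    retrieval_config: str
    cache_compatibility: str
    memory_schema_version: str


def rollback_drift(expected: RollbackBundle, live: RollbackBundle) -> List[str]:
    """Return the names of components whose live value differs from the rollback target."""
    return [
        f.name
        for f in fields(expected)
        if getattr(expected, f.name) != getattr(live, f.name)
    ]
```

An empty result is the condition for calling a rollback verified; a non-empty result names exactly which parts of the serving configuration were left behind.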

Automatic versus manual

Automatic abort is appropriate for clear invariant or SLO breaches. Manual review is appropriate for ambiguous quality or user-experience changes. The release controller can reduce traffic automatically while preserving evidence for investigation.

Roll-forward

Some incidents require a corrected descendant rather than rollback, especially when the prior artifact is affected by a revoked dataset or vulnerability. Maintain both paths and document when rollback is prohibited.
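The choice between the two paths can be written down as an explicit rule so that the reason is recorded alongside the decision. This is an illustrative sketch only; the two boolean inputs and the `RecoveryDecision` type are assumptions derived from the cases named above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryDecision:
    path: str    # "rollback" or "roll_forward"
    reason: str  # documented justification for the chosen path


def choose_recovery(
    prior_uses_revoked_dataset: bool,
    prior_has_known_vulnerability: bool,
) -> RecoveryDecision:
    """Prohibit rollback when the prior artifact is itself affected."""
    if prior_uses_revoked_dataset:
        return RecoveryDecision(
            "roll_forward", "rollback prohibited: prior artifact depends on a revoked dataset"
        )
    if prior_has_known_vulnerability:
        return RecoveryDecision(
            "roll_forward", "rollback prohibited: prior artifact has a known vulnerability"
        )
    return RecoveryDecision("rollback", "prior artifact is unaffected")
```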

Release evidence

Record cohort definition, traffic weights, exact package and configuration, start and end times in UTC, metrics, alerts, operator actions, and final decision. Link the release record to the original experiment and evaluation card.
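A release record with these fields might look like the following sketch. The field names and the JSON serialization are assumptions; the only behavior enforced is that timestamps are timezone-aware UTC and that the end does not precede the start.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, List


@dataclass(frozen=True)
class ReleaseRecord:
    cohort_definition: str
    traffic_weights: Dict[str, float]
    package_id: str
    config_id: str
    started_at: datetime
    ended_at: datetime
    metrics: Dict[str, float]
    alerts: List[str]
    operator_actions: List[str]
    final_decision: str
    experiment_ref: str        # link to the original experiment
    evaluation_card_ref: str   # link to the evaluation card

    def __post_init__(self) -> None:
        for ts in (self.started_at, self.ended_at):
            if ts.tzinfo is None or ts.utcoffset() != timedelta(0):
                raise ValueError("timestamps must be timezone-aware UTC")
        if self.ended_at < self.started_at:
            raise ValueError("ended_at precedes started_at")

    def to_json(self) -> str:
        """Serialize with ISO 8601 timestamps so the record is portable."""
        data = asdict(self)
        data["started_at"] = self.started_at.isoformat()
        data["ended_at"] = self.ended_at.isoformat()
        return json.dumps(data, sort_keys=True)
```

Rejecting naive timestamps at construction time avoids ambiguity when records from different regions are compared later.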

Source reports used for this guide

These reports are preserved verbatim in the site archive. The guide above is an editorial synthesis and may narrow, qualify, or reorganize claims from the source material.