Distillation as breeding
Distillation creates a descendant that approximates a teacher's behavior within a target domain. It is one of the most practical model-breeding operators because it can convert an expensive coalition or generalist into a deployable specialist.
Distillation pipeline
- Define the specialist contract and explicit non-goals.
- Select teacher models and judge logic.
- Build a seed dataset from licensed, representative examples.
- Generate candidate labels or rationales under quality gates.
- Filter disagreement, policy violations, duplicates, and low-confidence cases.
- Train one or more students under different capacity and recipe settings.
- Evaluate on independent human- or rule-labeled holdouts.
- Compare against the teacher, champion, and a simple baseline.
- Package lineage linking every synthetic example to teacher outputs and filters.
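The filtering step in the list above can be sketched in Python. This is a minimal illustration under stated assumptions, not the guide's implementation: the `Candidate` record, its field names, and the 0.8 confidence threshold are all hypothetical. Rejected candidates are kept grouped by reason rather than discarded, so the lineage record can show which filter removed each example.

```python
from dataclasses import dataclass
import hashlib


@dataclass(frozen=True)
class Candidate:
    # Hypothetical record for one teacher-labeled example.
    input_text: str
    label: str
    confidence: float      # judge confidence in [0, 1]
    teachers_agree: bool   # True if all teachers produced the same label
    policy_ok: bool        # True if no policy check fired


def content_key(c: Candidate) -> str:
    # Normalize case and whitespace so near-identical inputs collapse to one key.
    normalized = " ".join(c.input_text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def filter_candidates(candidates, min_confidence=0.8):
    """Return (accepted, rejected), where rejected maps a reason to a list."""
    accepted, seen = [], set()
    rejected = {"policy": [], "disagreement": [], "low_confidence": [], "duplicate": []}
    for c in candidates:
        key = content_key(c)
        if not c.policy_ok:
            rejected["policy"].append(c)
        elif not c.teachers_agree:
            rejected["disagreement"].append(c)
        elif c.confidence < min_confidence:
            rejected["low_confidence"].append(c)
        elif key in seen:
            rejected["duplicate"].append(c)
        else:
            seen.add(key)
            accepted.append(c)
    return accepted, rejected
```

Keeping the `disagreement` bucket separate makes it possible to send those cases to independent labeling later instead of losing them.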
PROCEDURE distill_specialist(contract, teachers, seed_inputs)
    accepted_examples <- []
    FOR each input IN seed_inputs
        teacher_outputs <- RUN_INDEPENDENTLY(teachers, input)
        verdict <- JUDGE_TEACHER_OUTPUTS(teacher_outputs, contract)
        IF verdict.accepted AND verdict.confidence >= label_threshold
            APPEND accepted_examples, {
                input: input,
                label: verdict.output,
                teacher_ids: teachers.ids,
                verdict_id: verdict.id
            }
        END IF
    END FOR
    students <- TRAIN_BOUNDED_STUDENT_VARIANTS(accepted_examples, contract)
    RETURN INDEPENDENTLY_EVALUATE(students)
END PROCEDURE
Specialization boundaries
A specialist should abstain outside its niche. Make the niche machine-readable: taxonomy, language, data class, hardware, latency class, and confidence range. Routing should prefer abstention over confident misuse.
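One way to make the niche machine-readable is a small declarative record that the router checks before dispatch. The sketch below is an assumption-laden illustration: `Niche`, `Request`, and `route` are hypothetical names, and only a subset of the dimensions listed above (taxonomy, language, data class, confidence floor) is modeled. Hardware and latency class would be added the same way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Niche:
    # Hypothetical machine-readable niche; field names are illustrative.
    taxonomy: frozenset       # topic labels the specialist covers
    languages: frozenset      # language codes the specialist handles
    data_classes: frozenset   # data classes it is cleared to process
    min_confidence: float     # below this the specialist abstains


@dataclass(frozen=True)
class Request:
    topic: str
    language: str
    data_class: str


def in_niche(niche: Niche, req: Request) -> bool:
    # Every declared dimension must match; a single miss puts the request out of niche.
    return (req.topic in niche.taxonomy
            and req.language in niche.languages
            and req.data_class in niche.data_classes)


def route(niche: Niche, req: Request, specialist_confidence: float) -> str:
    """Return "specialist" or "abstain"; abstention is the default on any doubt."""
    if not in_niche(niche, req):
        return "abstain"
    if specialist_confidence < niche.min_confidence:
        return "abstain"
    return "specialist"
```

Because every failed check returns `"abstain"`, the specialist is only used when the request is inside the declared niche and the confidence clears the floor.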
Teacher contamination
Synthetic data can magnify teacher biases, factual errors, style artifacts, and policy gaps. Preserve a holdout labeled by humans or deterministic rules, independent of any teacher. Include disagreement cases rather than filtering every difficult example, because uncertainty boundaries are part of the capability.
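A minimal sketch of how a holdout might be kept away from the teachers, assuming each example carries a stable `id` and a `teachers_agree` flag (both hypothetical schema choices). Hashing the ID makes the split deterministic, so an example never drifts between training and holdout across runs. Disagreement cases are set aside for independent labeling instead of being dropped.

```python
import hashlib


def is_holdout(example_id: str, holdout_percent: int = 10) -> bool:
    # Hash the stable ID so the assignment is identical on every run.
    digest = hashlib.sha256(example_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < holdout_percent


def split_examples(examples, holdout_percent=10):
    """Partition examples into (train, holdout, disagreements).

    examples: iterable of dicts with 'id' and 'teachers_agree' keys
    (an assumed schema for illustration).
    """
    train, holdout, disagreements = [], [], []
    for ex in examples:
        if is_holdout(ex["id"], holdout_percent):
            holdout.append(ex)         # labeled by humans or rules, never by teachers
        elif ex["teachers_agree"]:
            train.append(ex)
        else:
            disagreements.append(ex)   # kept for independent labeling, not discarded
    return train, holdout, disagreements
```

The 10 percent default is arbitrary; the point of the design is that holdout membership depends only on the example ID, not on anything a teacher produced.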
Capacity selection
Train multiple student sizes. The smallest student that satisfies the contract may offer the best viability even if a larger student wins a generic benchmark. Measure total accepted-result cost, including fallbacks and corrections.
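The cost comparison can be sketched as follows. The `StudentReport` fields and the cost formula are assumptions made for illustration: total cost is taken as student inference plus fallback escalations plus corrections, divided by the number of results that ultimately met the contract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StudentReport:
    # Hypothetical evaluation summary for one student; all fields are illustrative.
    name: str
    params_millions: int
    meets_contract: bool
    requests: int
    cost_per_request: float      # inference cost of the student itself
    fallbacks: int               # requests escalated to a larger model
    cost_per_fallback: float
    corrections: int             # results fixed by a human or a rule
    cost_per_correction: float
    accepted_results: int        # results that ultimately met the contract


def cost_per_accepted_result(r: StudentReport) -> float:
    total = (r.requests * r.cost_per_request
             + r.fallbacks * r.cost_per_fallback
             + r.corrections * r.cost_per_correction)
    if r.accepted_results == 0:
        return float("inf")
    return total / r.accepted_results


def pick_student(reports):
    """Return the cheapest contract-satisfying student, or None if none qualifies."""
    eligible = [r for r in reports if r.meets_contract]
    if not eligible:
        return None
    # Ties on cost go to the smaller model.
    return min(eligible, key=lambda r: (cost_per_accepted_result(r), r.params_millions))
```

Under this accounting a small student that falls back often can lose to a larger one, which is why the fallback and correction terms are part of the total rather than reported separately.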
Continual specialization
Do not continuously fine-tune a production artifact in place. Accumulate new evidence, create a descendant, evaluate it, and promote through the release pipeline. This preserves rollback and makes drift attributable.
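A minimal sketch of promotion by descendant rather than in-place update. `Artifact` and `Registry` are hypothetical names, and the two gates shown (a minimum evaluation score, and the candidate naming the current production version as its parent) are assumptions chosen to illustrate the idea, not a prescribed release pipeline.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Artifact:
    # Immutable model version; a descendant records its parent for lineage.
    version: str
    parent: Optional[str]
    eval_score: float


@dataclass
class Registry:
    history: list = field(default_factory=list)  # promoted artifacts, oldest first

    @property
    def production(self) -> Optional[Artifact]:
        return self.history[-1] if self.history else None

    def promote(self, candidate: Artifact, min_score: float) -> bool:
        """Promote only if the candidate passes the gate and descends from production."""
        current = self.production
        if candidate.eval_score < min_score:
            return False
        if current is not None and candidate.parent != current.version:
            return False
        self.history.append(candidate)
        return True

    def rollback(self) -> Artifact:
        """Drop the newest promotion and return the restored artifact."""
        if len(self.history) < 2:
            raise RuntimeError("no earlier artifact to roll back to")
        self.history.pop()
        return self.production
```

Because artifacts are frozen and earlier versions stay in `history`, rollback restores a known version, and any behavior change can be traced to a specific promotion.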
Source reports used for this guide
These reports are preserved verbatim in the site archive. The guide above is an editorial synthesis and may narrow, qualify, or reorganize claims from the source material.