Distillation and specialization

Distillation as breeding

Distillation creates a descendant that approximates a teacher's behavior within a target domain. It is one of the most practical model-breeding operators because it can convert an expensive coalition or generalist into a deployable specialist.

Distillation pipeline

1. Define the specialist contract and explicit non-goals.
2. Select teacher models and judge logic.
3. Build a seed dataset from licensed, representative examples.
4. Generate candidate labels or rationales under quality gates.
5. Filter disagreement, policy violations, duplicates, and low-confidence cases.
6. Train one or more students under different capacity and recipe settings.
7. Evaluate on independent human- or rule-labeled holdouts.
8. Compare against the teacher, champion, and a simple baseline.
9. Package lineage linking every synthetic example to teacher outputs and filters.

pseudocode

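PROCEDURE distill_specialist(contract, teachers, seed_inputs, label_threshold)
    // label_threshold: minimum judge confidence for a synthetic label to be kept
    accepted_examples <- []

    FOR each input IN seed_inputs
        // Teachers run independently so one teacher's output cannot bias another's
        teacher_outputs <- RUN_INDEPENDENTLY(teachers, input)
        verdict <- JUDGE_TEACHER_OUTPUTS(teacher_outputs, contract)

        IF verdict.accepted AND verdict.confidence >= label_threshold
            // Lineage: record which teachers and which verdict produced the label
            APPEND accepted_examples, {
                input: input,
                label: verdict.output,
                teacher_ids: teachers.ids,
                verdict_id: verdict.id
            }
        END IF
    END FOR

    // Students vary in capacity and recipe but stay within the contract's bounds
    students <- TRAIN_BOUNDED_STUDENT_VARIANTS(accepted_examples, contract)
    // Evaluation uses holdouts labeled independently of the teachers
    RETURN INDEPENDENTLY_EVALUATE(students)
END PROCEDURE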
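
The acceptance gate in the procedure above can be made concrete with a small runnable sketch. Everything named here is an assumption for illustration: the judge is a plain majority vote (`judge_by_majority`), its confidence is the winning vote share, the three toy teachers are trivial string rules, and the threshold of 0.7 is arbitrary. A real judge would also check outputs against the specialist contract.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Optional

Teacher = Callable[[str], str]


@dataclass(frozen=True)
class Verdict:
    accepted: bool
    output: Optional[str]
    confidence: float  # share of teachers that voted for the winning label


def judge_by_majority(outputs: list[str]) -> Verdict:
    """Hypothetical judge: accept the most common label if a strict majority agrees."""
    label, votes = Counter(outputs).most_common(1)[0]
    return Verdict(
        accepted=votes > len(outputs) / 2,
        output=label,
        confidence=votes / len(outputs),
    )


def collect_examples(
    teachers: dict[str, Teacher],
    seed_inputs: list[str],
    label_threshold: float,
) -> list[dict]:
    """Keep only inputs whose judged label clears the confidence threshold."""
    accepted = []
    for text in seed_inputs:
        # Each teacher sees only the input, never another teacher's output.
        outputs = {tid: teacher(text) for tid, teacher in teachers.items()}
        verdict = judge_by_majority(list(outputs.values()))
        if verdict.accepted and verdict.confidence >= label_threshold:
            accepted.append({
                "input": text,
                "label": verdict.output,
                "teacher_outputs": outputs,  # lineage back to every teacher
                "confidence": verdict.confidence,
            })
    return accepted


# Toy teachers (assumptions): two heuristics and one constant labeler.
teachers = {
    "t_len": lambda s: "long" if len(s) > 10 else "short",
    "t_words": lambda s: "long" if len(s.split()) > 2 else "short",
    "t_always_short": lambda s: "short",
}
seeds = ["hi", "a much longer sentence here", "abcdefghijklmnop"]
examples = collect_examples(teachers, seeds, label_threshold=0.7)
```

With these toy teachers only the first seed is unanimous. The other two reach a two-thirds majority, which falls below the 0.7 threshold, so they are dropped from the training set.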

Specialization boundaries

A specialist should abstain outside its niche. Make the niche machine-readable: taxonomy, language, data class, hardware, latency class, and confidence range. Routing should prefer abstention over confident misuse.
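
A machine-readable niche might look like the following minimal sketch. The field names, the `Request` shape, and the `route` function are hypothetical. The sketch assumes the router knows the specialist's worst-case latency and receives a confidence score from the specialist for each request.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NicheContract:
    taxonomy: frozenset[str]        # topics the specialist covers
    languages: frozenset[str]
    data_classes: frozenset[str]    # e.g. "public", "internal"
    hardware: frozenset[str]        # targets the artifact was validated on
    worst_case_latency_ms: int
    confidence_range: tuple[float, float]  # inclusive range in which outputs are trusted


@dataclass(frozen=True)
class Request:
    topic: str
    language: str
    data_class: str
    hardware: str
    latency_budget_ms: int


def route(contract: NicheContract, request: Request, confidence: float) -> str:
    """Return "specialist" only when every niche dimension matches; otherwise "abstain"."""
    in_niche = (
        request.topic in contract.taxonomy
        and request.language in contract.languages
        and request.data_class in contract.data_classes
        and request.hardware in contract.hardware
        and request.latency_budget_ms >= contract.worst_case_latency_ms
    )
    low, high = contract.confidence_range
    if not in_niche or not (low <= confidence <= high):
        return "abstain"  # prefer abstention over confident misuse
    return "specialist"


contract = NicheContract(
    taxonomy=frozenset({"invoice_extraction"}),
    languages=frozenset({"en"}),
    data_classes=frozenset({"internal"}),
    hardware=frozenset({"cpu"}),
    worst_case_latency_ms=200,
    confidence_range=(0.8, 1.0),
)
in_scope = Request("invoice_extraction", "en", "internal", "cpu", 500)
wrong_language = Request("invoice_extraction", "de", "internal", "cpu", 500)

decisions = [
    route(contract, in_scope, confidence=0.93),
    route(contract, in_scope, confidence=0.55),
    route(contract, wrong_language, confidence=0.99),
]
```

The third decision is the important one: a very confident answer to an out-of-niche request still abstains, because the niche check runs independently of the confidence score.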

Teacher contamination

Synthetic data can magnify teacher biases, factual errors, style artifacts, and policy gaps. Preserve a holdout labeled by humans or by deterministic rules, never by the teachers. Include disagreement cases rather than filtering out every difficult example, because uncertainty boundaries are part of the capability.
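
One way to keep the holdout clean is to assign it deterministically before any teacher sees the data, and then to report student accuracy separately for the cases where teachers disagreed. The sketch below assumes stable example IDs and a per-record `teachers_disagreed` flag; both function names and the record layout are illustrative.

```python
import hashlib


def is_holdout(example_id: str, holdout_fraction: float = 0.2) -> bool:
    """Deterministic split: the same ID always lands on the same side."""
    digest = hashlib.sha256(example_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    return bucket < holdout_fraction


def accuracy_by_stratum(records: list[dict]) -> dict[str, float]:
    """Accuracy against gold labels, split by whether the teachers disagreed."""
    totals: dict[str, tuple[int, int]] = {}
    for r in records:
        stratum = "teachers_disagreed" if r["teachers_disagreed"] else "teachers_agreed"
        correct, count = totals.get(stratum, (0, 0))
        hit = int(r["student_label"] == r["gold_label"])
        totals[stratum] = (correct + hit, count + 1)
    return {s: correct / count for s, (correct, count) in totals.items()}


# Hypothetical holdout records with human (gold) labels.
holdout = [
    {"teachers_disagreed": False, "student_label": "a", "gold_label": "a"},
    {"teachers_disagreed": False, "student_label": "b", "gold_label": "b"},
    {"teachers_disagreed": True, "student_label": "a", "gold_label": "b"},
    {"teachers_disagreed": True, "student_label": "b", "gold_label": "b"},
]
report = accuracy_by_stratum(holdout)
```

A large gap between the two strata suggests the student learned the teachers' easy consensus but not the boundary cases, which is exactly what an agreement-only filter would hide.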

Capacity selection

Train multiple student sizes. The smallest student that satisfies the contract is often the most viable choice, even if a larger student wins a generic benchmark. Measure the total cost per accepted result, including fallbacks and corrections.
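
As a worked illustration of that cost measure, the sketch below uses a deliberately simple model: every request pays for one student call, and each rejected output also pays for one fallback that is assumed to always succeed. The model, the `StudentReport` fields, and all numbers are assumptions, not measurements.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StudentReport:
    name: str
    params_millions: int
    meets_contract: bool
    cost_per_call: float    # cost of one student inference
    acceptance_rate: float  # share of outputs accepted without correction
    fallback_cost: float    # cost of the fallback or correction path


def cost_per_accepted_result(r: StudentReport) -> float:
    """Expected cost per request, assuming the fallback always yields an accepted result."""
    return r.cost_per_call + (1.0 - r.acceptance_rate) * r.fallback_cost


def pick_student(reports: list[StudentReport]) -> Optional[StudentReport]:
    """Among students that satisfy the contract, pick the cheapest per accepted result."""
    eligible = [r for r in reports if r.meets_contract]
    if not eligible:
        return None
    # Ties go to the smaller model.
    return min(eligible, key=lambda r: (cost_per_accepted_result(r), r.params_millions))


reports = [
    StudentReport("tiny", 50, False, 0.0005, 0.60, 0.02),
    StudentReport("small", 300, True, 0.0010, 0.90, 0.02),
    StudentReport("medium", 1500, True, 0.0040, 0.97, 0.02),
]
best = pick_student(reports)
```

Here the small student costs about 0.003 per accepted result and the medium one about 0.0046, so the small student wins despite its lower acceptance rate. The tiny student is cheapest per call but is excluded because it does not meet the contract.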

Continual specialization

Do not continuously fine-tune a production artifact in place. Accumulate new evidence, create a descendant, evaluate it, and promote through the release pipeline. This preserves rollback and makes drift attributable.
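
The descendant-and-promote pattern can be sketched as an append-only registry. The `Artifact` and `Registry` classes are hypothetical and stand in for whatever release pipeline is actually in use; the point is that artifacts are immutable, each descendant names its parent and the evidence it was trained on, and rollback only moves a pointer.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Artifact:
    version: str
    parent: Optional[str]          # version this descendant was derived from
    evidence_ids: tuple[str, ...]  # new evidence batches used to create it


@dataclass
class Registry:
    artifacts: dict[str, Artifact] = field(default_factory=dict)
    history: list[str] = field(default_factory=list)  # promoted versions, newest last

    @property
    def champion(self) -> Optional[str]:
        return self.history[-1] if self.history else None

    def register(self, artifact: Artifact) -> None:
        if artifact.version in self.artifacts:
            raise ValueError("artifacts are immutable; register a new version instead")
        self.artifacts[artifact.version] = artifact

    def promote(self, version: str, passed_evaluation: bool) -> bool:
        """Promote a registered artifact only if its evaluation passed."""
        if version not in self.artifacts:
            raise KeyError(version)
        if not passed_evaluation:
            return False
        self.history.append(version)
        return True

    def rollback(self) -> str:
        """Return to the previously promoted version."""
        if len(self.history) < 2:
            raise RuntimeError("no earlier champion to roll back to")
        self.history.pop()
        return self.history[-1]


registry = Registry()
registry.register(Artifact("v1", parent=None, evidence_ids=("seed",)))
registry.promote("v1", passed_evaluation=True)

# New evidence produces a descendant; v1 itself is never modified.
registry.register(Artifact("v2", parent="v1", evidence_ids=("batch-07",)))
registry.promote("v2", passed_evaluation=True)
champion_after_promotion = registry.champion

restored = registry.rollback()
```

Because every promoted version records its parent and evidence batch, a regression seen after promoting `v2` can be attributed to `batch-07` rather than to an untracked sequence of in-place updates.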

Source reports used for this guide

These reports are preserved verbatim in the site archive. The guide above is an editorial synthesis and may narrow, qualify, or reorganize claims from the source material.

Core synthesis: The 4Fs Framework: Fast, Flexible, Frugal, Federated (Emerging practice · 22.5 KB)
Core synthesis: The Four Fs of AI: Code Breeding, Model Breeding, and the Teleodynamic Convergence of Mutable Small-Model Ecologies (Conceptual synthesis · 80.5 KB)

Related guides

Evolution lab

Core evolutionary loop

Evolutionary operators catalog

Mutation operators