Recombination has multiple layers
The safest recombination is behavioral: keep models separate and combine their outputs through a contract. Parameter recombination can be more efficient, but it requires stricter compatibility between parents and more regression testing.
Recombination hierarchy
- Output ensemble: independent models, fixed aggregation.
- Cascade: one model hands off based on confidence or task complexity.
- Router coalition: select complementary specialists for one request.
- Adapter composition: combine adapters on a shared base.
- Task-vector arithmetic: add or subtract compatible parameter deltas.
- Layer or weight merge: combine closely related model artifacts.
- Semantic bridge: train projection layers between representations.
Move downward only when the expected efficiency gain justifies the compatibility and evaluation burden.
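The two behavioral layers at the top of the hierarchy can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the `Model` contract (a callable returning an answer and a confidence), the `0.8` threshold, and the toy models are all assumptions made for the example.

```python
from collections import Counter
from typing import Callable, List, Tuple

# Hypothetical contract: a model maps a prompt to (answer, confidence in [0, 1]).
Model = Callable[[str], Tuple[str, float]]


def majority_vote(models: List[Model], prompt: str) -> str:
    """Output ensemble: every model answers, a fixed rule aggregates."""
    answers = [model(prompt)[0] for model in models]
    # most_common(1) returns the top answer; ties go to the first one seen.
    return Counter(answers).most_common(1)[0][0]


def cascade(small: Model, large: Model, threshold: float = 0.8) -> Model:
    """Cascade: answer with `small` unless its confidence is below threshold."""
    def run(prompt: str) -> Tuple[str, float]:
        answer, confidence = small(prompt)
        if confidence >= threshold:
            return answer, confidence
        return large(prompt)  # hand off the request unchanged
    return run


# Toy stand-ins for real models, used only to exercise the wiring.
def small_model(prompt: str) -> Tuple[str, float]:
    return ("short", 0.9) if len(prompt) < 10 else ("unsure", 0.3)


def large_model(prompt: str) -> Tuple[str, float]:
    return ("long", 0.95)


routed = cascade(small_model, large_model)
```

Neither function touches model parameters, so each parent can be replaced or rolled back without retesting a merged artifact.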
Compatibility gate
FUNCTION merge_eligibility(parent_a, parent_b, method)
    REQUIRE parent_a.signatures_valid AND parent_b.signatures_valid
    REQUIRE LICENSES_COMPATIBLE(parent_a, parent_b)
    REQUIRE DATA_RESTRICTIONS_COMPATIBLE(parent_a, parent_b)
    IF method IN ["adapter_merge", "task_vector", "weight_merge"]
        REQUIRE parent_a.base_family == parent_b.base_family
        REQUIRE parent_a.tokenizer_id == parent_b.tokenizer_id
        REQUIRE ARCHITECTURE_SHAPES_COMPATIBLE(parent_a, parent_b)
    END IF
    RETURN PASS
END FUNCTION
Merge search
Do not assume that equal averaging is the best merge. Search layer weights, adapter coefficients, or data-flow permutations inside a bounded space. Score candidates on held-out data and compare the best one against both parents, the best ensemble, and the no-merge baseline.
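A bounded search over a single interpolation coefficient might look like the sketch below. It assumes parents are flat lists of floats with the same shape, that `evaluate` is a caller-supplied held-out scoring function (higher is better), and that the ensemble score has been measured separately; the grid size and the toy scoring function are illustrative only.

```python
from typing import Callable, Dict, List

Weights = List[float]


def interpolate(a: Weights, b: Weights, alpha: float) -> Weights:
    """Linear merge: alpha * a + (1 - alpha) * b, element by element."""
    return [alpha * x + (1.0 - alpha) * y for x, y in zip(a, b)]


def search_merge(
    parent_a: Weights,
    parent_b: Weights,
    evaluate: Callable[[Weights], float],
    ensemble_score: float,
    steps: int = 10,
) -> Dict[str, object]:
    """Grid-search alpha in (0, 1) and compare the winner with the baselines."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same shape")
    best_alpha, best_score = 0.5, float("-inf")
    for i in range(1, steps):  # interior points only; endpoints are the parents
        alpha = i / steps
        score = evaluate(interpolate(parent_a, parent_b, alpha))
        if score > best_score:
            best_alpha, best_score = alpha, score
    baselines = {
        "parent_a": evaluate(parent_a),
        "parent_b": evaluate(parent_b),
        "ensemble": ensemble_score,
    }
    best_baseline = max(baselines, key=baselines.get)
    return {
        "alpha": best_alpha,
        "score": best_score,
        "best_baseline": best_baseline,
        # Accept the merge only if it beats every baseline on held-out data.
        "accept": best_score > baselines[best_baseline],
    }


# Toy held-out score (assumption): negative squared distance to a target.
TARGET = [0.3, 0.3]


def held_out_score(weights: Weights) -> float:
    return -sum((w - t) ** 2 for w, t in zip(weights, TARGET))


result = search_merge([0.0, 0.0], [1.0, 1.0], held_out_score, ensemble_score=-0.05)
```

In this toy setup the search settles on an unequal coefficient, which is the point of not defaulting to a 50/50 average. A real search would likely use per-layer coefficients and a more economical optimizer than a grid.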
Interference tests
Merges can erase rare skills, damage calibration, amplify bias, or create unpredictable interactions. Test each parent's original niche, conflicting tasks, out-of-distribution cases, safety behavior, and long-context or tool-use behavior where relevant.
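One way to make these tests actionable is to compare the merged model with the best parent on each suite and flag any drop beyond a tolerance. The sketch below assumes scores are already computed and higher is better; the suite names, the numbers, and the `0.02` tolerance are hypothetical.

```python
from typing import Dict, Optional


def interference_report(
    parent_scores: Dict[str, Dict[str, float]],
    merged_scores: Dict[str, float],
    tolerance: float = 0.02,
) -> Dict[str, Optional[float]]:
    """Map each regressed suite to its drop below the best parent.

    A value of None means the merged model was never evaluated on that
    suite, which is treated as a failure rather than silently skipped.
    """
    regressions: Dict[str, Optional[float]] = {}
    for suite, per_parent in parent_scores.items():
        reference = max(per_parent.values())
        merged = merged_scores.get(suite)
        if merged is None:
            regressions[suite] = None
        elif merged < reference - tolerance:
            regressions[suite] = reference - merged
    return regressions


# Hypothetical suites and scores, for illustration only.
parents = {
    "parent_a_niche": {"parent_a": 0.90, "parent_b": 0.40},
    "parent_b_niche": {"parent_a": 0.35, "parent_b": 0.88},
    "safety": {"parent_a": 0.97, "parent_b": 0.96},
    "long_context": {"parent_a": 0.70, "parent_b": 0.72},
}
merged = {"parent_a_niche": 0.70, "parent_b_niche": 0.87, "safety": 0.97}

report = interference_report(parents, merged)
```

Here the merge holds up on `parent_b_niche` and `safety`, loses ground on `parent_a_niche`, and is flagged for the missing `long_context` run.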
When an ensemble is better
Keep parents separate when they are architecturally heterogeneous, update at different rates, carry incompatible licenses or data restrictions, or must be isolated from one another. An ensemble may cost more at inference, but it preserves each parent's provenance and keeps rollback simple.
When distillation is better
If the coalition is valuable but too expensive, distill its accepted behavior into a student. Distillation creates a new descendant with clearer deployment cost, but it must be evaluated independently because the student may inherit or distort teacher errors.
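Restricting distillation to accepted behavior can be read as a filtering step when the training set is built. The sketch below assumes the coalition is a callable from prompt to output and that `accept` is a caller-supplied check; both names, and the toy stand-ins, are hypothetical. It covers data collection only, not student training.

```python
from typing import Callable, Iterable, List, Tuple


def build_distillation_set(
    prompts: Iterable[str],
    coalition: Callable[[str], str],
    accept: Callable[[str, str], bool],
) -> List[Tuple[str, str]]:
    """Keep only (prompt, output) pairs that pass the acceptance check."""
    examples: List[Tuple[str, str]] = []
    for prompt in prompts:
        output = coalition(prompt)
        if accept(prompt, output):
            examples.append((prompt, output))
    return examples


# Toy stand-ins (assumptions): the "coalition" upper-cases its input and the
# acceptance check rejects empty outputs.
def toy_coalition(prompt: str) -> str:
    return prompt.upper()


def non_empty(prompt: str, output: str) -> bool:
    return bool(output.strip())


train_set = build_distillation_set(["merge", "  ", "route"], toy_coalition, non_empty)
```

Filtering removes only the errors the acceptance check can see, so the student still needs its own held-out evaluation.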
Lineage for multi-parent descendants
Record every parent, coefficient, layer mapping, alignment transformation, and search procedure. A merge is not reproducible from model names alone.
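A lineage record along these lines could be stored next to the merged artifact. The field names, the placeholder hashes, and the idea of fingerprinting the record with SHA-256 are assumptions for illustration, not a format taken from the source reports.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ParentRef:
    name: str
    artifact_sha256: str  # content hash, since names alone are ambiguous


@dataclass(frozen=True)
class MergeLineage:
    parents: List[ParentRef]
    method: str
    coefficients: Dict[str, float]
    layer_mapping: Dict[str, str]
    alignment: str
    search_procedure: Dict[str, object]

    def fingerprint(self) -> str:
        """Stable hash of the full record; changes if any field changes."""
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Hypothetical record; the hashes are placeholders, not real artifacts.
lineage = MergeLineage(
    parents=[
        ParentRef("model-a", "aa" * 32),
        ParentRef("model-b", "bb" * 32),
    ],
    method="task_vector",
    coefficients={"model-a": 0.7, "model-b": 0.3},
    layer_mapping={},
    alignment="none",
    search_procedure={"kind": "grid", "steps": 10, "seed": 0},
)

record_id = lineage.fingerprint()
```

Two merges of the same named parents with different coefficients produce different fingerprints, which is the property model names alone do not give.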
Source reports used for this guide
These reports are preserved verbatim in the site archive. The guide above is an editorial synthesis and may narrow, qualify, or reorganize claims from the source material.