Research program — ModelBreeder.com

Goal

The theory should improve through experiments, not through stronger metaphors. This research program identifies evidence that would make the ModelBreeder framework more precise, falsifiable, and useful.

Experiment 1: marginal specialist value

Compare a monolithic baseline, a champion model, and a population of small specialists under identical task mixes. Measure utility, latency, memory, cost, calibration, and failure correlation. The key question is whether population composition improves net viability under realistic constraints.
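One way to make "failure correlation" concrete is sketched below. It assumes each system's results are recorded as a 0/1 failure vector over the same ordered task list; the function names and that data layout are illustrative choices, not part of the framework. A population whose members fail on different tasks (low or negative correlation) is the case where composition could plausibly help.

```python
from itertools import combinations
from math import sqrt


def phi_correlation(a, b):
    """Pearson correlation of two equal-length 0/1 failure vectors."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / n
    var_a = sum((x - mean_a) ** 2 for x in a) / n
    var_b = sum((y - mean_b) ** 2 for y in b) / n
    if var_a == 0 or var_b == 0:
        # A member that never (or always) fails has no defined correlation;
        # this sketch reports 0.0 in that case (an assumption).
        return 0.0
    return cov / sqrt(var_a * var_b)


def mean_pairwise_failure_correlation(failures):
    """failures: dict of member name -> list of 0/1 per task (1 = failed)."""
    pairs = list(combinations(failures.values(), 2))
    if not pairs:
        return 0.0
    return sum(phi_correlation(a, b) for a, b in pairs) / len(pairs)


def shared_failure_rate(failures):
    """Fraction of tasks that every member fails: a floor no router can beat."""
    columns = list(zip(*failures.values()))
    return sum(all(col) for col in columns) / len(columns)
```

The shared failure rate is a useful companion number: if it is close to the monolithic baseline's failure rate, the specialists are not complementary, whatever their individual scores.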

Experiment 2: no-op threshold calibration

Generate candidate descendants across a range of mutation operators. Compare aggressive promotion, conservative promotion, and explicit no-op policies. Measure regressions, cost growth, and useful improvements over time.
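The comparison can be sketched as one decision rule evaluated under different thresholds. Everything here is an assumption made for illustration: the `Evidence` fields, the cost weighting, and the two threshold settings are hypothetical, and "no-op" is modelled simply as an explicit return value meaning "keep the baseline".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    utility_gain: float  # candidate utility minus baseline utility
    cost_growth: float   # candidate cost minus baseline cost
    regressions: int     # previously passing checks that now fail


def decide(evidence, min_net_gain, max_regressions=0, cost_weight=1.0):
    """Return "promote" or "no-op" for one candidate."""
    if evidence.regressions > max_regressions:
        return "no-op"
    net = evidence.utility_gain - cost_weight * evidence.cost_growth
    return "promote" if net >= min_net_gain else "no-op"


# Hypothetical policy settings; real thresholds are what the experiment calibrates.
POLICIES = {
    "aggressive": {"min_net_gain": 0.0, "max_regressions": 2},
    "conservative": {"min_net_gain": 0.05, "max_regressions": 0},
}


def promotion_counts(evidence_list):
    """Count promotions per policy over the same candidate evidence."""
    return {
        name: sum(decide(e, **params) == "promote" for e in evidence_list)
        for name, params in POLICIES.items()
    }
```

Running every policy over the same evidence keeps the comparison fair: differences in regressions and cost growth over time then come from the thresholds, not from different candidate pools.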

Experiment 3: quality-diversity archive utility

Maintain a MAP-Elites-style archive of specialists across task and runtime descriptors. After workload shift, measure whether archive-seeded experiments recover faster than experiments seeded only from the current champion.
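A minimal sketch of such an archive follows, assuming descriptors are normalised to the range [0, 1] and binned on a uniform grid. The class and method names are hypothetical, and `seeds_near` is one possible way to pick seeds after a shift, not a prescribed one.

```python
def cell_of(descriptor, bins):
    """Map a descriptor with values in [0, 1] to a grid cell."""
    return tuple(min(int(d * bins), bins - 1) for d in descriptor)


class EliteArchive:
    """Keeps the best candidate seen so far in each descriptor cell."""

    def __init__(self, bins):
        self.bins = bins
        self.cells = {}  # cell -> (fitness, candidate)

    def offer(self, candidate, descriptor, fitness):
        """Insert the candidate if its cell is empty or it beats the incumbent."""
        cell = cell_of(descriptor, self.bins)
        incumbent = self.cells.get(cell)
        if incumbent is None or fitness > incumbent[0]:
            self.cells[cell] = (fitness, candidate)
            return True
        return False

    def seeds_near(self, descriptor, k):
        """Return up to k elites, nearest cells first, fitter elites first on ties."""
        target = cell_of(descriptor, self.bins)

        def distance(cell):
            return sum(abs(a - b) for a, b in zip(cell, target))

        ranked = sorted(
            self.cells.items(),
            key=lambda item: (distance(item[0]), -item[1][0]),
        )
        return [candidate for _, (_, candidate) in ranked[:k]]
```

After a workload shift, the archive-seeded arm would start from `seeds_near(new_workload_descriptor, k)`, while the control arm starts from the current champion alone.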

Experiment 4: evaluator independence stress test

Allow some candidate generators to propose changes to evaluation thresholds, test suites, or router policies, but require external approval before any such change takes effect. Measure how often candidates improve genuine performance versus exploiting evaluator weaknesses.
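One way to separate genuine gains from evaluator exploitation is to score each proposal on a suite the generator can see and on an independent suite it cannot. The held-out suite, the field names, and the labels below are assumptions of this sketch rather than part of the experiment as stated.

```python
from dataclasses import dataclass

# Targets that the experiment says need external approval.
PROTECTED_TARGETS = {"evaluation_thresholds", "test_suites", "router_policy"}


@dataclass(frozen=True)
class Proposal:
    candidate_id: str
    targets: frozenset     # what the candidate proposes to change
    visible_gain: float    # gain on the suite the generator can see
    heldout_gain: float    # gain on an independent suite (assumed to exist)
    externally_approved: bool = False


def classify(proposal, min_gain=0.0):
    """Label one proposal; protected changes are blocked until approved."""
    if proposal.targets & PROTECTED_TARGETS and not proposal.externally_approved:
        return "blocked_pending_approval"
    if proposal.visible_gain > min_gain and proposal.heldout_gain > min_gain:
        return "genuine_improvement"
    if proposal.visible_gain > min_gain:
        return "suspected_evaluator_exploit"
    return "no_improvement"


def exploit_rate(proposals):
    """Share of apparent improvements that do not hold up on the held-out suite."""
    labels = [classify(p) for p in proposals]
    suspected = labels.count("suspected_evaluator_exploit")
    apparent = suspected + labels.count("genuine_improvement")
    return suspected / apparent if apparent else 0.0
```

The exploit rate is only as trustworthy as the held-out suite's independence, which is exactly the operating cost this experiment is meant to expose.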

Experiment 5: human capability retention

In a documentation, coding, or analysis workflow, let a team work with AI assistance for several weeks, then measure its performance under three conditions: full assistance, reduced assistance, and no assistance. The goal is to determine whether the system is scaffolding human skill or replacing it.
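The comparison could be summarised with two ratios, sketched below. This assumes an unassisted baseline was measured before the assisted period, which the experiment description does not state, and that scores are positive numbers where higher is better. The condition keys and metric names are hypothetical.

```python
from statistics import mean


def retention_summary(baseline_unassisted, after):
    """Summarise post-study scores.

    baseline_unassisted: scores measured without assistance before the study
                         (an assumption of this sketch).
    after: dict with keys "full", "reduced", "none" mapping to score lists.
    """
    base = mean(baseline_unassisted)
    means = {condition: mean(scores) for condition, scores in after.items()}
    return {
        "means": means,
        # Near or above 1.0 suggests unassisted skill was retained.
        "retention_ratio": means["none"] / base,
        # Share of full-assistance performance lost when assistance is removed.
        "assistance_dependence": (means["full"] - means["none"]) / means["full"],
    }
```

A retention ratio well below 1.0 alongside high assistance dependence would point toward replacement rather than scaffolding; with small teams, both numbers need confidence intervals before any conclusion is drawn.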

Experiment template

pseudocode

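FUNCTION run_modelbreeder_experiment(hypothesis, environment, policy)
    // Register before generating anything so the hypothesis is fixed in advance.
    experiment <- REGISTER_EXPERIMENT(hypothesis, created_at_utc: NOW_UTC())
    baseline <- FREEZE_BASELINE(environment)
    candidates <- GENERATE_CANDIDATES(baseline, policy.allowed_operators)
    evidence <- EVALUATE_ALL(baseline, candidates, policy.suites)
    decisions <- APPLY_VIABILITY_POLICY(evidence, policy)
    // RECORD_LIMITATIONS is a placeholder step; the original template used
    // `limitations` without defining it.
    limitations <- RECORD_LIMITATIONS(environment, evidence)
    REPORT_RESULTS(hypothesis, evidence, decisions, limitations)
    ARCHIVE_REPRODUCIBLE_PACKET(experiment, baseline, candidates, evidence, decisions, limitations)
END FUNCTION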

Minimum evidence packet

Every experiment should publish the environment definition, candidate generation rules, evaluation suite version, resource ledger, random seeds where applicable, hardware profile, rejected candidates, promotion rules, and known limitations.
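A packet like this is easy to check mechanically. The sketch below assumes the packet is a JSON-serialisable dictionary; the key names are hypothetical renderings of the items listed above, and the digest is an added convenience for confirming that two copies of a packet are identical.

```python
import hashlib
import json

# Hypothetical key names for the required items listed above.
REQUIRED_FIELDS = (
    "environment_definition",
    "candidate_generation_rules",
    "evaluation_suite_version",
    "resource_ledger",
    "hardware_profile",
    "rejected_candidates",
    "promotion_rules",
    "known_limitations",
)
# Required only where applicable, so not enforced here.
OPTIONAL_FIELDS = ("random_seeds",)


def missing_fields(packet):
    """Return required keys that are absent or None.

    An empty list is accepted, since "no rejected candidates" is a valid report.
    """
    return [name for name in REQUIRED_FIELDS if packet.get(name) is None]


def packet_digest(packet):
    """SHA-256 over a canonical JSON encoding, independent of key order."""
    canonical = json.dumps(packet, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Publishing the digest alongside the results lets a reader confirm that the archived packet is the one the report was written from.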

What would falsify the theory

The theory weakens if populations consistently add cost without useful complementarity, if no-op thresholds block nearly all useful innovation, if archives do not help after drift, or if evaluator independence is too expensive to operate. Those outcomes would not be embarrassing. They would clarify where the framework must change.

Source reports used for this guide

These reports are preserved verbatim in the site archive. The guide above is an editorial synthesis and may narrow, qualify, or reorganize claims from the source material.

Core synthesis: The Four Fs of AI: Code Breeding, Model Breeding, and the Teleodynamic Convergence of Mutable Small-Model Ecologies (Conceptual synthesis, 80.5 KB)
Core synthesis: Teleodynamic Evolution of AI Ecosystems (Conceptual synthesis, 15.3 KB)
Evolutionary AI: The Architecture of the Perfect Evolutionary Artificial Intelligence (Mixed maturity, 58.7 KB)
Core synthesis: The 4Fs Framework: Fast, Flexible, Frugal, Federated (Emerging practice, 22.5 KB)

