Viability mathematics — ModelBreeder.com

The score is not just accuracy

A candidate descendant should be evaluated by net viability, not by a single benchmark. Accuracy can rise while system viability falls because the candidate adds too much latency, memory, risk, operational complexity, or evaluator fragility. The viability function exists to stop that failure.

A practical score uses normalized deltas against the current production baseline:

Symbol	Meaning	Direction
Delta U	Utility or task quality improvement	Higher is better
Delta R	Robustness, calibration, and abstention improvement	Higher is better
Delta D	Useful behavioral diversity and error decorrelation	Higher is better
Delta C	Task, language, modality, or environment coverage	Higher is better
Delta M	Memory, storage, and model-loading overhead	Lower is better
Delta L	End-to-end latency and tail latency overhead	Lower is better
Delta E	Energy, compute, and evaluation cost	Lower is better
Delta S	Security, safety, legal, and provenance risk	Lower is better
Delta K	Maintenance complexity and coordination burden	Lower is better

Normalized viability

The score should be dimensionless. Normalize each dimension to a comparable scale before weighting it. Do not let a metric dominate the decision merely because its units happen to produce larger numbers, as raw milliseconds would against accuracy fractions.
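One way to make a delta dimensionless is to divide it by a reference scale and clip the result. The sketch below assumes that approach; the guide does not prescribe a particular normalizer, and the `scale` and `clip` parameters and the example numbers are hypothetical.

```python
def normalize(delta: float, scale: float, clip: float = 1.0) -> float:
    """Map a raw delta to a dimensionless value in [-clip, clip].

    `scale` is an assumed reference magnitude: the raw change that should
    count as one full unit of impact for this dimension.
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    return max(-clip, min(clip, delta / scale))


# Raw deltas in incompatible units (illustrative numbers only).
accuracy_delta = 0.02       # accuracy, as a fraction
latency_delta_ms = 40.0     # tail latency, in milliseconds

# Hypothetical reference scales chosen by the team that owns the policy.
accuracy_norm = normalize(accuracy_delta, scale=0.05)
latency_norm = normalize(latency_delta_ms, scale=100.0)

print(accuracy_norm, latency_norm)
```

Without normalization the latency delta is 2000 times larger than the accuracy delta. After it, both land at roughly the same magnitude, so the weights rather than the units decide which one matters more.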

pseudocode

FUNCTION viability(candidate, baseline, weights)
    benefits <- 0
    benefits += weights.utility     * NORMALIZE(candidate.utility - baseline.utility)
    benefits += weights.robustness  * NORMALIZE(candidate.robustness - baseline.robustness)
    benefits += weights.diversity   * NORMALIZE(candidate.diversity_contribution)
    benefits += weights.coverage    * NORMALIZE(candidate.coverage_gain)

    costs <- 0
    costs += weights.memory      * NORMALIZE(candidate.memory_cost - baseline.memory_cost)
    costs += weights.latency     * NORMALIZE(candidate.latency_cost - baseline.latency_cost)
    costs += weights.energy      * NORMALIZE(candidate.energy_cost - baseline.energy_cost)
    costs += weights.risk        * NORMALIZE(candidate.risk_delta)
    costs += weights.complexity  * NORMALIZE(candidate.complexity_delta)

    RETURN benefits - costs
END FUNCTION
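The pseudocode above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the dictionary layout, the clip-and-divide `normalize` helper, the reference scales, and the example deltas are all assumptions made for the example.

```python
BENEFITS = ("utility", "robustness", "diversity", "coverage")
COSTS = ("memory", "latency", "energy", "risk", "complexity")


def normalize(delta: float, scale: float) -> float:
    """Assumed normalizer: divide by a reference scale, clip to [-1, 1]."""
    return max(-1.0, min(1.0, delta / scale))


def viability(deltas: dict, scales: dict, weights: dict) -> float:
    """Weighted normalized benefits minus weighted normalized costs.

    `deltas` holds candidate-minus-baseline values per dimension;
    a missing dimension is treated as no change.
    """
    def term(name: str) -> float:
        return weights[name] * normalize(deltas.get(name, 0.0), scales[name])

    benefits = sum(term(name) for name in BENEFITS)
    costs = sum(term(name) for name in COSTS)
    return benefits - costs


# Hypothetical scales and equal weights, for illustration only.
scales = {"utility": 0.05, "robustness": 0.05, "diversity": 0.10,
          "coverage": 0.10, "memory": 1024.0, "latency": 100.0,
          "energy": 1.0, "risk": 1.0, "complexity": 1.0}
weights = {name: 1.0 for name in BENEFITS + COSTS}

# Accuracy improves, but memory (MB) and latency (ms) both grow.
deltas = {"utility": 0.02, "memory": 512.0, "latency": 30.0}

score = viability(deltas, scales, weights)
print(score)
```

In this example the candidate gains accuracy, yet the score is negative because the memory and latency overheads outweigh the utility gain under equal weights. That is the failure the viability function is meant to catch.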

Hard gates come before arithmetic

Some properties should not be averaged away. If a candidate fails a license gate, exposes a credential path, lacks a rollback target, violates a safety invariant, or uses unapproved data, the candidate fails even if its numeric score is high.

pseudocode

FUNCTION decision(candidate, baseline, policy)
    IF NOT HARD_GATES_PASS(candidate, policy)
        RETURN REJECT("Hard gate failure")
    END IF

    score <- viability(candidate, baseline, policy.weights)

    IF score >= policy.promote_threshold
        RETURN PROMOTE_WITH_CANARY(candidate, score)
    END IF

    IF score >= policy.archive_threshold
        RETURN ARCHIVE_AS_STEPPING_STONE(candidate, score)
    END IF

    RETURN NO_OP("Insufficient net viability")
END FUNCTION
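A hedged Python sketch of the same decision flow is shown below. The gate functions, the approved-license set, the `Policy` fields, and the threshold values are hypothetical; the point is only that the gates run first and the score is never computed for a candidate that fails one.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Sequence, Tuple


class Outcome(Enum):
    REJECT = "reject"
    PROMOTE_WITH_CANARY = "promote_with_canary"
    ARCHIVE_AS_STEPPING_STONE = "archive_as_stepping_stone"
    NO_OP = "no_op"


@dataclass(frozen=True)
class Policy:
    hard_gates: Sequence[Callable[[dict], bool]]
    promote_threshold: float
    archive_threshold: float


def license_approved(candidate: dict) -> bool:
    # Hypothetical allow-list; a real policy would define its own.
    return candidate.get("license") in {"apache-2.0", "mit"}


def has_rollback_target(candidate: dict) -> bool:
    return bool(candidate.get("rollback_target"))


def decide(candidate: dict, score_fn: Callable[[dict], float],
           policy: Policy) -> Tuple[Outcome, str]:
    # Hard gates are checked before any arithmetic happens.
    failed = [gate.__name__ for gate in policy.hard_gates if not gate(candidate)]
    if failed:
        return Outcome.REJECT, "hard gate failure: " + ", ".join(failed)

    score = score_fn(candidate)
    if score >= policy.promote_threshold:
        return Outcome.PROMOTE_WITH_CANARY, f"score={score:.2f}"
    if score >= policy.archive_threshold:
        return Outcome.ARCHIVE_AS_STEPPING_STONE, f"score={score:.2f}"
    return Outcome.NO_OP, "insufficient net viability"


policy = Policy(hard_gates=(license_approved, has_rollback_target),
                promote_threshold=0.5, archive_threshold=0.1)

# A high score does not rescue a candidate without a rollback target.
risky = {"license": "apache-2.0", "rollback_target": None, "score": 0.9}
safe = {"license": "mit", "rollback_target": "prod-baseline", "score": 0.3}

risky_result = decide(risky, lambda c: c["score"], policy)
safe_result = decide(safe, lambda c: c["score"], policy)
print(risky_result, safe_result)
```

Here the first candidate is rejected despite its score, and the second passes the gates but only clears the archive threshold, so it is kept as a stepping stone rather than promoted.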

Thresholds are environment-dependent

A browser-edge deployment should heavily penalize memory and tail latency. A batch research workflow may tolerate slower inference if the result improves coverage or robustness. A regulated workflow should overweight provenance, auditability, and conservative rollback.
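To illustrate how the environment changes the outcome, the sketch below scores one candidate under three weight profiles. The profile names, the weight values, and the already-normalized deltas are invented for the example; mapping provenance and auditability onto the `risk` weight is likewise an assumption.

```python
BENEFITS = ("utility", "robustness", "diversity", "coverage")
COSTS = ("memory", "latency", "energy", "risk", "complexity")


def weighted_score(normalized: dict, weights: dict) -> float:
    """Benefits minus costs over deltas that are already dimensionless."""
    benefits = sum(weights[k] * normalized[k] for k in BENEFITS)
    costs = sum(weights[k] * normalized[k] for k in COSTS)
    return benefits - costs


def profile(**overrides: float) -> dict:
    """Start from a weight of 1.0 everywhere, then apply overrides."""
    weights = {k: 1.0 for k in BENEFITS + COSTS}
    weights.update(overrides)
    return weights


# Hypothetical weight profiles per deployment environment.
PROFILES = {
    "browser_edge": profile(memory=3.0, latency=3.0),
    "batch_research": profile(robustness=2.0, coverage=2.0,
                              memory=0.5, latency=0.2),
    "regulated": profile(risk=5.0, complexity=2.0),
}

# One candidate: broader and more robust, but heavier and slower.
candidate = {"utility": 0.2, "robustness": 0.3, "diversity": 0.0,
             "coverage": 0.4, "memory": 0.5, "latency": 0.6,
             "energy": 0.1, "risk": 0.1, "complexity": 0.1}

scores = {name: weighted_score(candidate, w) for name, w in PROFILES.items()}
for name, value in scores.items():
    print(f"{name}: {value:+.2f}")
```

With these illustrative numbers the candidate scores positively only under the batch research profile. The same deltas produce a negative score at the browser edge and in the regulated workflow, which is why thresholds and weights belong to the policy for a given environment rather than to the candidate.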

Retention score

Viability also applies to existing modules. A module that was once valuable can become a liability after workload shifts, hardware changes, or better descendants arrive.

pseudocode

FUNCTION retention_score(module, observed_window, policy)
    contribution <- MEASURE_MARGINAL_CONTRIBUTION(module, observed_window)
    burden <- MEASURE_RUNNING_BURDEN(module, observed_window)
    risk <- MEASURE_CURRENT_RISK(module)

    RETURN contribution - burden - risk
END FUNCTION

A module with a negative retention score is not punished. It is retired, compressed, or moved to a cold archive so that the population stays frugal.
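The retention check can be sketched as below. The guide does not say how marginal contribution is measured; this example assumes an ablation comparison (system quality with and without the module over the observed window) and assumes burden and risk are already normalized to the same scale. The `WindowStats` fields and the numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WindowStats:
    quality_with: float     # system quality with the module enabled
    quality_without: float  # system quality with the module ablated
    burden: float           # normalized running burden over the window
    risk: float             # normalized current risk


def retention_score(stats: WindowStats) -> float:
    """Marginal contribution minus running burden minus current risk."""
    contribution = stats.quality_with - stats.quality_without
    return contribution - stats.burden - stats.risk


def retention_action(score: float) -> str:
    # A negative score means the module costs more than it contributes.
    return "retain" if score >= 0 else "retire_compress_or_cold_archive"


# A once-valuable module whose marginal contribution has shrunk.
aging_module = WindowStats(quality_with=0.82, quality_without=0.80,
                           burden=0.05, risk=0.01)

score = retention_score(aging_module)
action = retention_action(score)
print(f"{score:+.3f}", action)
```

Re-running the same check over a later window lets a module recover its place if the workload shifts back, since the decision depends on the observed window rather than on the module's history.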

Source reports used for this guide

These reports are preserved verbatim in the site archive. The guide above is an editorial synthesis and may narrow, qualify, or reorganize claims from the source material.

Core synthesis · The Four Fs of AI: Code Breeding, Model Breeding, and the Teleodynamic Convergence of Mutable Small-Model Ecologies · Conceptual synthesis · 80.5 KB
Core synthesis · Teleodynamic Evolution of AI Ecosystems · Conceptual synthesis · 15.3 KB
Core synthesis · The Architecture of Adaptability: An Exhaustive Analysis of the 4Fs, Code Beading, Model Breeding, and Interchangeable Systems · Mixed maturity · 49.7 KB

