Theory | Advanced | 3 minute read | Updated 2026-06-26 UTC

Viability mathematics

A normalized scoring model for deciding when to add, merge, compress, promote, or retire model descendants.

Research status: Conceptual scoring model | Publication state: Published | Reviewed by: Michael Kappel | Source reports: 3

The score is not just accuracy

A candidate descendant should be evaluated by net viability, not by a single benchmark. Accuracy can rise while system viability falls because the candidate adds too much latency, memory, risk, operational complexity, or evaluator fragility. The viability function exists to stop that failure.

A practical score uses normalized deltas against the current production baseline:

Symbol | Meaning | Direction
Delta U | Utility or task quality improvement | Higher is better
Delta R | Robustness, calibration, and abstention improvement | Higher is better
Delta D | Useful behavioral diversity and error decorrelation | Higher is better
Delta C | Task, language, modality, or environment coverage | Higher is better
Delta M | Memory, storage, and model-loading overhead | Lower is better
Delta L | End-to-end latency and tail latency overhead | Lower is better
Delta E | Energy, compute, and evaluation cost | Lower is better
Delta S | Security, safety, legal, and provenance risk | Lower is better
Delta K | Maintenance complexity and coordination burden | Lower is better

Normalized viability

The score should be dimensionless. Normalize each dimension to a comparable scale before weighting it. Otherwise a metric whose raw units happen to produce large numbers (latency in milliseconds, for example) can dominate the decision simply because of its units.
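The guide does not prescribe a normalization method. As a minimal sketch, assuming each dimension has a hand-chosen reference scale (a hypothetical parameter, not something the source defines), one option is to divide the raw delta by that scale and clip the result:

```python
def normalize(delta: float, scale: float, clip: float = 1.0) -> float:
    """Map a raw delta to a dimensionless value in [-clip, clip].

    `scale` is an assumed per-dimension reference size, e.g. "a 0.05
    accuracy gain" or "a 200 ms latency increase" counts as one unit.
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    return max(-clip, min(clip, delta / scale))


# Illustrative numbers only: both deltas land on the same unitless scale.
accuracy_term = normalize(0.02, scale=0.05)    # +0.02 accuracy
latency_term = normalize(120.0, scale=200.0)   # +120 ms latency
saturated = normalize(500.0, scale=200.0)      # clipped at the upper bound
```

Clipping keeps a single extreme measurement from swamping the other dimensions; whether that is desirable depends on the deployment, and other schemes (z-scores, percentile ranks) would fit the same slot.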

pseudocode
FUNCTION viability(candidate, baseline, weights)
    benefits <- 0
    benefits += weights.utility     * NORMALIZE(candidate.utility - baseline.utility)
    benefits += weights.robustness  * NORMALIZE(candidate.robustness - baseline.robustness)
    benefits += weights.diversity   * NORMALIZE(candidate.diversity_contribution)
    benefits += weights.coverage    * NORMALIZE(candidate.coverage_gain)

    costs <- 0
    costs += weights.memory      * NORMALIZE(candidate.memory_cost - baseline.memory_cost)
    costs += weights.latency     * NORMALIZE(candidate.latency_cost - baseline.latency_cost)
    costs += weights.energy      * NORMALIZE(candidate.energy_cost - baseline.energy_cost)
    costs += weights.risk        * NORMALIZE(candidate.risk_delta)
    costs += weights.complexity  * NORMALIZE(candidate.complexity_delta)

    RETURN benefits - costs
END FUNCTION
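The pseudocode above can be sketched in Python as follows. The dictionary layout, the scale values, the equal weights, and the divide-and-clip normalization are all assumptions made for illustration; only the benefits-minus-costs structure comes from the guide.

```python
BENEFITS = ("utility", "robustness", "diversity", "coverage")
COSTS = ("memory", "latency", "energy", "risk", "complexity")


def viability(deltas, scales, weights):
    """Weighted normalized benefits minus weighted normalized costs."""

    def norm(name):
        # Assumed normalization: divide by a reference scale, clip to [-1, 1].
        # Dimensions missing from `deltas` are treated as unchanged.
        return max(-1.0, min(1.0, deltas.get(name, 0.0) / scales[name]))

    benefits = sum(weights[name] * norm(name) for name in BENEFITS)
    costs = sum(weights[name] * norm(name) for name in COSTS)
    return benefits - costs


# Hypothetical reference scales and equal weights, for illustration only.
scales = {
    "utility": 0.05, "robustness": 0.05, "diversity": 0.10, "coverage": 0.10,
    "memory": 1024.0, "latency": 200.0, "energy": 1.0, "risk": 1.0,
    "complexity": 1.0,
}
weights = {name: 1.0 for name in BENEFITS + COSTS}

# A candidate that gains 0.02 accuracy but adds 512 MB and 150 ms.
deltas = {"utility": 0.02, "memory": 512.0, "latency": 150.0}
score = viability(deltas, scales, weights)  # about 0.4 - (0.5 + 0.75) = -0.85
```

Under these made-up numbers the candidate improves accuracy yet scores negative, which is the failure the viability function is meant to catch.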

Hard gates come before arithmetic

Some properties should not be averaged away. If a candidate fails a license gate, exposes a credential path, lacks a rollback target, violates a safety invariant, or uses unapproved data, the candidate fails even if its numeric score is high.

pseudocode
FUNCTION decision(candidate, baseline, policy)
    IF NOT HARD_GATES_PASS(candidate, policy)
        RETURN REJECT("Hard gate failure")
    END IF

    score <- viability(candidate, baseline, policy.weights)

    IF score >= policy.promote_threshold
        RETURN PROMOTE_WITH_CANARY(candidate, score)
    END IF

    IF score >= policy.archive_threshold
        RETURN ARCHIVE_AS_STEPPING_STONE(candidate, score)
    END IF

    RETURN NO_OP("Insufficient net viability")
END FUNCTION
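A Python sketch of the same ordering, assuming the gate checks have already been evaluated into a name-to-boolean mapping. The gate names, threshold values, and the `Policy` class are hypothetical; the point is only that the score is never consulted when a gate fails.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Policy:
    promote_threshold: float
    archive_threshold: float


def decide(gates: Dict[str, bool], score: float, policy: Policy) -> Tuple[str, str]:
    """Return (action, reason). Hard gates are checked before the score."""
    failed = sorted(name for name, passed in gates.items() if not passed)
    if failed:
        return ("REJECT", "Hard gate failure: " + ", ".join(failed))
    if score >= policy.promote_threshold:
        return ("PROMOTE_WITH_CANARY", f"score={score:.2f}")
    if score >= policy.archive_threshold:
        return ("ARCHIVE_AS_STEPPING_STONE", f"score={score:.2f}")
    return ("NO_OP", "Insufficient net viability")


# Illustrative policy and gate results.
policy = Policy(promote_threshold=0.5, archive_threshold=0.0)
gates_ok = {"license": True, "no_credential_exposure": True, "rollback_target": True}
gates_bad = dict(gates_ok, license=False)

high_score_bad_gate = decide(gates_bad, 9.9, policy)  # rejected despite the score
promoted = decide(gates_ok, 0.7, policy)
archived = decide(gates_ok, 0.2, policy)
ignored = decide(gates_ok, -0.3, policy)
```

Reporting the names of the failed gates, rather than a bare rejection, is a design choice of this sketch that makes the decision easier to audit.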

Thresholds are environment-dependent

A browser-edge deployment should heavily penalize memory and tail latency. A batch research workflow may tolerate slower inference if the result improves coverage or robustness. A regulated workflow should overweight provenance, auditability, and conservative rollback.
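To make the environment dependence concrete, the sketch below scores one candidate under three weight profiles. Every number is invented for illustration, and the mapping of "provenance, auditability, and rollback" onto the risk and complexity weights is an assumption of this example, not a rule from the source.

```python
BENEFITS = ("utility", "robustness", "diversity", "coverage")
COSTS = ("memory", "latency", "energy", "risk", "complexity")

# Hypothetical weight profiles, one per deployment environment.
PROFILES = {
    "browser_edge": {
        "utility": 1.0, "robustness": 1.0, "diversity": 0.5, "coverage": 0.5,
        "memory": 3.0, "latency": 3.0, "energy": 1.0, "risk": 1.0, "complexity": 1.0,
    },
    "batch_research": {
        "utility": 1.0, "robustness": 1.5, "diversity": 1.0, "coverage": 1.5,
        "memory": 0.5, "latency": 0.25, "energy": 1.0, "risk": 1.0, "complexity": 1.0,
    },
    "regulated": {
        "utility": 1.0, "robustness": 1.0, "diversity": 0.5, "coverage": 0.5,
        "memory": 1.0, "latency": 1.0, "energy": 1.0, "risk": 4.0, "complexity": 2.0,
    },
}


def weighted_score(normalized, weights):
    """Benefits minus costs over already-normalized, dimensionless deltas."""
    benefits = sum(weights[n] * normalized.get(n, 0.0) for n in BENEFITS)
    costs = sum(weights[n] * normalized.get(n, 0.0) for n in COSTS)
    return benefits - costs


# One candidate, already normalized: better quality, but slower and larger.
candidate = {
    "utility": 0.4, "robustness": 0.2, "coverage": 0.3,
    "memory": 0.5, "latency": 0.6, "risk": 0.1,
}

scores = {env: weighted_score(candidate, w) for env, w in PROFILES.items()}
for env, value in scores.items():
    print(f"{env}: {value:+.2f}")
```

With these weights the same candidate scores positive for the batch profile and negative for the other two, so a single global threshold would misjudge it in at least one environment.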

Retention score

Viability also applies to existing modules. A module that was once valuable can become a liability after workload shifts, hardware changes, or better descendants arrive.

pseudocode
FUNCTION retention_score(module, observed_window, policy)
    contribution <- MEASURE_MARGINAL_CONTRIBUTION(module, observed_window)
    burden <- MEASURE_RUNNING_BURDEN(module, observed_window)
    risk <- MEASURE_CURRENT_RISK(module)

    RETURN contribution - burden - risk
END FUNCTION

A module with a negative retention score is not punished. It is retired, compressed, or moved to a cold archive so the population can remain frugal.
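A minimal Python sketch of a retention review, assuming contribution, burden, and risk have already been measured over the observation window and normalized to a common scale. The module names and numbers are hypothetical, and the sketch only flags modules; choosing between retiring, compressing, and archiving is left to policy.

```python
from typing import Dict, List, Tuple


def retention_score(contribution: float, burden: float, risk: float) -> float:
    """Marginal contribution minus running burden and current risk."""
    return contribution - burden - risk


def review(modules: Dict[str, Tuple[float, float, float]]) -> List[Tuple[str, float, str]]:
    """Return (name, score, action) rows, lowest score first."""
    report = []
    for name, (contribution, burden, risk) in modules.items():
        score = retention_score(contribution, burden, risk)
        # Only a strictly negative score triggers action; zero is kept.
        action = "KEEP" if score >= 0 else "RETIRE_COMPRESS_OR_ARCHIVE"
        report.append((name, score, action))
    return sorted(report, key=lambda row: row[1])


# Illustrative measurements: (contribution, burden, risk) per module.
modules = {
    "router_v2": (0.60, 0.20, 0.10),
    "ocr_adapter": (0.10, 0.25, 0.05),
}
report = review(modules)
for name, score, action in report:
    print(f"{name}: {score:+.2f} -> {action}")
```

Sorting by score puts the weakest modules at the top of the report, which suits a periodic review where only a few candidates can be examined at a time.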

Source reports used for this guide

These reports are preserved verbatim in the site archive. The guide above is an editorial synthesis and may narrow, qualify, or reorganize claims from the source material.