Count lifecycle cost
Inference cost is only one part of viability. Each model adds training, evaluation, storage, loading, monitoring, incident, compliance, and operator costs. Population growth can make a system uneconomic even when every descendant is individually efficient.
Cost dimensions
- training or adaptation compute;
- evaluation compute and human review;
- accelerator and memory residency;
- cold-start and cache churn;
- storage for artifacts, data, and evidence;
- network transfer and federated communication;
- observability and retention;
- security review and penetration testing;
- on-call and maintenance burden;
- vendor, license, and jurisdiction costs;
- rollback reserve and disaster recovery.
Resource ledger
ledger <- {
  gpu_hours_available: 400,
  inference_cost_budget_monthly: 12000,
  active_model_memory_gb: 80,
  archive_storage_tb: 5,
  evaluation_human_hours: 120,
  risk_capacity: 20,
  minimum_operational_reserve_percent: 25
}
Reserve before launching an experiment. Charge actual usage back to operator families and capabilities. A controller should not spend the rollback reserve to create more candidates.
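The reserve-then-charge rule can be sketched in a few lines of Python. This is a minimal illustration for GPU hours only, not a definitive implementation. The class name `GpuLedger`, its methods, the `used` counter, and the experiment and operator-family labels are all hypothetical. Only the 400-hour total and the 25 percent reserve come from the ledger above.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class GpuLedger:
    """Hypothetical GPU-hour ledger with a protected operational reserve."""
    gpu_hours_total: float
    minimum_operational_reserve_percent: float
    used: float = 0.0
    reserved: dict = field(default_factory=dict)  # experiment id -> hours held
    charged: defaultdict = field(default_factory=lambda: defaultdict(float))

    def spendable(self) -> float:
        # Hours left after usage, open reservations, and the protected reserve.
        floor = self.gpu_hours_total * self.minimum_operational_reserve_percent / 100
        return self.gpu_hours_total - self.used - floor - sum(self.reserved.values())

    def reserve(self, experiment: str, gpu_hours: float) -> bool:
        # Refuse any reservation that would dip into the protected reserve.
        if gpu_hours <= 0 or gpu_hours > self.spendable():
            return False
        self.reserved[experiment] = self.reserved.get(experiment, 0.0) + gpu_hours
        return True

    def charge(self, experiment: str, operator_family: str, actual_gpu_hours: float) -> None:
        # Release the hold and bill actual usage to the operator family.
        self.reserved.pop(experiment, None)
        self.used += actual_gpu_hours
        self.charged[operator_family] += actual_gpu_hours


ledger = GpuLedger(gpu_hours_total=400, minimum_operational_reserve_percent=25)
first_ok = ledger.reserve("exp-1", 250)    # fits inside the spendable hours
second_ok = ledger.reserve("exp-2", 100)   # would eat into the reserve
ledger.charge("exp-1", "merge-operators", 200)
```

The second reservation is refused rather than partially granted, which keeps the controller from quietly consuming rollback capacity.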
Accepted-result cost
Measure cost per accepted result, not per model invocation. Include retries, judge calls, fallbacks, human corrections, and failed routes. A fast cheap specialist with a high escalation rate may be expensive overall.
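As a rough illustration of this metric, the sketch below totals every cost a route incurs and divides by accepted results rather than by invocations. The `RouteStats` fields and all the figures are invented for the example and are not measurements from the source reports.

```python
from dataclasses import dataclass


@dataclass
class RouteStats:
    """Hypothetical per-route totals over one accounting window."""
    invocations: int
    accepted: int
    invocation_cost: float
    retry_cost: float
    judge_cost: float
    fallback_cost: float
    human_correction_cost: float


def cost_per_invocation(s: RouteStats) -> float:
    return s.invocation_cost / s.invocations


def cost_per_accepted_result(s: RouteStats) -> float:
    # Every cost incurred on the route is charged to the results that were accepted.
    if s.accepted == 0:
        return float("inf")
    total = (s.invocation_cost + s.retry_cost + s.judge_cost
             + s.fallback_cost + s.human_correction_cost)
    return total / s.accepted


# Made-up figures: the specialist escalates often, so fallback cost dominates.
specialist = RouteStats(1000, 600, 10.0, 2.0, 5.0, 40.0, 3.0)
generalist = RouteStats(1000, 950, 50.0, 1.0, 5.0, 0.0, 1.0)

for name, stats in (("specialist", specialist), ("generalist", generalist)):
    print(name, cost_per_invocation(stats), cost_per_accepted_result(stats))
```

With these invented numbers the specialist is cheaper per invocation but more expensive per accepted result, which is the situation the paragraph above warns about.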
Capacity forecasting
Forecast by request volume, route mix, model residency, candidate-generation schedule, evaluation queue, and release concurrency. Include peak and failure conditions, such as cloud fallback after an edge outage.
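A very small sketch of one slice of such a forecast follows: per-route load from request volume and route mix, with a peak multiplier and a failure scenario that moves a failed route's traffic onto its fallback. The function name, parameters, and traffic figures are assumptions for illustration. Residency, evaluation queues, and release concurrency are not modeled here.

```python
def forecast_route_load(requests_per_hour, route_mix, peak_multiplier=1.0,
                        failed_route=None, fallback_route=None):
    """Return expected requests per hour for each route.

    route_mix maps route name -> share of traffic (shares should sum to 1).
    If failed_route is given, its share is moved to fallback_route.
    """
    mix = dict(route_mix)
    if failed_route is not None:
        if fallback_route is None:
            raise ValueError("a failure scenario needs a fallback route")
        share = mix.pop(failed_route, 0.0)
        mix[fallback_route] = mix.get(fallback_route, 0.0) + share
    return {route: requests_per_hour * peak_multiplier * share
            for route, share in mix.items()}


mix = {"edge": 0.5, "cloud": 0.5}  # hypothetical route mix
normal = forecast_route_load(1000, mix)
outage_at_peak = forecast_route_load(1000, mix, peak_multiplier=2.0,
                                     failed_route="edge", fallback_route="cloud")
```

Comparing `normal` with `outage_at_peak` shows how much headroom the cloud route would need to absorb an edge outage during peak traffic.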
Population budget
Set a fixed or slowly changing budget for active models. New specialists must replace another artifact, reduce other cost, or consume explicit reserve. This creates pressure to merge or retire redundant components.
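One way to express the admission rule is sketched below, assuming the budget is measured in a single cost unit such as resident memory in GB. The function `admit_specialist`, the model names, and the sizes are hypothetical. "Reduce other cost" is modeled only as retiring named artifacts, which is a simplification.

```python
def admit_specialist(active, name, cost, budget, retire=(), reserve=0.0):
    """Try to admit a new specialist under a population budget.

    active  -- dict of model name -> cost (e.g. resident memory in GB)
    retire  -- names of artifacts the newcomer replaces
    reserve -- explicit reserve the caller is willing to consume
    Returns (new_active, remaining_reserve), or None if rejected.
    """
    remaining = {k: v for k, v in active.items() if k not in retire}
    total = sum(remaining.values()) + cost
    overshoot = max(0.0, total - budget)
    if overshoot > reserve:
        return None  # neither replacement nor reserve covers the newcomer
    remaining[name] = cost
    return remaining, reserve - overshoot


active = {"general": 30, "code": 30, "legal": 20}  # already at an 80 GB budget
rejected = admit_specialist(active, "tax", 20, budget=80)
by_replacement = admit_specialist(active, "tax", 20, budget=80, retire=("legal",))
by_reserve = admit_specialist(active, "tax", 20, budget=80, reserve=25.0)
```

Because a plain addition is rejected by default, every new specialist has to name what it replaces or which reserve it draws on.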
Cost-aware selection
net_value <- business_value_of_quality_gain
  - incremental_inference_cost
  - lifecycle_maintenance_cost
  - expected_incident_cost
  - opportunity_cost_of_reserved_capacity
Do not convert every risk into money. Hard safety, legal, and data constraints remain non-negotiable.
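The selection rule can be sketched as a two-stage filter: hard constraints are checked first as a pass/fail gate, and only then is net value compared. The `Candidate` class, the `passes_hard_constraints` flag, and the figures are assumptions made for illustration. The field names follow the formula above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    name: str
    business_value_of_quality_gain: float
    incremental_inference_cost: float
    lifecycle_maintenance_cost: float
    expected_incident_cost: float
    opportunity_cost_of_reserved_capacity: float
    passes_hard_constraints: bool  # safety, legal, and data checks; never priced


def net_value(c: Candidate) -> float:
    return (c.business_value_of_quality_gain
            - c.incremental_inference_cost
            - c.lifecycle_maintenance_cost
            - c.expected_incident_cost
            - c.opportunity_cost_of_reserved_capacity)


def select(candidates) -> Optional[Candidate]:
    # Hard constraints gate eligibility; net value only ranks what remains.
    eligible = [c for c in candidates
                if c.passes_hard_constraints and net_value(c) > 0]
    return max(eligible, key=net_value, default=None)


# Invented candidates: "b" has the higher net value but fails a hard constraint.
a = Candidate("a", 100.0, 10.0, 20.0, 5.0, 5.0, passes_hard_constraints=True)
b = Candidate("b", 500.0, 10.0, 10.0, 10.0, 10.0, passes_hard_constraints=False)
chosen = select([a, b])
```

Keeping the constraint check outside `net_value` means no amount of business value can buy back a failed safety, legal, or data requirement.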
Source reports used for this guide
These reports are preserved verbatim in the site archive. The guide above is an editorial synthesis and may narrow, qualify, or reorganize claims from the source material.