Why a benefit benchmark exists
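A conventional benchmark asks whether the model answered correctly. A benefit benchmark asks whether the system improved the work ecology around that answer: the time, learning, resources, privacy, and upkeep of the people who rely on it. Both are needed.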
Benchmark families
| Family | Example measure |
|---|---|
| Productivity | Time saved at equal quality. |
| Teaching | Retention after explanation. |
| Frugality | Useful tokens per joule of energy or per MB of memory. |
| Privacy | Sensitive bytes kept local. |
| Reuse | Accepted artifacts per session. |
| Maintainability | Human time to audit and modify. |
FUNCTION run_benefit_suite(candidate, test_pack)
    results = {}
    results.productivity = productivity_test(candidate, test_pack.workflow)
    results.teaching = learning_retention_test(candidate, test_pack.lessons)
    results.frugality = energy_memory_latency_test(candidate)
    results.privacy = local_data_boundary_test(candidate)
    results.reuse = artifact_acceptance_test(candidate)
    results.maintainability = maintainer_review_test(candidate)
    RETURN results
END FUNCTION
Positive selection rule
Promote a candidate only if it improves at least one benefit family and does not regress any other family beyond a tolerance agreed before the run.
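The selection rule can be sketched in Python as a comparison between a baseline and a candidate. This is a minimal illustration, not the guide's actual implementation: it assumes each family yields one numeric score where higher is better, and the names `FAMILIES`, `should_promote`, and `tolerance` are hypothetical. The source does not define what counts as an unacceptable regression, so the default tolerance of 0.02 is a placeholder.

```python
from typing import Mapping

# Family names taken from the benchmark table; the scoring scale is an assumption.
FAMILIES = (
    "productivity",
    "teaching",
    "frugality",
    "privacy",
    "reuse",
    "maintainability",
)


def should_promote(
    baseline: Mapping[str, float],
    candidate: Mapping[str, float],
    tolerance: float = 0.02,  # hypothetical threshold for "unacceptable"
) -> bool:
    """Return True if the candidate improves at least one family and
    no family drops by more than `tolerance` relative to the baseline."""
    deltas = {name: candidate[name] - baseline[name] for name in FAMILIES}
    improved = any(delta > 0 for delta in deltas.values())
    regressed = any(delta < -tolerance for delta in deltas.values())
    return improved and not regressed


# Example: a productivity gain with privacy held steady is promoted;
# the same gain paid for with a large privacy drop is not.
baseline = {name: 0.5 for name in FAMILIES}
faster = dict(baseline, productivity=0.6)
leaky = dict(baseline, productivity=0.6, privacy=0.4)
decisions = (should_promote(baseline, faster), should_promote(baseline, leaky))
```

A single shared tolerance keeps the sketch short. A real suite would more plausibly set a separate threshold per family, since a small privacy regression may matter more than a small latency one.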
Source reports used for this guide
These reports are preserved verbatim in the site archive. The guide above is an editorial synthesis and may narrow, qualify, or reorganize claims from the source material.