Evolution lab · Advanced · 1 minute read · Updated 2026-06-26 UTC

Benefit benchmark suite

A proposed benchmark suite for productivity, teaching, frugality, local privacy, reuse, and maintainability.

Research status: Experiment pattern · Publication state: Published · Reviewed by: Michael Kappel · Source reports: 3

Why a benefit benchmark exists

A conventional benchmark asks whether the model answered correctly. A benefit benchmark asks whether the system improved the work ecology: the people, time, resources, data, and artifacts around the task. Both are needed.

Benchmark families

| Family | Example measure |
|---|---|
| Productivity | Time saved at equal quality. |
| Teaching | Retention after explanation. |
| Frugality | Useful tokens per joule (or per watt-hour) of energy, or per MB of memory. |
| Privacy | Sensitive bytes kept local. |
| Reuse | Accepted artifacts per session. |
| Maintainability | Human time to audit and modify. |
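The table states each measure in prose. The helpers below are a minimal Python sketch of how each measure could be computed from raw observations; every function name and input (minutes, quality scores, average watts, byte counts) is a hypothetical choice, not something the source reports define. Because a watt is a unit of power rather than energy, the frugality helper divides by energy in joules (average watts × seconds).

```python
from typing import Optional


def time_saved(baseline_min: float, assisted_min: float,
               baseline_quality: float, assisted_quality: float) -> Optional[float]:
    # Productivity: count minutes saved only when quality did not drop.
    if assisted_quality < baseline_quality:
        return None  # quality fell, so the time comparison is not "at equal quality"
    return baseline_min - assisted_min


def retention(correct_after_delay: int, questions: int) -> float:
    # Teaching: fraction of follow-up questions answered correctly after a delay.
    return correct_after_delay / questions


def tokens_per_joule(useful_tokens: int, avg_watts: float, seconds: float) -> float:
    # Frugality (energy): joules = average power in watts x duration in seconds.
    return useful_tokens / (avg_watts * seconds)


def tokens_per_mb(useful_tokens: int, peak_memory_mb: float) -> float:
    # Frugality (memory): useful output per MB of peak memory.
    return useful_tokens / peak_memory_mb


def local_share(sensitive_bytes_local: int, sensitive_bytes_total: int) -> float:
    # Privacy: share of sensitive bytes that stayed on the local machine.
    # With no sensitive bytes at all, nothing left the machine, so report 1.0.
    if sensitive_bytes_total == 0:
        return 1.0
    return sensitive_bytes_local / sensitive_bytes_total


def accepted_per_session(accepted_artifacts: int, sessions: int) -> float:
    # Reuse: artifacts a human accepted, averaged over sessions.
    return accepted_artifacts / sessions
```

For example, 1,200 useful tokens produced at an average of 30 W over 20 s (600 J) gives `tokens_per_joule(1200, 30.0, 20.0) == 2.0`. Maintainability is left as a directly measured human time and needs no helper.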
```pseudocode
FUNCTION run_benefit_suite(candidate, test_pack)
    results = {}
    results.productivity = productivity_test(candidate, test_pack.workflow)
    results.teaching = learning_retention_test(candidate, test_pack.lessons)
    results.frugality = energy_memory_latency_test(candidate)
    results.privacy = local_data_boundary_test(candidate)
    results.reuse = artifact_acceptance_test(candidate)
    results.maintainability = maintainer_review_test(candidate)
    RETURN results
END FUNCTION
```
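The `run_benefit_suite` outline can be turned into runnable Python by passing each family's test in as a callable instead of hard-coding it. This is a minimal sketch, assuming each test takes the candidate plus that family's slice of the test pack and returns a numeric score; the `tests` mapping, the `BenefitTest` type, and the stub scorers are hypothetical, and real evaluators (a retention study, an energy meter, a maintainer review) would replace them.

```python
from typing import Any, Callable, Dict, Mapping, Optional

# Hypothetical test signature: test(candidate, family_data) -> float,
# where family_data is None when the pack has no entry for that family.
BenefitTest = Callable[[Any, Optional[Any]], float]


def run_benefit_suite(candidate: Any,
                      test_pack: Mapping[str, Any],
                      tests: Mapping[str, BenefitTest]) -> Dict[str, float]:
    """Run every supplied family test and return one score per family."""
    results: Dict[str, float] = {}
    for family, test in tests.items():
        # Families with no data in the pack (e.g. frugality, which measures
        # the candidate itself) receive None as their data argument.
        results[family] = test(candidate, test_pack.get(family))
    return results


if __name__ == "__main__":
    # Placeholder scorers only; they do not measure anything real.
    stub_tests = {
        "productivity": lambda cand, data: float(len(data)),
        "frugality": lambda cand, data: 1.0,
    }
    pack = {"productivity": ["task-1", "task-2"]}
    print(run_benefit_suite("candidate-a", pack, stub_tests))
```

Keeping the tests in a mapping lets a lab add or drop a benefit family without editing the runner itself.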

Positive selection rule

Promote a candidate if it improves at least one benefit family and no other family regresses past its acceptable-regression threshold.
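One way to make the selection rule concrete is to compare each family's score with a baseline and allow a per-family regression tolerance. The sketch below assumes every score is oriented so that higher is better (a measure such as audit time would be negated first) and that the tolerances are hypothetical values chosen by the evaluator; the source does not specify thresholds.

```python
from typing import Mapping


def should_promote(baseline: Mapping[str, float],
                   candidate: Mapping[str, float],
                   tolerance: Mapping[str, float]) -> bool:
    """Promote if some family improves and none drops past its tolerance.

    Scores must be oriented so that higher is better. A family missing
    from `tolerance` gets a tolerance of 0.0, so any drop there blocks
    promotion. The candidate must report a score for every baseline family.
    """
    improved = False
    for family, base_score in baseline.items():
        delta = candidate[family] - base_score
        if delta < -tolerance.get(family, 0.0):
            return False  # unacceptable regression in this family
        if delta > 0:
            improved = True
    return improved
```

For example, `should_promote({"productivity": 10.0, "privacy": 0.90}, {"productivity": 12.0, "privacy": 0.88}, {"privacy": 0.05})` returns `True`: productivity improved, and the 0.02 privacy drop stays within its 0.05 tolerance. A candidate that merely matches the baseline everywhere is not promoted.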

Source reports used for this guide

These reports are preserved verbatim in the site archive. The guide above is an editorial synthesis and may narrow, qualify, or reorganize claims from the source material.