Evolution Lab · Advanced · 2 minute read · Updated 2026-06-28 UTC

Cooperative caste ecosystems

A lab design for proposer, solver, judge, router, and critic populations that co-improve without letting candidates control their own evaluation.

Research status: Multi-agent pattern adapted from source directive and teleodynamic architecture
Publication state: Published
Reviewed by: Michael Kappel
Source reports: 3

Direct answer

A cooperative caste ecosystem splits a model-breeding lab into role populations. Proposers create tasks, solvers attempt solutions, judges grade evidence, routers allocate work, and critics identify blind spots. The point is not free-form agent debate. The point is structured division of labor with independent evaluation and a shared benefit target.

Role map

Caste | Function | Fitness evidence
Proposer | Generates hard but useful tasks and test cases | Task novelty, validity, coverage, reuse
Solver | Produces candidate answers, patches, summaries, or predictions | Accuracy, utility, calibration, cost
Judge | Evaluates outputs against expected evidence | Agreement with held-out truth and human review
Router | Chooses the smallest capable path | Correct routing, latency, escalation quality
Critic | Finds failure modes and missing assumptions | Prevented regressions and improved tests
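The role map can also be held as data, so that each caste is scored only on its own evidence column. The following is a minimal Python sketch, not a reference implementation. The `Caste` class, the metric names, and the unweighted mean in `fitness` are all illustrative assumptions, and every measurement is assumed to be normalized to [0, 1] with higher meaning better (so `cost` and `latency` would be inverted before they are passed in).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Caste:
    name: str
    function: str
    evidence: tuple[str, ...]  # hypothetical metric names that count toward fitness


# Mirrors the role map table; metric names are illustrative, not from the source.
ROLE_MAP = {
    "proposer": Caste("proposer", "Generates hard but useful tasks and test cases",
                      ("novelty", "validity", "coverage", "reuse")),
    "solver": Caste("solver", "Produces candidate answers, patches, summaries, or predictions",
                    ("accuracy", "utility", "calibration", "cost")),
    "judge": Caste("judge", "Evaluates outputs against expected evidence",
                   ("held_out_agreement", "human_review_agreement")),
    "router": Caste("router", "Chooses the smallest capable path",
                    ("routing_accuracy", "latency", "escalation_quality")),
    "critic": Caste("critic", "Finds failure modes and missing assumptions",
                    ("prevented_regressions", "improved_tests")),
}


def fitness(caste: Caste, measurements: dict[str, float]) -> float:
    """Unweighted mean of the caste's own evidence metrics (an assumed scoring rule).

    Metrics that belong to other castes are ignored, so a solver cannot gain
    fitness from, say, a high novelty score.
    """
    missing = [m for m in caste.evidence if m not in measurements]
    if missing:
        raise KeyError(f"{caste.name} is missing evidence: {missing}")
    return sum(measurements[m] for m in caste.evidence) / len(caste.evidence)
```

A real lab would likely weight the metrics per role; the equal weighting here only keeps the sketch short.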

Cooperative loop

pseudocode
PROCEDURE coevolve_castes(task_stream, populations, frozen_evaluator, policy)
    FOR each batch IN task_stream
        // Proposers suggest cases; only the frozen evaluator decides which are valid.
        proposed_cases <- populations.proposers.GENERATE(batch.context)
        valid_cases <- frozen_evaluator.FILTER_VALID_CASES(proposed_cases)

        routed_cases <- populations.routers.ASSIGN(valid_cases)
        solutions <- populations.solvers.SOLVE(routed_cases)
        critiques <- populations.critics.REVIEW(solutions)
        grades <- frozen_evaluator.GRADE(valid_cases, solutions, critiques)

        // Each caste is credited only with the evidence for its own role.
        // Judge fitness is not updated here; see the design rule.
        UPDATE_FITNESS(populations.proposers, grades.case_quality)
        UPDATE_FITNESS(populations.routers, grades.routing_quality)
        UPDATE_FITNESS(populations.solvers, grades.solution_quality)
        UPDATE_FITNESS(populations.critics, grades.prevented_failures)

        // Breeding stays within a role, using the operator budget from policy.
        FOR each caste IN populations
            caste <- BREED_WITHIN_ROLE(caste, policy.role_specific_operator_budget)
            caste <- RETIRE_LOW_UTILITY_MEMBERS(caste)
        END FOR
    END FOR
END PROCEDURE
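The selection half of the loop (update fitness, breed within role, retire weak members) can be shown in a small runnable Python sketch. It makes several assumptions: the propose, route, solve, and critique pipeline is collapsed into a single hypothetical `frozen_grade` function, each member's "genome" is one float, and breeding is plain Gaussian mutation of the top members. None of these choices come from the source; they only make the control flow executable.

```python
import random
from dataclasses import dataclass
from typing import Callable


@dataclass
class Member:
    genome: float          # toy stand-in for weights, a prompt, or a routing policy
    fitness: float = 0.0


Grader = Callable[[str, Member], float]


def breed_within_role(caste: list[Member], rng: random.Random,
                      keep: int = 2, sigma: float = 0.1) -> list[Member]:
    """Keep the top `keep` members, retire the rest, refill with mutated copies."""
    survivors = sorted(caste, key=lambda m: m.fitness, reverse=True)[:keep]
    children = [Member(rng.choice(survivors).genome + rng.gauss(0.0, sigma))
                for _ in range(len(caste) - len(survivors))]
    return survivors + children


def coevolve_step(populations: dict[str, list[Member]], frozen_grade: Grader,
                  rng: random.Random) -> dict[str, list[Member]]:
    """One generation: the external grader scores every member, then each caste
    breeds separately. Parents are never drawn from another role."""
    next_generation = {}
    for role, caste in populations.items():
        for member in caste:
            member.fitness = frozen_grade(role, member)  # members never score themselves
        next_generation[role] = breed_within_role(caste, rng)
    return next_generation


# Hypothetical optimum per role; the grader is fixed for the whole run.
TARGETS = {"proposer": 1.0, "solver": -1.0}


def frozen_grade(role: str, member: Member) -> float:
    return -abs(member.genome - TARGETS[role])


rng = random.Random(0)
populations = {role: [Member(rng.uniform(-2.0, 2.0)) for _ in range(6)]
               for role in TARGETS}
for _ in range(30):
    populations = coevolve_step(populations, frozen_grade, rng)
```

Because the survivors are carried over unchanged and the grader never changes, the best grade in each caste cannot decrease from one generation to the next. That property is lost as soon as candidates can influence the grader, which is the failure the frozen evaluator is there to prevent.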

Design rule

Do not let the judge caste be the sole evaluator for its own descendants. Judges can be bred, but judge candidates are evaluated against frozen test cases, hidden cases, human-labeled samples, and previous judge champions. The selection surface must not be rewritten by the same candidate being selected.
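One way to read this rule is as a promotion gate: a judge candidate replaces the current champion only if it agrees with trusted labels at least as often as the champion does on every evaluation suite, and the suites are supplied by the lab rather than by the candidate. The Python sketch below illustrates that reading. The `Judge` type, the `promote_judge` function, the `margin` parameter, and the toy suites are all hypothetical.

```python
from typing import Callable, Mapping, Sequence

Judge = Callable[[str], bool]                # hypothetical: maps an output to pass/fail
LabeledSet = Sequence[tuple[str, bool]]      # (output, trusted label) pairs


def agreement(judge: Judge, labeled: LabeledSet) -> float:
    """Fraction of items where the judge matches the trusted label."""
    if not labeled:
        raise ValueError("evaluation suite must not be empty")
    return sum(judge(item) == label for item, label in labeled) / len(labeled)


def promote_judge(candidate: Judge, champion: Judge,
                  suites: Mapping[str, LabeledSet], margin: float = 0.0) -> bool:
    """Promote only if the candidate matches or beats the champion on every suite.

    The suites come from outside the judge caste; the candidate only reads them.
    """
    return all(agreement(candidate, suite) >= agreement(champion, suite) + margin
               for suite in suites.values())


# Toy suites standing in for frozen cases, hidden cases, and human-labeled samples.
suites = {
    "frozen": (("ok", True), ("not ok", False), ("error", False)),
    "hidden": (("ok", True), ("okay", False)),
    "human": (("ok", True), ("fail", False)),
}

champion = lambda out: "ok" in out    # previous champion: substring match
strict = lambda out: out == "ok"      # candidate that fixes the champion's false passes
lenient = lambda out: True            # candidate that passes everything

strict_promoted = promote_judge(strict, champion, suites)
lenient_promoted = promote_judge(lenient, champion, suites)
```

With these toy suites the strict candidate is promoted and the lenient one is not. Requiring a win or tie on every suite, rather than on an average, is a design choice in this sketch: it stops a candidate from trading away hidden-case agreement for gains on cases it may have seen.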

Positive use cases

  • Code repair: proposers produce failing tests, solvers patch, judges check exact outputs.
  • Research synthesis: proposers generate comparison questions, solvers write summaries, critics check source coverage.
  • Document triage: routers assign specialist extractors, judges validate fields.
  • Edge assistants: routers decide local, cloud, or no-op based on sensitivity and latency.
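As an illustration of the edge assistant case, a router member might be reduced to a small decision function. The Python sketch below assumes one possible policy: prefer the local path whenever it is capable, never send sensitive requests to the cloud, and fall back to no-op when the cloud round trip would exceed the latency budget. The parameter names and the 400 ms default are invented for the example.

```python
from enum import Enum


class Route(Enum):
    NO_OP = "no-op"
    LOCAL = "local"
    CLOUD = "cloud"


def route(sensitive: bool, local_capable: bool, latency_budget_ms: float,
          cloud_latency_ms: float = 400.0) -> Route:
    """Pick the smallest capable path under an assumed sensitivity and latency policy."""
    if local_capable:
        return Route.LOCAL      # smallest capable path wins
    if sensitive:
        return Route.NO_OP      # assumed rule: sensitive data stays on the device
    if cloud_latency_ms <= latency_budget_ms:
        return Route.CLOUD      # escalate only when the budget allows it
    return Route.NO_OP          # cloud would be too slow to be useful
```

For example, `route(sensitive=False, local_capable=False, latency_budget_ms=1000)` returns `Route.CLOUD`, while the same request marked sensitive returns `Route.NO_OP`. In the ecosystem described above, a function like this would be one router candidate, and its fitness would come from the frozen evaluator's routing grades rather than from its own judgment.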

Source reports used for this guide

These reports are preserved verbatim in the site archive. The guide above is an editorial synthesis and may narrow, qualify, or reorganize claims from the source material.