Direct answer
A cooperative caste ecosystem splits a model-breeding lab into role populations. Proposers create tasks, solvers attempt solutions, judges grade evidence, routers allocate work, and critics identify blind spots. The point is not free-form agent debate but a structured division of labor, with independent evaluation and a shared benefit target.
Role map
| Caste | Function | Fitness evidence |
|---|---|---|
| Proposer | Generates hard but useful tasks and test cases | Task novelty, validity, coverage, reuse |
| Solver | Produces candidate answers, patches, summaries, or predictions | Accuracy, utility, calibration, cost |
| Judge | Evaluates outputs against expected evidence | Agreement with held-out truth and human review |
| Router | Chooses the smallest capable path | Correct routing, latency, escalation quality |
| Critic | Finds failure modes and missing assumptions | Prevented regressions and improved tests |
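The role map can be read as a small data structure: each caste has its own list of evidence signals, and a member's fitness is computed only from the signals for its role. The following is a minimal sketch under stated assumptions, not the guide's actual scoring method. The signal names are paraphrased from the table; the unweighted mean, the [0, 1] normalization, and the `LOWER_IS_BETTER` set are hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Caste(Enum):
    PROPOSER = "proposer"
    SOLVER = "solver"
    JUDGE = "judge"
    ROUTER = "router"
    CRITIC = "critic"


# Evidence signals per caste, paraphrased from the "Fitness evidence" column.
FITNESS_SIGNALS = {
    Caste.PROPOSER: ("novelty", "validity", "coverage", "reuse"),
    Caste.SOLVER: ("accuracy", "utility", "calibration", "cost"),
    Caste.JUDGE: ("held_out_agreement", "human_agreement"),
    Caste.ROUTER: ("correct_routing", "latency", "escalation_quality"),
    Caste.CRITIC: ("prevented_regressions", "improved_tests"),
}

# Assumption: these signals are normalized so that lower raw values are better.
LOWER_IS_BETTER = {"cost", "latency"}


@dataclass(frozen=True)
class Member:
    name: str
    caste: Caste


def fitness(member: Member, evidence: dict[str, float]) -> float:
    """Unweighted mean of the member's role signals, each assumed to lie in [0, 1].

    Signals that are missing from `evidence` earn no credit, and signals that
    belong to other castes are ignored.
    """
    signals = FITNESS_SIGNALS[member.caste]
    total = 0.0
    for signal in signals:
        if signal not in evidence:
            continue  # missing evidence contributes zero
        value = evidence[signal]
        total += (1.0 - value) if signal in LOWER_IS_BETTER else value
    return total / len(signals)


solver = Member("solver-01", Caste.SOLVER)
score = fitness(
    solver,
    {"accuracy": 0.9, "utility": 0.8, "calibration": 0.7, "cost": 0.2},
)
```

Keeping the signal lists per caste means a solver cannot raise its fitness through, say, task novelty, which is proposer evidence. A real system would likely weight signals differently for each role.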
Cooperative loop
PROCEDURE coevolve_castes(task_stream, populations, frozen_evaluator, policy)
    FOR each batch IN task_stream
        // Generate candidate cases, then keep only those the frozen evaluator accepts.
        proposed_cases <- populations.proposers.GENERATE(batch.context)
        valid_cases <- frozen_evaluator.FILTER_VALID_CASES(proposed_cases)
        routed_cases <- populations.routers.ASSIGN(valid_cases)
        solutions <- populations.solvers.SOLVE(routed_cases)
        critiques <- populations.critics.REVIEW(solutions)
        // Grading stays with the frozen evaluator, not with the members being selected.
        grades <- frozen_evaluator.GRADE(valid_cases, solutions, critiques)
        UPDATE_FITNESS(populations.proposers, grades.case_quality)
        UPDATE_FITNESS(populations.routers, grades.routing_quality)
        UPDATE_FITNESS(populations.solvers, grades.solution_quality)
        UPDATE_FITNESS(populations.critics, grades.prevented_failures)
        // Breed and prune each caste within its own role.
        FOR each caste IN populations
            caste <- BREED_WITHIN_ROLE(caste, policy.role_specific_operator_budget)
            caste <- RETIRE_LOW_UTILITY_MEMBERS(caste)
        END FOR
    END FOR
END PROCEDURE
Design rule
Do not let the judge caste be the sole evaluator for its own descendants. Judges can be bred, but judge candidates are evaluated against frozen test cases, hidden cases, human-labeled samples, and previous judge champions. The selection surface must not be rewritten by the same candidate being selected.
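One way to picture the design rule is as a promotion gate: a judge candidate is scored by its agreement with labels that were fixed before the candidate existed, and it is compared against the previous champion on the same pools. The sketch below is illustrative only. The `LabeledCase` type, the three pool names, and the "no worse on any pool, strictly better on at least one" rule are assumptions, not something the guide specifies.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# A judge maps (case_input, candidate_output) to a pass/fail verdict.
Judge = Callable[[str, str], bool]


@dataclass(frozen=True)
class LabeledCase:
    case_input: str
    candidate_output: str
    truth: bool  # fixed label from frozen tests, hidden cases, or human review


def agreement(judge: Judge, cases: Sequence[LabeledCase]) -> float:
    """Fraction of cases where the judge's verdict matches the fixed label."""
    if not cases:
        return 0.0
    hits = sum(judge(c.case_input, c.candidate_output) == c.truth for c in cases)
    return hits / len(cases)


def promote_judge(
    candidate: Judge,
    champion: Judge,
    pools: dict[str, Sequence[LabeledCase]],
) -> bool:
    """Hypothetical gate: promote only if the candidate is no worse than the
    champion on every evidence pool and strictly better on at least one.

    The candidate never produces or edits the labels it is scored against.
    """
    strictly_better = False
    for cases in pools.values():
        cand = agreement(candidate, cases)
        champ = agreement(champion, cases)
        if cand < champ:
            return False
        if cand > champ:
            strictly_better = True
    return strictly_better


def lenient_champion(case_input: str, output: str) -> bool:
    return True  # passes everything


def strict_candidate(case_input: str, output: str) -> bool:
    return output == "ok"


pools = {
    "frozen": [LabeledCase("a", "ok", True), LabeledCase("b", "bad", False)],
    "hidden": [LabeledCase("c", "ok", True), LabeledCase("d", "bad", False)],
    "human": [LabeledCase("e", "bad", False)],
}
promoted = promote_judge(strict_candidate, lenient_champion, pools)
```

Because the labels live outside the judge population, a candidate that learns to approve its own lineage gains nothing here; it can only win by agreeing with evidence it did not write.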
Positive use cases
- Code repair: proposers produce failing tests, solvers patch, judges check exact outputs.
- Research synthesis: proposers generate comparison questions, solvers write summaries, critics check source coverage.
- Document triage: routers assign specialist extractors, judges validate fields.
- Edge assistants: routers decide local, cloud, or no-op based on sensitivity and latency.
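The edge-assistant case above can be sketched as a small routing function that prefers the smallest capable path. This is a hedged illustration: the `Request` fields, the latency constants, and the rule that sensitive requests never leave the device are all assumptions introduced here, not details from the source.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    NO_OP = "no-op"
    LOCAL = "local"
    CLOUD = "cloud"


@dataclass(frozen=True)
class Request:
    sensitive: bool          # assumed flag: data must stay on the device
    latency_budget_ms: int   # how long the caller is willing to wait
    needs_large_model: bool  # assumed flag: local model is not capable enough
    actionable: bool = True  # False when no response is needed at all


# Hypothetical typical latencies for each path.
LOCAL_LATENCY_MS = 50
CLOUD_LATENCY_MS = 400


def route(req: Request) -> Route:
    """Pick the smallest path that is capable, fast enough, and allowed."""
    if not req.actionable:
        return Route.NO_OP
    # Try the local path first: it is the smallest and keeps data on the device.
    if not req.needs_large_model and req.latency_budget_ms >= LOCAL_LATENCY_MS:
        return Route.LOCAL
    # Escalate to the cloud only for non-sensitive requests that can wait.
    if not req.sensitive and req.latency_budget_ms >= CLOUD_LATENCY_MS:
        return Route.CLOUD
    # Nothing qualifies, so decline rather than violate a constraint.
    return Route.NO_OP


decisions = [
    route(Request(sensitive=False, latency_budget_ms=100, needs_large_model=False)),
    route(Request(sensitive=False, latency_budget_ms=500, needs_large_model=True)),
    route(Request(sensitive=True, latency_budget_ms=500, needs_large_model=True)),
]
```

In a caste ecosystem, a rule like this would be one router candidate among many. Its fitness would come from the routing evidence in the role map (correct routing, latency, escalation quality) rather than from the rule's own opinion of itself.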
Source reports used for this guide
These reports are preserved verbatim in the site archive. The guide above is an editorial synthesis and may narrow, qualify, or reorganize claims from the source material.