1 — Question
2 — Specification
↑ Complete the question above to unlock this section.
3 — Response quality
↑ Complete the specification above to unlock this section.
Response quality profile
Mostly strong: Good response with a couple of minor weaknesses seeded in.
Mostly weak: Poor response with a couple of redeeming qualities.
Random: Quality profile chosen randomly on each run.
Deliberation method
Modified Consensus (Baseline): Open facilitated discussion to consensus. Closest to current practice. Evaluator identities visible throughout.
Delphi (Iterative): Anonymous iterative rounds. Evaluators see aggregated score summary only, with no direct debate.
Nominal Group, NGT (Structured): Individual score → simultaneous share → meta-discussion on evaluation consistency → independent rescore.
Structured Argumentative (Adversarial): Rotating devil's advocate. Each evaluator shares; one peer challenges their reasoning. Final independent rescore.
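As a rough illustration of the Delphi option described above, the sketch below runs anonymous rounds in which each evaluator sees only an aggregated summary before rescoring. The names `delphi_summary`, `run_delphi` and `move_halfway`, the choice of median/min/max as the summary, and the rescoring rule are all assumptions made for the example; the tool's actual aggregation and prompts are not described here.

```python
from statistics import median


def delphi_summary(scores):
    """Aggregate one round of scores; evaluators see only this summary."""
    ordered = sorted(scores)
    return {"median": median(ordered), "min": ordered[0], "max": ordered[-1]}


def run_delphi(initial_scores, rescore, rounds):
    """Run anonymous rounds: each evaluator rescores from their own
    previous score plus the aggregated summary, never from peers' scores."""
    scores = list(initial_scores)
    for _ in range(rounds):
        summary = delphi_summary(scores)
        scores = [rescore(own, summary) for own in scores]
    return scores


def move_halfway(own, summary):
    """Hypothetical rescoring rule: move halfway toward the group median."""
    return (own + summary["median"]) / 2


print(run_delphi([2, 6, 10], move_halfway, rounds=2))  # [5.0, 6.0, 7.0]
```

In a real run the `rescore` callable would presumably be a model call that receives the summary, which is why it is passed in as a parameter rather than fixed.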
Live status
No experiment running. Configure settings above and press Begin.
⚠ Running an experiment makes multiple API calls across up to 4 models. Total cost scales with evaluator count × round count × number of runs.
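The scaling rule in the warning can be turned into a quick estimate before starting a run. This is a minimal sketch that assumes one API call per evaluator per round per run; the function name `estimate_api_calls` is hypothetical, and any extra calls (for example, generating the tender response itself) are not counted.

```python
def estimate_api_calls(evaluators: int, rounds: int, runs: int) -> int:
    """Estimate evaluator API calls, assuming one call per
    evaluator per round per run (an assumption, not a documented rule)."""
    if min(evaluators, rounds, runs) < 0:
        raise ValueError("counts must be non-negative")
    return evaluators * rounds * runs


# Example: 4 evaluators, 3 rounds, 5 runs.
print(estimate_api_calls(4, 3, 5))  # 60
```

Multiplying the result by an average per-call price gives an approximate budget; actual cost also depends on token counts, which this sketch ignores.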
No results yet
Run an experiment on the Prepare Run page to see results here.
Generated tender response