Scoring matrix
Score · Definition
0
1
2
3
4
5
Evaluator panel
2
3
4
Specialised profiles
Neutral / rational
CL
The Analyst
claude-opus · Anthropic
GP
The Pragmatist
gpt-4o · OpenAI
GM
The Strategist
gemini-2.5-flash · Google
MI
The Challenger
mistral-large · Mistral AI
1 — Question
2 — Specification
3 — Response quality
Tender question
Generating content calls the API and may incur a charge. You can also write your own question.
0 / 200 words
↑ Complete the question above to unlock this section.
Specification extract
Generation uses your question above to produce a relevant and coherent specification extract.
0 / 200 words
↑ Complete the specification above to unlock this section.
Response quality profile
Mostly strong
Good response with a couple of minor weaknesses seeded in
Mostly weak
Poor response with a couple of redeeming qualities
Random
Quality profile chosen randomly on each run
Deliberation method
Baseline
Modified Consensus
Open facilitated discussion to consensus. Closest to current practice. Evaluator identities visible throughout.
Iterative
Delphi
Anonymous iterative rounds. Evaluators see aggregated score summary only — no direct debate.
Structured
Nominal Group (NGT)
Individual score → simultaneous share → meta-discussion on evaluation consistency → independent rescore.
Adversarial
Structured Argumentative
Rotating devil's advocate. Each evaluator shares; one peer challenges their reasoning. Final independent rescore.
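The four deliberation methods above differ mainly in what each evaluator sees between rounds. As a rough illustration of the Delphi option (hedged sketch: the function and evaluator names here are hypothetical and not taken from the app), evaluators could rescore against an aggregated summary only, with no direct debate:

```python
import statistics

def delphi_loop(evaluators, score_fn, max_rounds=5, tolerance=0.5):
    """Hypothetical Delphi sketch: between rounds, evaluators see only
    an aggregated score summary, never each other's identities or text."""
    summary = None
    scores = {}
    for _ in range(max_rounds):
        # Each evaluator scores independently, given only the aggregate.
        scores = {e: score_fn(e, summary) for e in evaluators}
        summary = {
            "median": statistics.median(scores.values()),
            "spread": max(scores.values()) - min(scores.values()),
        }
        # Stop early once scores have converged within tolerance.
        if summary["spread"] <= tolerance:
            break
    return scores, summary
```

A real run would replace `score_fn` with a model call per evaluator; the convergence check is why a Blind round limit can still terminate early.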
Run parameters
Max rounds
5
Number of runs
1
Round limit visibility
Blind — evaluators unaware of limit
Live status
No experiment running. Configure settings above and press Begin.
Running an experiment makes multiple API calls across up to 4 models. Total cost scales with evaluator count × round count × number of runs.
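The cost note above can be made concrete. Assuming one API call per evaluator per round per run (a simplification; any facilitation or synthesis steps would add calls on top), the worst-case call volume is:

```python
def max_api_calls(evaluators: int, max_rounds: int, runs: int) -> int:
    # Upper bound: one call per evaluator per round per run.
    # Methods that converge early (e.g. Delphi) will use fewer.
    return evaluators * max_rounds * runs

# Worst case for the defaults shown: 4 evaluators, 5 rounds, 1 run.
print(max_api_calls(4, 5, 1))  # → 20
```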
Experiment summary
No runs completed yet.
Current run log
No run in progress.