Evaluation Rubric
Define the reusable dimensions to assess on every case. Each dimension has a descriptive name and a pass rationale — a description of what a passing result looks like. These appear preloaded on the feedback panel for one-click Pass/Fail review.
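As a rough sketch of the data involved, a dimension can be modeled as a name plus a pass rationale, and a review as a verdict plus a note. The names below (`Dimension`, `Review`, `preload_reviews`) and the idea that a preloaded review starts undecided with the pass rationale as its note are illustrative assumptions, not the tool's actual schema.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Dimension:
    """A reusable rubric dimension (hypothetical schema)."""
    name: str            # short descriptive name, e.g. "Jurisdiction correct"
    pass_rationale: str  # what a passing result looks like


@dataclass
class Review:
    """One reviewer verdict on one dimension for one case (hypothetical)."""
    dimension: Dimension
    passed: Optional[bool] = None  # None means not yet reviewed
    note: str = ""


def preload_reviews(rubric: List[Dimension]) -> List[Review]:
    """Create one undecided review per dimension, with the pass rationale
    pre-filled as the note so that a Pass needs only one click."""
    return [Review(dimension=d, note=d.pass_rationale) for d in rubric]
```

With this shape, a reviewer would only need to rewrite the note when marking a Fail.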
How to write good dimensions & feedback
Your Pass/Fail verdict and the rationale you write are training signal, used in two ways:
- An automatic judge reads a dimension's pass rationale as its instruction and grades new cases by it — so write it as a general rule that applies to any case, not a note about one.
- A prompt optimizer reads your failure notes to fix the model — so on a Fail, say what was wrong and what the right answer was. "Wrong" teaches nothing; a diagnosis does.
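The two uses above might look roughly like this in code. This is a minimal sketch under stated assumptions: the function names, the prompt wording, and the dict keys (`dimension`, `passed`, `note`) are all hypothetical, and the real judge and optimizer may consume the data differently.

```python
def judge_instruction(name, pass_rationale):
    """Turn a dimension into a grading instruction for an automatic judge.

    The pass rationale is used verbatim as the rule, which is why it
    has to hold for any case and not just the one being reviewed.
    """
    return (
        f"Dimension: {name}\n"
        f"Pass if: {pass_rationale}\n"
        "Answer Pass or Fail."
    )


def failure_notes(reviews):
    """Collect the notes from failed reviews for a prompt optimizer.

    Each review is a dict with the hypothetical keys
    'dimension', 'passed' and 'note'.
    """
    return [
        f"{r['dimension']}: {r['note']}"
        for r in reviews
        if r["passed"] is False
    ]
```

Only Fail notes reach the optimizer in this sketch, so a note like "Wrong" would give it nothing to act on.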
Three rules:
- General, not case-specific. "Jurisdiction correct" ✓ "Spain was right for case INF_42" ✗
- One concern per dimension, answerable with a clean Pass/Fail. If you can't answer it cleanly, split the dimension.
- Pass note = the standard. Fail note = the diagnosis (what was wrong + what correct looks like).
Good Fail note
"Flagged defamation, but the text is opinion, not a false statement of fact — should distinguish protected opinion."
Weak Fail note
"Defamation is wrong."
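Parts of these rules can be checked mechanically. The sketch below is a heuristic only, and every detail in it is an assumption: the case-identifier pattern is guessed from the "INF_42" example, and the eight-word minimum for a Fail note is an arbitrary threshold. It cannot judge whether a note is a sound diagnosis, only whether it looks like a bare verdict.

```python
import re

# Hypothetical pattern for case identifiers such as "INF_42".
CASE_ID = re.compile(r"\b[A-Z]{2,}_\d+\b")


def lint_dimension_text(text):
    """Return warnings for a dimension name or pass rationale."""
    warnings = []
    if CASE_ID.search(text):
        warnings.append("mentions a specific case; write a general rule")
    return warnings


def lint_fail_note(note, min_words=8):
    """Return warnings for a Fail note too short to carry a diagnosis."""
    warnings = []
    if len(note.split()) < min_words:
        warnings.append("too short to say what was wrong and what is correct")
    return warnings
```

Run against the examples above, "Spain was right for case INF_42" and "Defamation is wrong." would each be flagged, while "Jurisdiction correct" and the good Fail note would pass.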