Prompting and Evaluation · Advanced · 7 min read
Code Correctness Evaluation
Code correctness evaluation checks whether generated code satisfies the stated requirements and produces the expected behavior when run.
Why it matters for annotators
Coding evaluation tasks are often high-value in AI training projects: a mislabeled sample can reward broken code, so careful, rubric-based judgments matter more here than raw throughput.
Visual mental model
Prompted task -> generated code -> correctness checks.
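The pipeline above can be sketched as a minimal harness that runs a candidate solution against test cases. The task text, candidate code, function name `is_even`, and test cases are hypothetical illustrations, not from any specific project:

```python
# Sketch of "prompted task -> generated code -> correctness checks".
# All names and cases below are illustrative.

task = "Write a function `is_even(n)` that returns True when n is even."

# Candidate code as it might arrive from a model.
candidate_code = """
def is_even(n):
    return n % 2 == 0
"""

test_cases = [(0, True), (1, False), (2, True), (-3, False)]

def check_correctness(code: str, cases) -> dict:
    """Execute the candidate in an isolated namespace and run each case."""
    namespace = {}
    try:
        exec(code, namespace)  # NOTE: real evaluation pipelines sandbox this step
    except Exception as exc:
        return {"verdict": "fail", "reason": f"does not run: {exc}"}
    fn = namespace.get("is_even")
    if not callable(fn):
        return {"verdict": "fail", "reason": "required function missing"}
    failures = [(arg, want) for arg, want in cases if fn(arg) != want]
    return {"verdict": "pass" if not failures else "fail", "failures": failures}

result = check_correctness(candidate_code, test_cases)
print(result["verdict"])
```

Note that "runs without error" and "passes the visible tests" are separate checks: code can execute cleanly yet still fail the requirements, which is exactly the distinction annotators are asked to judge.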
Examples (bad vs good)
Scenario: reviewing a model-generated code sample against the project rubric.
Bad: Labeling quickly without applying the project rubric.
Good: Applying each rubric criterion explicitly, documenting the rationale, and escalating when uncertain.
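One way to picture the "good" workflow is as a structured decision record. The dimension names and fields below are hypothetical; real projects define their own rubric:

```python
# Hypothetical annotation record; rubric dimensions are illustrative only.
decision = {
    "sample_id": "sample-001",
    "rubric": {
        "runs_without_error": True,
        "meets_requirements": True,
        "handles_edge_cases": False,  # e.g. crashes on empty input
    },
    "rationale": "Passes the happy path but crashes on empty input.",
    "escalate": False,
}

# Derive an overall label from the individual rubric dimensions,
# rather than from a single gut-feel judgment.
overall = "correct" if all(decision["rubric"].values()) else "incorrect"
print(overall)  # -> incorrect
```

Recording each dimension separately makes the label reproducible: another annotator can see exactly which criterion failed instead of guessing from a bare "incorrect".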
Common mistakes
- Skipping guideline details that cover edge cases.
- Applying inconsistent criteria across similar samples.
- Failing to escalate even when genuinely uncertain.
Submission checklist
- Read the latest guideline update before each batch.
- Apply each rubric dimension explicitly in every decision.
- Escalate ambiguous items with concise rationale.