Relevance: 8/10 | Safety and Policy | Advanced | 8 min read

Prompt Injection Detection

Prompt injection detection identifies attempts to override system behavior or safety constraints.

Why it matters for annotators

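Prompt-injection defense is now a key area of safety evaluation, and that evaluation depends on annotators labeling override attempts accurately and consistently.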

Annotation flow: prompt and context -> detect override intent -> assign a safe handling class.
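The flow above can be sketched in a few lines of Python. This is only an illustrative heuristic, not a production detector: the handling classes (`ALLOW`, `REVIEW`, `BLOCK`), the regular expressions, and the one-hit versus multi-hit thresholds are all assumptions made for the example. A real project would take its classes and criteria from its own rubric.

```python
import re
from enum import Enum


class HandlingClass(Enum):
    # Hypothetical handling classes; a real project rubric defines its own.
    ALLOW = "allow"
    REVIEW = "review"
    BLOCK = "block"


# Illustrative override phrasings only; not an exhaustive or authoritative list.
OVERRIDE_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior|above) instructions", re.I),
    re.compile(r"disregard (the |your )?(system prompt|rules|guidelines)", re.I),
    re.compile(r"reveal (the |your )?system prompt", re.I),
]


def detect_override_intent(prompt: str, context: str = "") -> list[str]:
    """Return the override patterns matched in the prompt or its context."""
    text = f"{prompt}\n{context}"
    return [p.pattern for p in OVERRIDE_PATTERNS if p.search(text)]


def handling_class(prompt: str, context: str = "") -> HandlingClass:
    """Map detected override intent to a handling class."""
    hits = detect_override_intent(prompt, context)
    if not hits:
        return HandlingClass.ALLOW
    # Assumed thresholds: one hit is ambiguous, several are a clear attempt.
    return HandlingClass.REVIEW if len(hits) == 1 else HandlingClass.BLOCK
```

Checking the context as well as the prompt matters because override text can arrive through retrieved or pasted material rather than the user's own message. Keyword matching like this misses paraphrased attacks, which is one reason human annotation is still needed.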

Scenario: Real annotation scenario involving Prompt Injection Detection

Bad: Labeling quickly without applying the project rubric.

Good: Applying the rubric criteria, documenting the rationale, and escalating when uncertain.
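As a rough sketch of what the good practice could look like in a labeling tool, the record below ties a label to the rubric criteria cited, a written rationale, and an escalation check. Every field name, the `validate` helper, and the 0.7 confidence threshold are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    # All field names are hypothetical, chosen only for illustration.
    item_id: str
    label: str
    rubric_criteria: list[str]
    rationale: str
    confidence: float  # annotator's self-rated confidence, 0.0 to 1.0

    def needs_escalation(self, threshold: float = 0.7) -> bool:
        """Flag the item for review when confidence is below the threshold."""
        return self.confidence < threshold


def validate(annotation: Annotation) -> list[str]:
    """Return problems that would make the annotation incomplete."""
    problems = []
    if not annotation.rubric_criteria:
        problems.append("no rubric criteria cited")
    if not annotation.rationale.strip():
        problems.append("missing rationale")
    return problems
```

A label submitted with no criteria and no rationale, the "bad" case, would come back from `validate` with both problems listed, while a low-confidence label would be routed to a reviewer instead of being silently accepted.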