AlignList Academy

Learn the language used by AI labs and labeling teams. Terms are ranked by relevance so you can focus on what improves real annotation performance first.


Showing 115 terms, sorted by relevance score (10 → 1).

Relevance 10/10

Adjudication

Adjudication resolves conflicting labels into a final canonical decision.

Quality and QA · Intermediate · 7 min
Relevance 10/10

Annotation Guidelines

Annotation guidelines define exactly how to classify data, handle ambiguity, and escalate edge cases.

Operations and Workflow · Beginner · 7 min
Relevance 10/10

Gold Set

A gold set is a verified benchmark set used to audit annotator quality.

Quality and QA · Beginner · 6 min
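As a quick sketch (hypothetical items and labels throughout), a gold-set audit reduces to the match rate between an annotator's submitted labels and the verified answers:

```python
# Hypothetical gold set (verified answers) and one annotator's submissions.
gold = {"item1": "spam", "item2": "ham", "item3": "spam", "item4": "ham"}
submitted = {"item1": "spam", "item2": "spam", "item3": "spam", "item4": "ham"}

# Gold-set accuracy: fraction of gold items the annotator labeled correctly.
matches = sum(1 for item, label in gold.items() if submitted.get(item) == label)
gold_accuracy = matches / len(gold)  # 3 of 4 correct -> 0.75
```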
Relevance 10/10

Inter-Annotator Agreement (IAA)

Inter-Annotator Agreement measures how consistently multiple annotators label the same sample using the same guideline.

Quality and QA · Intermediate · 8 min
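One common IAA statistic is Cohen's kappa, which corrects raw agreement for agreement expected by chance. A minimal sketch with made-up labels from two annotators:

```python
from collections import Counter

# Hypothetical labels from two annotators on the same five items.
ann_a = ["pos", "pos", "neg", "neg", "pos"]
ann_b = ["pos", "neg", "neg", "neg", "pos"]

n = len(ann_a)
# Observed agreement: fraction of items labeled identically.
observed = sum(a == b for a, b in zip(ann_a, ann_b)) / n

# Expected chance agreement, from each annotator's label distribution.
freq_a, freq_b = Counter(ann_a), Counter(ann_b)
labels = set(ann_a) | set(ann_b)
expected = sum((freq_a[l] / n) * (freq_b[l] / n) for l in labels)

# Cohen's kappa: agreement above chance, scaled to [−1, 1].
kappa = (observed - expected) / (1 - expected)
```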
Relevance 10/10

Quality Assurance (QA) in Annotation Ops

QA in annotation operations combines audits, review policies, and feedback loops to maintain label quality.

Quality and QA · Beginner · 7 min
Relevance 10/10

Reinforcement Learning from Human Feedback (RLHF)

RLHF uses human rankings and critiques to teach models preferred behavior.

Training Paradigms · Intermediate · 9 min
Relevance 10/10

Supervised Fine-Tuning (SFT)

SFT trains models on high-quality human-curated instruction and response pairs.

Training Paradigms · Intermediate · 8 min
Relevance 9/10

Ambiguity Resolution

Ambiguity resolution handles uncertain cases through structured escalation instead of guessing.

Quality and QA · Intermediate · 6 min
Relevance 9/10

Calibration

Calibration aligns annotators on the same guideline interpretation before and during production.

Quality and QA · Intermediate · 6 min
Relevance 9/10

Edge Case

An edge case is a rare but valid sample that stresses normal labeling rules.

Quality and QA · Beginner · 5 min
Relevance 9/10

Fact-Checking for LLM Evaluation

Fact-checking verifies whether model claims are supported by trusted context or references.

Prompting and Evaluation · Intermediate · 6 min
Relevance 9/10

Hallucination

A hallucination is a plausible-looking model claim that is unsupported or false.

Prompting and Evaluation · Beginner · 6 min
Relevance 9/10

Instruction Following Evaluation

Instruction-following evaluation checks whether outputs satisfy explicit constraints from prompts.

Prompting and Evaluation · Intermediate · 6 min
Relevance 9/10

Preference Ranking

Preference ranking compares model outputs and selects the better answer using a rubric.

Prompting and Evaluation · Intermediate · 7 min
Relevance 9/10

Rubric-Based Evaluation

Rubric-based evaluation scores outputs across clear dimensions such as correctness, safety, and completeness.

Prompting and Evaluation · Intermediate · 6 min
Relevance 9/10

Safety Policy Enforcement

Safety policy enforcement labels and evaluates content against harm and misuse policy rules.

Safety and Policy · Intermediate · 7 min
Relevance 9/10

Taxonomy and Label Schema

A taxonomy defines classes and rules for assigning labels consistently.

Operations and Workflow · Intermediate · 6 min
Relevance 8/10

Acceptance Rate

Acceptance rate is the percentage of submitted work approved by review.

Quality and QA · Beginner · 5 min
Relevance 8/10

Active Learning

Active learning selects uncertain samples for annotation to improve model learning efficiency.

Operations and Workflow · Intermediate · 5 min
Relevance 8/10

Active Quality Monitoring

Active quality monitoring tracks quality metrics continuously during production.

Quality and QA · Intermediate · 6 min
Relevance 8/10

Bounding Box Annotation

Bounding box annotation draws rectangular boxes around target objects in images.

Computer Vision · Beginner · 6 min
Relevance 8/10

Class Imbalance

Class imbalance means some labels appear far less often than others.

Data and Metrics · Intermediate · 5 min
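A quick way to quantify imbalance is to compare label frequencies directly. A sketch using a hypothetical 95/5 split:

```python
from collections import Counter

# Hypothetical skewed dataset: 95 "ok" labels, 5 "toxic" labels.
labels = ["ok"] * 95 + ["toxic"] * 5

counts = Counter(labels)
# Per-class share of the dataset.
share = {label: count / len(labels) for label, count in counts.items()}
# Imbalance ratio: most frequent class vs. least frequent class.
imbalance_ratio = max(counts.values()) / min(counts.values())  # 19.0
```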
Relevance 8/10

Code Correctness Evaluation

Code correctness evaluation checks whether generated code satisfies requirements and expected behavior.

Prompting and Evaluation · Advanced · 7 min
Relevance 8/10

Confidence Scoring

Confidence scoring indicates how certain an annotator is about a decision.

Data and Metrics · Intermediate · 5 min
Relevance 8/10

Content Moderation Labeling

Content moderation labeling classifies content by policy categories and severity.

Safety and Policy · Beginner · 6 min
Relevance 8/10

Data Validation

Data validation checks labels and metadata against schema and quality constraints before export.

Operations and Workflow · Beginner · 5 min
Relevance 8/10

Dataset Versioning

Dataset versioning tracks schema, labels, and policy changes across releases.

Operations and Workflow · Intermediate · 5 min
Relevance 8/10

Error Analysis

Error analysis clusters failure patterns and identifies root causes.

Data and Metrics · Intermediate · 6 min
Relevance 8/10

Groundedness

Groundedness measures whether outputs are supported by provided context.

Prompting and Evaluation · Intermediate · 6 min
Relevance 8/10

Hate Speech Taxonomy

A hate speech taxonomy defines classes and scope for protected-target abuse labeling.

Safety and Policy · Intermediate · 6 min
Relevance 8/10

Human-in-the-Loop (HITL)

Human-in-the-loop workflows combine model automation with human review and correction.

Operations and Workflow · Beginner · 5 min
Relevance 8/10

Instruction Hierarchy Awareness

Instruction hierarchy awareness applies system and policy instructions before user preferences.

Prompting and Evaluation · Advanced · 6 min
Relevance 8/10

Intent Classification

Intent classification labels the underlying user goal in text or voice requests.

Text and NLP · Beginner · 6 min
Relevance 8/10

Jailbreak Detection

Jailbreak detection identifies prompts intended to bypass model safety constraints.

Safety and Policy · Advanced · 7 min
Relevance 8/10

Math Reasoning Evaluation

Math reasoning evaluation checks intermediate logic and final numeric correctness.

Prompting and Evaluation · Advanced · 7 min
Relevance 8/10

Misinformation Labeling

Misinformation labeling flags unsupported, deceptive, or manipulated claims.

Safety and Policy · Intermediate · 6 min
Relevance 8/10

Model Response Ranking Consistency

Ranking consistency measures whether similar response pairs receive similar judgments over time.

Prompting and Evaluation · Intermediate · 6 min
Relevance 8/10

Multi-Turn Dialogue Annotation

Multi-turn dialogue annotation labels conversational quality across turns, including coherence and policy compliance.

Text and NLP · Intermediate · 7 min
Relevance 8/10

Multilingual Annotation

Multilingual annotation applies label standards consistently across multiple languages.

Text and NLP · Intermediate · 6 min
Relevance 8/10

Named Entity Recognition (NER)

NER labels spans of text as people, organizations, locations, and other entity types.

Text and NLP · Intermediate · 7 min
Relevance 8/10

Pairwise Ranking

Pairwise ranking compares two candidate outputs and chooses the better one.

Prompting and Evaluation · Intermediate · 6 min
Relevance 8/10

PII Redaction

PII redaction finds and masks sensitive personal information.

Safety and Policy · Intermediate · 6 min
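A minimal redaction sketch using two illustrative regex patterns. These patterns are simplifications for demonstration only; real PII detection needs far broader coverage, locale handling, and human review:

```python
import re

# Illustrative patterns only — not production-grade PII coverage.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}

def redact(text: str) -> str:
    """Replace each matched span with a bracketed placeholder tag."""
    for tag, pattern in PATTERNS.items():
        text = pattern.sub(f"[{tag}]", text)
    return text

redacted = redact("Reach me at jane.doe@example.com or 555-867-5309.")
# -> "Reach me at [EMAIL] or [PHONE]."
```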
Relevance 8/10

Policy Violation Severity

Severity scoring measures how serious a policy violation is.

Safety and Policy · Intermediate · 6 min
Relevance 8/10

Policy-Compliant Refusal Writing

Policy-compliant refusal writing produces safe refusals that are clear, non-judgmental, and policy-aligned.

Prompting and Evaluation · Intermediate · 6 min
Relevance 8/10

Precision and Recall for Labelers

Precision measures correctness of predicted labels; recall measures coverage of true labels.

Data and Metrics · Intermediate · 6 min
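Both metrics fall out of simple true-positive, false-positive, and false-negative counts. A sketch with hypothetical binary labels for a "toxic" class:

```python
# Hypothetical true labels vs. an annotator's (or model's) predictions.
true = ["toxic", "ok", "toxic", "toxic", "ok", "ok"]
pred = ["toxic", "toxic", "toxic", "ok", "ok", "ok"]

tp = sum(t == p == "toxic" for t, p in zip(true, pred))        # correct "toxic"
fp = sum(t == "ok" and p == "toxic" for t, p in zip(true, pred))  # wrongly flagged
fn = sum(t == "toxic" and p == "ok" for t, p in zip(true, pred))  # missed

precision = tp / (tp + fp)  # of items labeled toxic, how many really are
recall = tp / (tp + fn)     # of truly toxic items, how many were caught
```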
Relevance 8/10

Privacy-Preserving Annotation

Privacy-preserving annotation minimizes exposure to sensitive data during labeling.

Safety and Policy · Intermediate · 6 min
Relevance 8/10

Prompt Engineering

Prompt engineering designs instructions to elicit reliable model behavior.

Prompting and Evaluation · Intermediate · 6 min
Relevance 8/10

Prompt Injection Detection

Prompt injection detection identifies attempts to override system behavior or safety constraints.

Safety and Policy · Advanced · 8 min
Relevance 8/10

Refusal Quality

Refusal quality evaluates whether unsafe requests are declined clearly and safely.

Prompting and Evaluation · Intermediate · 6 min
Relevance 8/10

Response Safety Grading

Response safety grading scores model outputs across defined safety risk dimensions.

Safety and Policy · Intermediate · 6 min
Relevance 8/10

Retrieval Ground Truth Curation

Retrieval ground truth curation builds high-quality relevance judgments for search and RAG evaluation.

Data and Metrics · Advanced · 7 min
Relevance 8/10

Reviewer Consistency

Reviewer consistency measures whether QA reviewers apply standards uniformly.

Quality and QA · Intermediate · 6 min
Relevance 8/10

Reward Model

A reward model predicts human preference signals from ranked examples.

Training Paradigms · Advanced · 7 min
Relevance 8/10

Root Cause Analysis

Root cause analysis identifies the underlying source of repeated quality failures.

Data and Metrics · Intermediate · 6 min
Relevance 8/10

Self-Harm Labeling

Self-harm labeling identifies risk-related content and intent levels.

Safety and Policy · Intermediate · 6 min
Relevance 8/10

Semantic Search Relevance Labeling

Semantic search relevance labeling scores whether retrieved items satisfy intent and context.

Prompting and Evaluation · Intermediate · 6 min
Relevance 8/10

Summarization Evaluation

Summarization evaluation scores summary faithfulness, coverage, and clarity.

Prompting and Evaluation · Intermediate · 6 min
Relevance 8/10

Toxicity Annotation

Toxicity annotation labels harmful or abusive language patterns.

Safety and Policy · Beginner · 6 min
Relevance 8/10

Transcription Quality

Transcription quality measures accuracy and formatting consistency in speech-to-text labels.

Audio and Speech · Beginner · 6 min
Relevance 7/10

3D Point Cloud Annotation

3D point cloud annotation labels LiDAR points and objects in spatial scenes.

Computer Vision · Advanced · 8 min
Relevance 7/10

Adversarial Example Awareness

Adversarial example awareness identifies inputs crafted to trigger model errors.

Safety and Policy · Advanced · 6 min
Relevance 7/10

Audio Event Labeling

Audio event labeling tags sounds such as alarms, music, speech, or environmental noise.

Audio and Speech · Beginner · 6 min
Relevance 7/10

Audit Trail

An audit trail records who changed labels, when, and why.

Operations and Workflow · Intermediate · 5 min
Relevance 7/10

Benchmark Contamination

Benchmark contamination means evaluation data was seen during training or tuning.

Data and Metrics · Advanced · 6 min
Relevance 7/10

Chain of Verification

Chain of verification validates outputs through structured checks instead of single-pass acceptance.

Prompting and Evaluation · Advanced · 7 min
Relevance 7/10

Citation Quality

Citation quality evaluates whether references are relevant, valid, and correctly used.

Prompting and Evaluation · Intermediate · 6 min
Relevance 7/10

Code Review Annotation

Code review annotation labels code quality issues such as bugs, style violations, and security concerns.

Prompting and Evaluation · Advanced · 7 min
Relevance 7/10

Context Window Adherence

Context window adherence checks whether responses use available context without ignoring key evidence.

Prompting and Evaluation · Intermediate · 5 min
Relevance 7/10

Conversation Coherence Scoring

Coherence scoring evaluates whether responses remain logically consistent with prior turns.

Prompting and Evaluation · Intermediate · 6 min
Relevance 7/10

Coreference Annotation

Coreference annotation connects mentions that refer to the same entity across text.

Text and NLP · Advanced · 7 min
Relevance 7/10

Data Augmentation

Data augmentation creates modified examples to improve model robustness.

Training Paradigms · Intermediate · 6 min
Relevance 7/10

Disagreement Mining

Disagreement mining identifies and analyzes patterns where annotators frequently diverge.

Quality and QA · Intermediate · 6 min
Relevance 7/10

Document Classification

Document classification assigns documents to categories based on content.

Text and NLP · Beginner · 5 min
Relevance 7/10

Entity Linking

Entity linking maps entity mentions to canonical knowledge base entries.

Text and NLP · Intermediate · 6 min
Relevance 7/10

Error Bucketing

Error bucketing groups failures into standardized categories for analysis.

Data and Metrics · Intermediate · 5 min
Relevance 7/10

Escalation Policy

Escalation policy defines when and how uncertain or high-risk items should be routed for review.

Operations and Workflow · Beginner · 5 min
Relevance 7/10

Escalation Rationale Writing

Escalation rationale writing documents why a sample was escalated and what evidence supports uncertainty.

Operations and Workflow · Beginner · 5 min
Relevance 7/10

Guideline Drift Detection

Guideline drift detection identifies when annotator behavior diverges from current written policy.

Quality and QA · Intermediate · 6 min
Relevance 7/10

Hard Negative Mining

Hard negative mining collects confusing non-target examples that models frequently misclassify.

Data and Metrics · Advanced · 6 min
Relevance 7/10

Harmlessness Score

Harmlessness scoring measures how well a model response avoids harmful or risky content.

Safety and Policy · Intermediate · 6 min
Relevance 7/10

Helpfulness Score

Helpfulness scoring measures whether an output is useful, clear, and directly relevant to the request.

Prompting and Evaluation · Intermediate · 6 min
Relevance 7/10

Honesty Score

Honesty scoring checks whether the model states uncertainty and avoids fabricated certainty.

Prompting and Evaluation · Intermediate · 6 min
Relevance 7/10

Label Leakage

Label leakage occurs when target information unintentionally appears in features or prompt context.

Data and Metrics · Advanced · 6 min
Relevance 7/10

Linguistic Quality Assurance

Linguistic QA audits grammar, style, and semantic integrity in language data.

Text and NLP · Intermediate · 6 min
Relevance 7/10

Locale Sensitivity Labeling

Locale sensitivity labeling evaluates cultural and regional appropriateness of outputs.

Text and NLP · Intermediate · 6 min
Relevance 7/10

Long-Context Evaluation

Long-context evaluation tests whether models use and retain relevant information across large context windows.

Prompting and Evaluation · Advanced · 7 min
Relevance 7/10

Model-Assisted Prelabeling

Model-assisted prelabeling generates initial labels for human correction.

Operations and Workflow · Intermediate · 6 min
Relevance 7/10

OCR Annotation

OCR annotation labels text regions and transcriptions in images and documents.

Computer Vision · Beginner · 6 min
Relevance 7/10

Ontology Alignment

Ontology alignment maps concepts across different schemas or taxonomies.

Operations and Workflow · Advanced · 7 min
Relevance 7/10

Post-Editing Workflow

Post-editing workflow improves machine-generated outputs through human edits.

Operations and Workflow · Beginner · 5 min
Relevance 7/10

Quality-Weighted Sampling

Quality-weighted sampling prioritizes samples based on expected quality impact.

Operations and Workflow · Advanced · 6 min
Relevance 7/10

Rejection Sampling

Rejection sampling keeps model outputs that pass quality criteria and discards low-quality outputs.

Training Paradigms · Advanced · 6 min
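In its simplest form this is a score-and-filter loop. A sketch assuming hypothetical candidate outputs with grader scores (the 0.8 threshold is an arbitrary choice):

```python
# Hypothetical model outputs, each with a quality score from a grader.
candidates = [
    ("answer A", 0.92),
    ("answer B", 0.40),
    ("answer C", 0.81),
    ("answer D", 0.55),
]

THRESHOLD = 0.8  # minimum acceptable quality (tuning choice)

# Keep only outputs that clear the threshold; discard the rest.
kept = [text for text, score in candidates if score >= THRESHOLD]
```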
Relevance 7/10

Relation Extraction Labeling

Relation extraction labeling marks semantic relationships between entities.

Text and NLP · Advanced · 7 min
Relevance 7/10

Reviewer Feedback Quality

Reviewer feedback quality measures clarity, actionability, and consistency of reviewer comments.

Quality and QA · Intermediate · 5 min
Relevance 7/10

Rubric Drift

Rubric drift occurs when evaluators gradually apply scoring criteria inconsistently over time.

Quality and QA · Intermediate · 6 min
Relevance 7/10

Schema Coverage Analysis

Schema coverage analysis checks whether all classes are sufficiently represented in labeled data.

Data and Metrics · Intermediate · 6 min
Relevance 7/10

Schema Migration

Schema migration transitions labeling data from one taxonomy version to another.

Operations and Workflow · Intermediate · 6 min
Relevance 7/10

Slot Filling Annotation

Slot filling labels parameter values tied to an intent, such as date, location, or product.

Text and NLP · Intermediate · 6 min
Relevance 7/10

Speaker Diarization Labeling

Speaker diarization labeling identifies who spoke when in audio streams.

Audio and Speech · Intermediate · 7 min
Relevance 7/10

Task Routing Optimization

Task routing optimization assigns work to annotators based on skill, language, and quality profiles.

Operations and Workflow · Advanced · 6 min
Relevance 7/10

Tool Use Evaluation

Tool use evaluation scores how accurately models decide when and how to invoke external tools.

Prompting and Evaluation · Advanced · 7 min
Relevance 7/10

Train-Test Contamination

Train-test contamination happens when overlapping information appears in both training and evaluation sets.

Data and Metrics · Advanced · 6 min
Relevance 7/10

Translation Quality Estimation

Translation quality estimation scores adequacy and fluency of translated outputs.

Text and NLP · Intermediate · 6 min
Relevance 7/10

Uncertainty Sampling

Uncertainty sampling selects instances where model confidence is low for human annotation.

Operations and Workflow · Intermediate · 6 min
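A minimal sketch: given per-item model confidence scores (hypothetical values here), route the least-confident items to annotators first:

```python
# Hypothetical model confidence for each unlabeled item.
scores = {"s1": 0.97, "s2": 0.51, "s3": 0.88, "s4": 0.62, "s5": 0.99}

# Select the k least-confident items for human annotation.
k = 2
queue = sorted(scores, key=scores.get)[:k]  # ["s2", "s4"]
```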
Relevance 7/10

Video Event Annotation

Video event annotation labels actions and events over time in video streams.

Computer Vision · Intermediate · 7 min
Relevance 7/10

Weak Supervision

Weak supervision uses imperfect labeling signals such as heuristics or programmatic rules.

Training Paradigms · Advanced · 7 min
Relevance 6/10

Adjudication Latency

Adjudication latency is the turnaround time to resolve disputed labels.

Operations and Workflow · Intermediate · 5 min
Relevance 6/10

Annotation Cost per Accepted Label

This metric estimates effective cost after accounting for rejected or reworked labels.

Operations and Workflow · Intermediate · 5 min
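The arithmetic is straightforward: divide total spend by the number of labels that survive review. A sketch with hypothetical batch numbers:

```python
# Hypothetical batch: total spend, labels submitted, and review pass rate.
total_cost = 1200.00
submitted = 1000
acceptance_rate = 0.80  # 80% of submitted labels pass review

accepted = submitted * acceptance_rate
cost_per_accepted = total_cost / accepted  # 1200 / 800 = 1.50 per label
```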
Relevance 6/10

Annotation Throughput

Annotation throughput measures volume completed over time at target quality.

Operations and Workflow · Beginner · 5 min
Relevance 6/10

Appeal Workflow

Appeal workflow defines how annotators can contest review outcomes and receive clarifications.

Operations and Workflow · Beginner · 5 min
Relevance 6/10

Deduplication

Deduplication removes exact duplicate samples from datasets.

Operations and Workflow · Beginner · 5 min
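Exact-duplicate removal can be as simple as hashing each sample and keeping the first occurrence. A sketch with toy strings:

```python
import hashlib

# Toy dataset with one exact duplicate.
samples = ["the cat sat", "a dog ran", "the cat sat", "a bird flew"]

seen = set()
deduped = []
for s in samples:
    # Hash the sample; identical text produces an identical digest.
    digest = hashlib.sha256(s.encode()).hexdigest()
    if digest not in seen:
        seen.add(digest)
        deduped.append(s)
```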
Relevance 6/10

Differential Privacy Awareness

Differential privacy awareness means understanding privacy-preserving techniques that limit individual data exposure.

Safety and Policy · Advanced · 6 min
Relevance 6/10

Frame-Level Classification

Frame-level classification assigns labels to individual video frames.

Computer Vision · Beginner · 5 min
Relevance 6/10

Near-Duplicate Detection

Near-duplicate detection finds highly similar samples that are not exact matches.

Operations and Workflow · Intermediate · 6 min
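One lightweight approach is Jaccard similarity over character shingles; embedding-based methods are common too. A sketch where the 0.7 threshold is an arbitrary tuning choice:

```python
def shingles(text: str, n: int = 3) -> set:
    """Character n-grams of a lowercased string."""
    t = text.lower()
    return {t[i:i + n] for i in range(len(t) - n + 1)}

def jaccard(a: str, b: str) -> float:
    """Overlap of shingle sets: |A ∩ B| / |A ∪ B|."""
    sa, sb = shingles(a), shingles(b)
    return len(sa & sb) / len(sa | sb)

sim = jaccard("The cat sat on the mat.", "The cat sat on a mat.")
near_dup = sim > 0.7  # threshold is a tuning choice per dataset
```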
Relevance 6/10

Temporal Consistency Labeling

Temporal consistency labeling checks whether labels remain consistent across time-linked events or frames.

Data and Metrics · Advanced · 6 min