Description
PLEASE NOTE: You must take the Bilingual Competency interview in Urdu to be considered for this role.
Location: Global
Type: Contract Work
Fluent Language Skills Required: Urdu (native fluency) and English (strong proficiency)
Why this role matters: You will assess AI-generated responses in Urdu and identify specific strengths and areas for improvement in each one. Your work will be used at a later stage of this project to create the "perfect AI-generated response." Note that the analysis you write will be in English.
What You'll Do
- Conduct fact-checking using trusted public sources and external tools
- Generate high-quality human evaluation data by identifying response strengths, areas for improvement, and factual inaccuracies
- Assess reasoning quality, clarity, tone, and completeness of responses
- Ensure model responses align with expected conversational behavior and system guidelines
Who You Are
- You hold a Bachelor's degree
- You are a native speaker of Urdu
- You have significant experience using large language models (LLMs) and understand how and why people use them
- You have excellent writing skills in English and can clearly articulate nuanced feedback
- You have strong attention to detail and consistently notice subtle issues others may overlook
- You have a background or experience in domains requiring structured analytical thinking (e.g., research, policy, analytics, linguistics, engineering)
Nice-to-Have Specialties
- Prior experience with RLHF, model evaluation, or data annotation work
- Experience writing or editing high-quality written content
- Experience comparing multiple outputs and making fine-grained qualitative judgments
What Success Looks Like
- You identify factual inaccuracies, reasoning errors, and communication gaps in model responses
- You produce clear, consistent, and reproducible evaluation artifacts
- Your feedback leads to measurable improvements in response quality and user experience