Description
1. Role Overview
Mercor is collaborating with a leading AI research partner to develop a next-generation scientific coding benchmark designed to rigorously evaluate frontier models on complex STEM reasoning tasks. We are seeking advanced domain experts to create multi-step, research-derived computational challenges that require executable Python solutions, dependency management, and iterative debugging. These problems will test real-world scientific workflows rather than textbook-style exercises. This is a project-based opportunity focused on producing high-difficulty, high-integrity evaluation data for cutting-edge AI systems.
2. Key Responsibilities
- Author research-level scientific coding prompts derived from recent peer-reviewed work (post–July 2025)
- Decompose each problem into 3–5 sequential subproblems reflecting a realistic scientific reasoning chain
- Develop a clean, well-documented, fully executable Python reference solution with deterministic outputs
- Calibrate difficulty by evaluating model outputs against the canonical solution and iterating as needed
- Design a comprehensive unit test suite; ensure the reference solution passes 100% of tests
- Tag each problem with structured metadata (domain, subdomain, difficulty rating, reviewer notes)
- Participate in peer review to validate correctness, ambiguity resistance, and robustness against shortcut solutions
- Document a structured multi-step solution trajectory demonstrating tool use and debugging behavior
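To give a rough sense of what a deterministic reference solution paired with unit tests might look like, here is a minimal sketch. The problem (a damped harmonic oscillator integrated with fixed-step RK4), the function names, and the tolerance are all hypothetical illustrations and are not taken from the actual benchmark.

```python
import math


def rk4_damped_oscillator(x0, v0, omega, gamma, dt, n_steps):
    """Integrate x'' + 2*gamma*x' + omega**2 * x = 0 with fixed-step RK4.

    Hypothetical reference solution: no randomness and a fixed step size,
    so repeated calls with the same arguments give identical results.
    """
    def deriv(x, v):
        # Returns (dx/dt, dv/dt) for the damped oscillator.
        return v, -2.0 * gamma * v - omega ** 2 * x

    x, v = x0, v0
    for _ in range(n_steps):
        k1x, k1v = deriv(x, v)
        k2x, k2v = deriv(x + 0.5 * dt * k1x, v + 0.5 * dt * k1v)
        k3x, k3v = deriv(x + 0.5 * dt * k2x, v + 0.5 * dt * k2v)
        k4x, k4v = deriv(x + dt * k3x, v + dt * k3v)
        x += dt * (k1x + 2.0 * k2x + 2.0 * k3x + k4x) / 6.0
        v += dt * (k1v + 2.0 * k2v + 2.0 * k3v + k4v) / 6.0
    return x, v


def exact_underdamped(x0, v0, omega, gamma, t):
    """Closed-form position for the underdamped case (gamma < omega)."""
    wd = math.sqrt(omega ** 2 - gamma ** 2)
    b = (v0 + gamma * x0) / wd
    return math.exp(-gamma * t) * (x0 * math.cos(wd * t) + b * math.sin(wd * t))


def test_output_is_deterministic():
    # Two identical calls must agree exactly, not just approximately.
    first = rk4_damped_oscillator(1.0, 0.0, 2.0, 0.1, 1e-3, 1000)
    second = rk4_damped_oscillator(1.0, 0.0, 2.0, 0.1, 1e-3, 1000)
    assert first == second


def test_matches_closed_form():
    # Illustrative tolerance; a real task would justify its own choice.
    x, _ = rk4_damped_oscillator(1.0, 0.0, 2.0, 0.1, 1e-3, 1000)
    assert abs(x - exact_underdamped(1.0, 0.0, 2.0, 0.1, 1.0)) < 1e-8
```

In this sketch the tests are plain functions that a runner such as pytest could collect. Checking against a closed-form answer is one way to make a test hard to satisfy with a hard-coded output, though real tasks would likely need several independent checks.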
3. Ideal Qualifications
- Advanced expertise in Physics, Chemistry, Mathematics, Biology, or closely related technical domains
- Strong ability to translate recent research into implementable computational challenges
- High proficiency in Python for scientific computing (e.g., simulations, numerical methods, data pipelines, symbolic computation)
- Experience designing reproducible evaluation harnesses and robust unit tests
- Familiarity with dependency management and sandbox-safe execution environments
- Exceptional attention to detail and ability to create unambiguous, deterministic specifications
- Ability to anticipate and mitigate shortcut solutions or memorization risks