
STEM Coding Expert (Physics, Math, Biology, Chemistry)

Mercor
Pay: $70/hr (hourly)
Location: Worldwide (remote)
Posted: Mar 3, 2026
Languages: English

Description

1. Role Overview

Mercor is collaborating with a leading AI research partner to develop a next-generation scientific coding benchmark designed to rigorously evaluate frontier models on complex STEM reasoning tasks. We are seeking advanced domain experts to create multi-step, research-derived computational challenges that require executable Python solutions, dependency management, and iterative debugging. These problems will test real-world scientific workflows rather than textbook-style exercises. This is a project-based opportunity focused on producing high-difficulty, high-integrity evaluation data for cutting-edge AI systems.

2. Key Responsibilities

  • Author research-level scientific coding prompts derived from recent peer-reviewed work (published after July 2025)

  • Decompose each problem into 3–5 sequential subproblems reflecting a realistic scientific reasoning chain

  • Develop a clean, well-documented, fully executable Python reference solution with deterministic outputs

  • Calibrate difficulty by evaluating model outputs against the canonical solution and iterating as needed

  • Design a comprehensive unit test suite; ensure the reference solution passes 100% of tests

  • Tag each problem with structured metadata (domain, subdomain, difficulty rating, reviewer notes)

  • Participate in peer review to validate correctness, eliminate ambiguity, and confirm robustness against shortcut solutions

  • Document a structured multi-step solution trajectory demonstrating tool use and debugging behavior
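Taken together, the responsibilities above describe a problem package made of sequential subproblems, a deterministic Python reference solution, a unit test suite, and metadata tags. The following is a minimal, hypothetical sketch of how those pieces might fit together. The decay-fitting problem, the function names (`simulate_decay`, `fit_decay_rate`, `half_life`, `reference_solution`), and the metadata values are invented for illustration. The sketch is far below the research-level difficulty the role calls for and is not Mercor's actual task format.

```python
import math
import random
import unittest

# Hypothetical metadata tags: field names follow the posting, values are made up.
METADATA = {
    "domain": "Physics",
    "subdomain": "Radioactive decay",
    "difficulty": "illustrative (not calibrated)",
    "reviewer_notes": "hypothetical example",
}


def simulate_decay(n0: float, k: float, times: list[float],
                   noise: float, seed: int) -> list[float]:
    """Subproblem 1: exponential decay with multiplicative Gaussian noise."""
    rng = random.Random(seed)  # local RNG, so output does not depend on global random state
    return [n0 * math.exp(-k * t) * (1.0 + rng.gauss(0.0, noise)) for t in times]


def fit_decay_rate(times: list[float], counts: list[float]) -> float:
    """Subproblem 2: least-squares fit of ln N = ln N0 - k t; returns k."""
    logs = [math.log(c) for c in counts]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(logs) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs))
    sxx = sum((t - t_mean) ** 2 for t in times)
    return -sxy / sxx  # slope of the log-linear fit is -k


def half_life(k: float) -> float:
    """Subproblem 3: t_1/2 = ln 2 / k."""
    return math.log(2.0) / k


def reference_solution(seed: int = 0) -> dict:
    """Chains the subproblems; a fixed seed makes the output deterministic."""
    times = [0.5 * i for i in range(20)]
    counts = simulate_decay(1000.0, 0.3, times, noise=0.01, seed=seed)
    k = fit_decay_rate(times, counts)
    return {"k": k, "half_life": half_life(k)}


class TestReferenceSolution(unittest.TestCase):
    def test_deterministic(self):
        self.assertEqual(reference_solution(seed=0), reference_solution(seed=0))

    def test_recovers_rate(self):
        self.assertAlmostEqual(reference_solution()["k"], 0.3, delta=0.01)

    def test_half_life_consistent(self):
        out = reference_solution()
        self.assertAlmostEqual(out["half_life"] * out["k"], math.log(2.0))

    def test_noise_free_fit_is_exact(self):
        times = [0.0, 1.0, 2.0, 3.0]
        counts = simulate_decay(50.0, 0.7, times, noise=0.0, seed=1)
        self.assertAlmostEqual(fit_decay_rate(times, counts), 0.7, places=12)
```

Saved as, say, `decay_problem.py`, the suite runs with `python -m unittest decay_problem`. Seeding a local `random.Random` instance rather than the module-level generator is one way to keep outputs deterministic; a real task would also pin library versions.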

3. Ideal Qualifications

  • Advanced expertise in Physics, Chemistry, Mathematics, Biology, or closely related technical domains

  • Strong ability to translate recent research into implementable computational challenges

  • High proficiency in Python for scientific computing (e.g., simulations, numerical methods, data pipelines, symbolic computation)

  • Experience designing reproducible evaluation harnesses and robust unit tests

  • Familiarity with dependency management and sandbox-safe execution environments

  • Exceptional attention to detail and ability to create unambiguous, deterministic specifications

  • Ability to anticipate and mitigate shortcut solutions or memorization risks
