Description
Role Overview
We are seeking expert physics researchers to author and verify golden reference solutions for CritPt (arXiv:2509.26574v3), a benchmark of frontier research-level physics problems. Participants will solve CritPt problems end-to-end, audit solutions from other experts, or adjudicate between parallel solution attempts. The result is 100% human-verified reference data used to evaluate large language models on frontier physics reasoning.
Physics Subdomains Covered
High Energy Physics & Mathematical Physics, Biophysics & Statistical Physics, Condensed Matter & AMO, Gravitation / Cosmology / Astrophysics, Quantum Information, Optical Properties of Materials, Magnetic Materials, Measurements in QM.
Key Responsibilities
- Solve research-level physics challenges end-to-end with verifiable derivations, code, and peer-reviewed references
- Decompose challenges into standalone checkpoint sub-problems that require genuine physical reasoning
- Author Python answer templates with auto-grading functions for symbolic or numerical answers
- Audit submitted solutions for correctness, scope, and method soundness; deliver actionable feedback across iterations
- Adjudicate between parallel solver attempts and decide which solution becomes the golden reference
- Document chain-of-thought reasoning, error tolerances, equivalent symbolic forms, and verification test cases
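To give a sense of the answer-template responsibility above, here is a minimal sketch of what an auto-grading function might look like. It is an illustration only: the function names, the signature, the sample ranges, and the tolerance are all hypothetical and are not taken from the CritPt template format. It checks a candidate expression against a reference written in an equivalent form by comparing them at random sample points within a relative tolerance.

```python
import math
import random

# Hypothetical answer template: names, signature, and tolerance are
# illustrative assumptions, not the benchmark's actual format.
def answer(T, J):
    """Candidate closed-form result: partition function of a two-level system."""
    return 2.0 * math.cosh(J / T)

def reference(T, J):
    """Reference result written in an equivalent form."""
    return math.exp(J / T) + math.exp(-J / T)

def grade(candidate, golden, n_samples=50, rel_tol=1e-9, seed=0):
    """Return True if candidate agrees with golden at every random sample point."""
    rng = random.Random(seed)  # fixed seed keeps grading reproducible
    for _ in range(n_samples):
        T = rng.uniform(0.5, 5.0)   # assumed temperature range (arbitrary units)
        J = rng.uniform(-2.0, 2.0)  # assumed coupling range (arbitrary units)
        if not math.isclose(candidate(T, J), golden(T, J), rel_tol=rel_tol):
            return False
    return True
```

Calling `grade(answer, reference)` accepts the equivalent form, while an incorrect expression such as `2.0 * math.sinh(J / T)` is rejected. Numerical sampling is only one option; a symbolic equivalence check with SymPy could replace or complement it.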
Ideal Qualifications
- Solver: PhD or postdoc in the relevant subfield (senior PhD student minimum)
- Auditor: Postdoc or junior professor in the relevant subfield (PhD minimum)
- Adjudicator: Full professor or industry research PI in the relevant subfield (senior postdoc or junior professor minimum)
- Hands-on familiarity with at least two canonical methods of the target subfield, demonstrable through publications (broader coverage strongly preferred)
- 3–5 representative publications (arXiv ID or DOI), ideally within the last ~5 years and in the target subfield
- Working proficiency with LaTeX, Python, Jupyter, and SymPy
- Strong written English (B2 level or higher; native or near-native preferred)
More About the Opportunity
- Expected commitment: ~10 hours/week, sustained across an 8–10 week window per task pool
- Pay range: $80–$140 per hour, based on role and demonstrated expertise
- Asynchronous work
Interested in this position?
Apply directly on the company's website