CUDA Engineering Expert
Mercor
1. Role Overview
Mercor is seeking GPU kernel optimization experts to contribute to a project with a leading AI lab. This opportunity is designed for freelancers with strong C++ skills, practical GPU programming experience, and the ability to improve kernel performance using profiler-guided analysis. You’ll help evaluate, optimize, and reason about GPU kernels across a range of current hardware. This is a contract-based opportunity for specialists who enjoy squeezing performance out of modern GPU architectures.
2. Key Responsibilities
- Analyze and optimize GPU kernels for performance, efficiency, and hardware utilization
- Use profiler metrics such as L2 cache hit rate, L2 throughput, occupancy, and related signals to guide kernel improvements
- Review GPU kernel implementations and identify bottlenecks without requiring extensive background in the underlying algorithms
- Write, modify, and reason about C++17, Python, and GPU programming code
- Apply CUDA, HIP, shader programming, or related kernel programming expertise to improve performance outcomes
- Document optimization decisions clearly, including when specific profiler metrics are or are not useful
3. Ideal Qualifications
- Available to work at least 20 hrs/wk
- Fluent in core C++ features through C++17
- Working knowledge of Python and Git
- Fluent in at least one GPU programming model or shading language, such as CUDA, HIP, Slang, HLSL, GLSL, or a related kernel or shader language
- At least 1 year of professional or graduate-level research experience working with GPUs
- Strong understanding of GPU profiler performance metrics and how to use them to optimize kernels
- Ability to optimize GPU kernels without needing deep prior context on every algorithm
- Experience with CUDA, HIP, the CUDA C++ Core Libraries (CCCL), inline PTX assembly, or Tensor Core-level optimization is a plus
- Experience optimizing kernels for NVIDIA Blackwell hardware is a plus
- Familiarity with Nsight Compute is a plus
- Prior experience at GPU hardware companies such as NVIDIA, AMD, or Qualcomm is a plus
- Open-source contributions related to GPU kernel optimization are a plus
4. Application Process
- Submit your resume or relevant technical background to get started
- Qualified applicants may be asked to complete a brief technical assessment or submit additional information
Interested in this position?
Apply directly on the company's website