Location
San Carlos, United States
Salary
Undisclosed
Date posted
May 1, 2026
Job type
Full-time
Experience level
Mid Level (8+ years)

Job Description

What You’ll Do

  • Drive down wall-clock time to convergence by profiling and eliminating bottlenecks across the foundation model training stack, from data pipelines to GPU kernels

  • Design, build, and optimize distributed training systems (PyTorch) for multi-node GPU clusters, ensuring scalability, robustness, and high utilization

  • Implement efficient low-level code (CUDA, cuDNN, Triton, custom kernels) and integrate it seamlessly into high-level training frameworks

  • Optimize workloads for hardware efficiency: CPU/GPU compute balance, memory management, data throughput, and networking

  • Develop monitoring and debugging tools for large-scale runs, enabling rapid diagnosis of performance regressions and failures

What You’ll Bring

  • Deep experience in distributed systems, ML infrastructure, or high-performance computing (8+ years)

  • Production-grade expertise in Python

  • Low-level performance mastery: CUDA/cuDNN/Triton, CPU–GPU interactions, data movement, and kernel optimization

  • Scaling at the frontier: experience with PyTorch and training jobs using data, context, pipeline, and model parallelism

  • System-level mindset with a track record of tuning hardware–software interactions for maximum utilization

Genesis AI is hiring a Member of Technical Staff, Training (Bay Area, Remote). Apply through The AI Chopping Block and make the next move in your career!
Company size
11-50 employees
Industry
Research

Similar AI jobs

Here are other jobs you might want to apply for.

United States

ML/AI Engineer - Vehicle Intelligence

Full-time
Machine Learning Engineer

Member of Technical Staff (Machine Learning Engineer)

Full-time
Machine Learning Engineer
United States

Technical Lead Manager - Training Runtime, Data(set) Movement

Full-time
Machine Learning Engineer
United States

Senior Product Engineer, Growth & Lifecycle Infrastructure - Music & Audio

Full-time
Machine Learning Engineer
South Korea

Senior Deep Learning Engineer (Speech Synthesis Development)

Full-time
Machine Learning Engineer
United States

Head of ML

Full-time
Machine Learning Engineer