Senior Machine Learning Engineer, Voice AI
Advance inference efficiency end-to-end by designing and prototyping algorithms, architectures, and scheduling strategies for low-latency, high-throughput inference. Implement and maintain changes in high-performance inference engines, including kernel backends, speculative decoding, and quantization. Profile and optimize performance across GPU, networking, and memory layers to improve latency, throughput, and cost. Unify inference with RL/post-training by designing and operating RL and post-training pipelines, optimizing algorithms and systems where most costs are inference, and making RL and post-training workloads more efficient with inference-aware training loops. Use these pipelines to train, evaluate, and iterate on frontier models on top of the inference stack. Co-design algorithms and infrastructure so that objectives, rollout collection, and evaluation are tightly coupled to efficient inference, identifying bottlenecks in the training engine, inference engine, data pipeline, and user-facing layers. Run ablations and scale-up experiments to understand trade-offs between model quality, latency, throughput, and cost, feeding insights back into model, RL, and system design. Own critical systems at production scale by profiling, debugging, and optimizing inference and post-training services under real production workloads, and drive roadmap items requiring engine modifications, including kernels, memory layouts, and scheduling logic. Establish metrics, benchmarks, and experimentation frameworks to rigorously validate improvements. Provide technical leadership by setting technical direction for cross-team efforts at the intersection of inference, RL, and post-training, and by mentoring engineers and researchers in full-stack ML systems work and performance engineering.
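For readers unfamiliar with the speculative decoding named above, the core idea can be sketched in a few lines: a cheap draft model proposes several tokens, and the expensive target model verifies them in a single pass. This is a toy illustration with hypothetical stand-in model functions, not any team's implementation; real engines use probabilistic acceptance rather than the greedy matching shown here.

```python
# Toy sketch of speculative decoding. `draft_model` and `target_model` are
# hypothetical stand-ins operating on integer "tokens"; real systems compare
# token distributions and accept with probability min(1, p_target / p_draft).

def draft_model(prefix, k):
    # Cheap model: propose k next tokens with a deterministic toy rule.
    return [(prefix[-1] + i + 1) % 10 for i in range(k)]

def target_model(prefix, proposed):
    # Expensive model scores all proposed positions in one pass and returns,
    # per position, the token it would itself emit.
    out, ctx = [], list(prefix)
    for _ in proposed:
        out.append((ctx[-1] + 1) % 10)
        ctx.append(out[-1])
    return out

def speculative_step(prefix, k=4):
    proposed = draft_model(prefix, k)
    verified = target_model(prefix, proposed)
    accepted = []
    for p, v in zip(proposed, verified):
        if p == v:
            accepted.append(p)       # draft token agrees with the target
        else:
            accepted.append(v)       # first mismatch: keep target's token, stop
            break
    return prefix + accepted         # 1..k tokens per expensive target pass

seq = [0]
for _ in range(3):
    seq = speculative_step(seq)
print(seq)
```

The payoff is that each call to the expensive model can emit up to k tokens instead of one, which is why the technique sits alongside quantization and kernel work in latency-focused inference engines.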
Senior Software Engineer, ML Infrastructure
Design and build distributed training platforms for large language models and multimodal fine-tuning and post-training at scale. Integrate state-of-the-art training algorithms into production pipelines. Own inference architecture and multi-provider routing, including failover and optimization. Lead initiatives to improve latency and cost efficiency across the training and serving stack. Build evaluation and experimentation infrastructure that enables rapid and reliable iteration. Drive technical direction, mentor engineers, and establish best practices for ML infrastructure.
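The multi-provider routing with failover mentioned above can be sketched as a simple preference-ordered fallback loop. The provider names and error type below are hypothetical placeholders; production routers additionally weigh latency, cost, quotas, and health-check history.

```python
# Minimal sketch of multi-provider routing with failover: try providers in
# preference order and fall back on failure. All names here are illustrative.

class ProviderError(Exception):
    pass

def make_provider(name, healthy):
    def call(prompt):
        if not healthy:
            raise ProviderError(f"{name} unavailable")
        return f"{name}: completion for {prompt!r}"
    return call

providers = [
    ("primary", make_provider("primary", healthy=False)),
    ("backup", make_provider("backup", healthy=True)),
]

def route(prompt):
    errors = []
    for name, call in providers:
        try:
            return call(prompt)
        except ProviderError as exc:
            errors.append(str(exc))   # record the failure, fail over to next
    raise RuntimeError("all providers failed: " + "; ".join(errors))

result = route("hello")
print(result)
```

Here the unhealthy primary provider raises, and the router transparently serves the request from the backup, which is the essential behavior behind "failover and optimization" in the listing.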
Member of Technical Staff - Mid-Training Infra
Design, build, and operate large-scale GPU infrastructure for high-throughput model inference and mid-training workloads. Develop systems that power synthetic data generation and reinforcement learning pipelines at scale. Build high-performance inference platforms capable of serving and evaluating models across thousands of GPUs. Optimize throughput, latency, and GPU utilization for large language model inference and rollout workloads. Build infrastructure that supports reinforcement learning pipelines, including large-scale rollout generation, evaluation, and policy improvement loops. Work closely with research teams to support distributed RL workloads and large-scale model evaluation infrastructure. Improve performance of model execution through kernel-level optimization, model parallelism strategies, and GPU runtime improvements. Develop distributed systems that enable large-scale synthetic data generation and RL-driven training workflows. Diagnose and resolve performance bottlenecks across inference runtimes, GPU kernels, networking, and distributed compute systems.
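As an illustration of the rollout-generation workloads described above, here is a minimal sketch of concurrent episode rollouts with aggregate throughput tracking. The `policy` function and environment transition are toy stand-ins; in practice the policy call is a batched request to a model server across many GPUs.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Sketch of large-scale rollout generation: many episodes run concurrently
# against a policy, and aggregate steps/second is tracked. `policy` and the
# environment transition below are hypothetical toy stand-ins.

def policy(observation):
    return observation % 4                      # dummy action

def rollout(seed, horizon=64):
    obs, steps, trajectory = seed, 0, []
    for _ in range(horizon):
        action = policy(obs)
        trajectory.append((obs, action))
        obs = (obs * 3 + action) % 1000         # toy environment transition
        steps += 1
    return steps, trajectory

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(rollout, range(32)))
elapsed = time.perf_counter() - start

total_steps = sum(s for s, _ in results)
print(f"{total_steps} env steps in {elapsed:.3f}s")
```

The interesting engineering in the listing lives in scaling this pattern: batching policy calls, overlapping rollout generation with evaluation, and keeping GPU utilization high while trajectories of uneven length complete.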
Member of Technical Staff - ML Engineering
Deploy, maintain, and optimize production and research compute clusters. Design and implement scalable and efficient ML inference solutions. Develop dynamic and heterogeneous compute solutions for balancing research and production needs. Contribute to productizing model APIs for external use. Develop infrastructure observability and monitoring solutions.
Research Engineer, Machine Learning Systems
The responsibilities include architecting and managing horizontally scalable systems to accelerate the end-to-end training lifecycle for Speech-to-Text (STT) and Text-to-Speech (TTS) models, focusing on optimized data preparation, high-throughput training pipelines, distributed infrastructure, and automated evaluation tooling. The role also involves designing and implementing internal UIs and tools to make ML systems and workflows accessible and transparent to non-technical stakeholders. Additionally, the position requires overseeing and managing training tooling, job orchestration, experiment tracking, and data storage.
Member of Technical Staff - Research Software Engineer
The role involves bridging the gap between research and production by transforming cutting-edge algorithms into scalable training systems. Responsibilities include designing and optimizing large-scale training loops and data pipelines, implementing state-of-the-art techniques ensuring numerical stability and computational efficiency, building internal tooling for launching, monitoring, and reproducing complex experiments, diagnosing deep bottlenecks across the training stack such as GPU memory issues, communication overhead, and dataloader stalls, and translating research prototypes into reusable, production-grade infrastructure. The engineer will architect and optimize the core training infrastructure including RL training loops, distributed GPU systems, and large-scale data pipelines, working closely with researchers to build reliable, scalable systems.
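The dataloader-stall diagnosis mentioned above can be illustrated with a simple timing wrapper: measure how long the training loop waits for data versus how long it computes. The loader and training step below are toy stand-ins with simulated delays; a high data-wait fraction points at an input-pipeline bottleneck rather than a compute-bound one.

```python
import time

# Sketch of diagnosing dataloader stalls: time data-wait vs. compute in the
# training loop. `toy_loader` and `train_step` simulate costs with sleeps.

def toy_loader(n_batches, delay=0.002):
    for i in range(n_batches):
        time.sleep(delay)            # simulated I/O + preprocessing cost
        yield [i] * 8

def train_step(batch, delay=0.001):
    time.sleep(delay)                # simulated forward/backward pass
    return sum(batch)

data_wait = compute = 0.0
it = toy_loader(20)
while True:
    t0 = time.perf_counter()
    batch = next(it, None)           # time spent blocked on the loader
    data_wait += time.perf_counter() - t0
    if batch is None:
        break
    t0 = time.perf_counter()
    train_step(batch)                # time spent in actual computation
    compute += time.perf_counter() - t0

frac = data_wait / (data_wait + compute)
print(f"data-wait fraction: {frac:.0%}")
```

The same split generalizes to GPU training: if the accelerator is idle while the loop blocks on `next(it)`, the fix is in prefetching, worker counts, or storage, not in the model code.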
Senior Engineering Manager, ML Platform
The Senior Engineering Manager, ML Platform at Zoox is responsible for developing and executing a strategic vision for the ML training platform to ensure scalability, reliability, and performance for large-scale Foundation and RL models. They lead the design, implementation, and operation of a robust and efficient ML training platform supporting training, experimentation, validation, and monitoring of ML models. They attract, hire, and inspire a diverse world-class engineering team, fostering a culture of innovation, collaboration, and excellence. The role involves close collaboration with cross-functional teams including ML researchers, software engineers, data engineers, and hardware engineers to define requirements and align architectural decisions. The manager also mentors engineers, providing opportunities for career growth through clear and timely feedback.
Senior Staff Software Engineer, Model LifeCycle
The Senior Staff Engineer for the Model LifeCycle team at Crusoe is responsible for building a comprehensive managed platform for the entire application development lifecycle with a focus on Machine Learning models, including Large Language Models (LLMs). Responsibilities include managing fine-tuning systems for large foundation models, covering SFT, PEFT methods such as LoRA, and adapters, with multi-node orchestration, checkpointing, failure recovery, and cost-efficient scaling. They implement and maintain end-to-end training pipelines for LLMs, distillation and reinforcement learning pipelines including preference optimization, policy optimization, and reward modeling, and manage agent execution infrastructure. They also handle dataset, model, and experiment management, including versioning, lineage, evaluation, and reproducible fine-tuning at scale. Additionally, they work closely with product, business, and platform teams to shape core abstractions and APIs, influence architectural decisions around training runtimes, scheduling, storage, and model lifecycle management, contribute to and engage with the open-source LLM ecosystem, and take ownership in designing and building core systems from first principles.
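For context on the LoRA fine-tuning named in this listing, the technique adds a trainable low-rank product on top of a frozen weight matrix, so only a small fraction of the parameters are updated. The dimensions and values below are illustrative; real systems use framework tensors, not nested lists.

```python
# Toy sketch of a LoRA update: the frozen weight W is augmented with a
# low-rank product B @ A (rank r << d), so only r * (d_in + d_out)
# parameters are trained instead of d_in * d_out. Pure-Python lists keep
# the sketch dependency-free.

def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)]
            for row in X]

d_in, d_out, r = 4, 4, 1
W = [[1.0 if i == j else 0.0 for j in range(d_out)]
     for i in range(d_in)]             # frozen base weight (identity here)
A = [[0.1] * d_in]                     # r x d_in, trainable
B = [[0.5] for _ in range(d_out)]      # d_out x r, trainable

delta = matmul(B, A)                   # d_out x d_in low-rank update
W_eff = [[w + d for w, d in zip(w_row, d_row)]
         for w_row, d_row in zip(W, delta)]
print(W_eff[0])
```

Because only A and B receive gradients, checkpoints of the fine-tuned adapter are tiny relative to the base model, which is what makes the multi-node orchestration and adapter-management work in the listing tractable at scale.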
