ML Infrastructure Engineer Jobs

Discover the latest remote and onsite ML Infrastructure Engineer roles at top AI companies that are actively hiring. Updated hourly.

Check out 23 new ML Infrastructure Engineer opportunities posted on AI Chopping Block

Staff Analytics Engineer — Data Warehouse

New
Top rated
Together AI
Full-time
Posted

Advance inference efficiency end-to-end by designing and prototyping algorithms, architectures, and scheduling strategies for low-latency, high-throughput inference. Implement and maintain changes in high-performance inference engines, including kernel backends, speculative decoding, and quantization. Profile and optimize performance across GPU, networking, and memory layers to improve latency, throughput, and cost.

Design and operate RL and post-training pipelines where most cost is inference, jointly optimizing algorithms and systems. Make RL and post-training workloads more efficient with inference-aware training loops, async RL rollouts, and speculative decoding, and use these pipelines to train, evaluate, and iterate on frontier models. Co-design algorithms and infrastructure to tightly couple objectives, rollout collection, and evaluation with efficient inference, identifying bottlenecks across the training engine, inference engine, data pipeline, and user-facing layers. Run ablations and scale-up experiments to understand trade-offs between model quality, latency, throughput, and cost, feeding insights into model, RL, and system design.

Profile, debug, and optimize inference and post-training services under real production workloads. Drive roadmap items requiring engine modification, such as changing kernels, memory layouts, scheduling logic, and APIs. Establish metrics, benchmarks, and experimentation frameworks to validate improvements rigorously.

Provide technical leadership by setting direction for cross-team efforts in inference, RL, and post-training, and mentor engineers and researchers on full-stack ML systems work and performance engineering.

$200,000 – $280,000 per year (USD)

San Francisco
Maybe global
Onsite

Sr. Partnerships Manager, Model Ecosystem

New
Top rated
Together AI
Full-time
Posted

Advance inference efficiency end-to-end by designing and prototyping algorithms, architectures, and scheduling strategies for low-latency, high-throughput inference. Implement and maintain changes in high-performance inference engines such as SGLang- or vLLM-style systems and Together’s inference stack, including kernel backends, speculative decoding, and quantization. Profile and optimize performance across GPU, networking, and memory layers to improve latency, throughput, and cost.

Unify inference with RL/post-training by designing and operating RL and post-training pipelines such as RLHF, RLAIF, GRPO, and DPO-style methods, where most cost is inference, jointly optimizing algorithms and systems. Make workloads more efficient with inference-aware training loops, async RL rollouts, and speculative decoding, and use these pipelines to train, evaluate, and iterate on frontier models. Co-design algorithms and infrastructure to tightly couple objectives, rollout collection, and evaluation to efficient inference. Run ablations and scale-up experiments to understand trade-offs and feed insights into model, RL, and system design.

Own critical systems at production scale by profiling, debugging, and optimizing inference and post-training services. Drive roadmap items involving engine modifications like changing kernels, memory layouts, scheduling logic, and APIs. Establish metrics, benchmarks, and experimentation frameworks to validate improvements rigorously.

Provide technical leadership by setting technical direction for cross-team efforts at the intersection of inference, RL, and post-training, and by mentoring engineers and researchers on full-stack ML systems work and performance engineering.

$200,000 – $280,000 per year (USD)

San Francisco
Maybe global
Onsite

Lead/Manager Site Reliability Engineering Team (Amsterdam)

New
Top rated
Together AI
Full-time
Posted

Advance inference efficiency end-to-end by designing and prototyping algorithms, architectures, and scheduling strategies for low-latency, high-throughput inference. Implement and maintain changes in high-performance inference engines such as SGLang- or vLLM-style systems and Together's inference stack, including kernel backends, speculative decoding methods like ATLAS, and quantization. Profile and optimize performance across GPU, networking, and memory layers to improve latency, throughput, and cost.

Unify inference with RL/post-training by designing and operating RL and post-training pipelines where inference constitutes the majority of the cost, optimizing algorithms and systems jointly. Enhance RL and post-training workloads with inference-aware training loops, including asynchronous RL rollouts and speculative decoding techniques, making large-scale rollout collection and evaluation more efficient. Use these pipelines to train, evaluate, and iterate on cutting-edge models on the inference stack. Co-design algorithms and infrastructure to tightly couple objectives, rollout collection, and evaluation to efficient inference, and quickly identify bottlenecks across training engines, inference engines, data pipelines, and user-facing layers. Run ablation and scale-up experiments to analyze trade-offs between model quality, latency, throughput, and cost, feeding insights back into model, RL, and system design.

Own critical production-scale systems by profiling, debugging, and optimizing inference and post-training services under real production workloads. Lead roadmap initiatives requiring engine modifications such as changes to kernels, memory layouts, scheduling logic, and APIs. Establish metrics, benchmarks, and experimentation frameworks to rigorously validate improvements.

Provide technical leadership by setting direction for cross-team efforts at the intersection of inference, RL, and post-training, and by mentoring engineers and researchers on full-stack ML systems work and performance engineering.

$200,000 – $280,000 per year (USD)

Amsterdam
Maybe global
Onsite

Senior Machine Learning Engineer, Voice AI

New
Top rated
Together AI
Full-time
Posted

Advance inference efficiency end-to-end by designing and prototyping algorithms, architectures, and scheduling strategies for low-latency, high-throughput inference. Implement and maintain changes in high-performance inference engines, including kernel backends, speculative decoding, and quantization. Profile and optimize performance across GPU, networking, and memory layers to improve latency, throughput, and cost.

Unify inference with RL/post-training by designing and operating RL and post-training pipelines, optimizing algorithms and systems where most of the cost is inference, and making RL and post-training workloads more efficient with inference-aware training loops. Use these pipelines to train, evaluate, and iterate on frontier models on the inference stack. Co-design algorithms and infrastructure to tightly couple objectives, rollout collection, and evaluation to efficient inference, identifying bottlenecks in the training engine, inference engine, data pipeline, and user-facing layers. Run ablations and scale-up experiments to understand trade-offs between model quality, latency, throughput, and cost, feeding insights back into model, RL, and system design.

Own critical systems at production scale by profiling, debugging, and optimizing inference and post-training services under real production workloads, and by driving roadmap items requiring engine modifications to kernels, memory layouts, and scheduling logic. Establish metrics, benchmarks, and experimentation frameworks to rigorously validate improvements.

Provide technical leadership by setting technical direction for cross-team efforts at the intersection of inference, RL, and post-training, and by mentoring engineers and researchers in full-stack ML systems work and performance engineering.

$200,000 – $280,000 per year (USD)

San Francisco
Maybe global
Onsite

Senior Software Engineer, ML Infrastructure

New
Top rated
Decagon
Full-time
Posted

Design and build distributed training platforms for large language models and multimodal fine-tuning and post-training at scale. Integrate state-of-the-art training algorithms into production pipelines. Own inference architecture and multi-provider routing, including failover and optimization. Lead initiatives to improve latency and cost efficiency across the training and serving stack. Build evaluation and experimentation infrastructure that enables rapid and reliable iteration. Drive technical direction, mentor engineers, and establish best practices for ML infrastructure.

$250,000 – $330,000 per year (USD)

San Francisco, United States
Maybe global
Onsite

Member of Technical Staff - Mid-Training Infra

New
Top rated
Reflection
Full-time
Posted

Design, build, and operate large-scale GPU infrastructure for high-throughput model inference and mid-training workloads. Develop systems that power synthetic data generation and reinforcement learning pipelines at scale. Build high-performance inference platforms capable of serving and evaluating models across thousands of GPUs. Optimize throughput, latency, and GPU utilization for large language model inference and rollout workloads. Build infrastructure that supports reinforcement learning pipelines, including large-scale rollout generation, evaluation, and policy improvement loops. Work closely with research teams to support distributed RL workloads and large-scale model evaluation infrastructure. Improve performance of model execution through kernel-level optimization, model parallelism strategies, and GPU runtime improvements. Develop distributed systems that enable large-scale synthetic data generation and RL-driven training workflows. Diagnose and resolve performance bottlenecks across inference runtimes, GPU kernels, networking, and distributed compute systems.

Undisclosed

San Francisco, United States
Maybe global
Onsite

Member of Technical Staff - ML Engineering

New
Top rated
Talent Labs
Full-time
Posted

Deploy, maintain, and optimize production and research compute clusters. Design and implement scalable and efficient ML inference solutions. Develop dynamic and heterogeneous compute solutions for balancing research and production needs. Contribute to productizing model APIs for external use. Develop infrastructure observability and monitoring solutions.

Undisclosed

London, United Kingdom
Maybe global
Remote

Research Engineer, Machine Learning Systems

New
Top rated
Deepgram
Full-time
Posted

Architect and manage horizontally scalable systems that accelerate the end-to-end training lifecycle for Speech-to-Text (STT) and Text-to-Speech (TTS) models, with a focus on optimized data preparation, high-throughput training pipelines, distributed infrastructure, and automated evaluation tooling. Design and implement internal UIs and tools that make ML systems and workflows accessible and transparent to non-technical stakeholders. Oversee and manage training tooling, job orchestration, experiment tracking, and data storage.

$150,000 – $250,000 per year (USD)

United States
Maybe global
Remote

Member of Technical Staff - Research Software Engineer

New
Top rated
Reflection
Full-time
Posted

Bridge the gap between research and production by transforming cutting-edge algorithms into scalable training systems. Design and optimize large-scale training loops and data pipelines, and implement state-of-the-art techniques while ensuring numerical stability and computational efficiency. Build internal tooling for launching, monitoring, and reproducing complex experiments. Diagnose deep bottlenecks across the training stack, such as GPU memory issues, communication overhead, and dataloader stalls, and translate research prototypes into reusable, production-grade infrastructure. Architect and optimize the core training infrastructure, including RL training loops, distributed GPU systems, and large-scale data pipelines, working closely with researchers to build reliable, scalable systems.

Undisclosed

New York City, United States
Maybe global
Onsite

Senior Engineering Manager, ML Platform

New
Top rated
Zoox
Full-time
Posted

The Senior Engineering Manager, ML Platform at Zoox is responsible for developing and executing a strategic vision for the ML training platform to ensure scalability, reliability, and performance for large-scale Foundation and RL models. They lead the design, implementation, and operation of a robust and efficient ML training platform supporting training, experimentation, validation, and monitoring of ML models. They attract, hire, and inspire a diverse world-class engineering team, fostering a culture of innovation, collaboration, and excellence. The role involves close collaboration with cross-functional teams including ML researchers, software engineers, data engineers, and hardware engineers to define requirements and align architectural decisions. The manager also mentors engineers, providing opportunities for career growth through clear and timely feedback.

$317,000 – $370,000 per year (USD)

Foster City, United States
Maybe global
Onsite

Want to see more ML Infrastructure Engineer jobs?

View all jobs

Access all 4,256 remote & onsite AI jobs.

Join our private AI community to unlock full job access and connect with founders, hiring managers, and top AI professionals.
(Yes, it's still free; your best contributions are the price of admission.)

Frequently Asked Questions

Have questions about roles, locations, or requirements for ML Infrastructure Engineer jobs?


What does an ML Infrastructure Engineer do?

ML Infrastructure Engineers design, build, and maintain the systems that support machine learning workflows from development to production. They create scalable platforms for model training and serving, implement distributed training systems, and develop monitoring solutions to track model performance. They also build data pipelines, optimize ML systems for performance, and implement automated testing and deployment processes, collaborating with data scientists and researchers to productionize ML models.

What skills are required for an ML Infrastructure Engineer?

ML Infrastructure Engineers need strong programming skills in Python and sometimes Go, Rust, or C++. Proficiency with ML frameworks like PyTorch and TensorFlow is essential, alongside expertise in cloud platforms (AWS, GCP), containers (Docker), and orchestration (Kubernetes). They should understand distributed systems, data engineering concepts, and model serving techniques. Experience with infrastructure-as-code tools and monitoring systems rounds out the technical requirements, complemented by problem-solving and collaboration skills.

What qualifications are needed for an ML Infrastructure Engineer role?

Most positions require a Bachelor's or Master's degree in Computer Science or a related field, plus 4–5+ years of experience building production ML systems. Employers typically expect demonstrable experience with cloud platforms, containerization tools, and ML frameworks, along with a strong understanding of system-level software, machine learning concepts, and resource utilization. Experience with distributed systems and high-throughput workloads is highly valued, especially for senior positions.

What is the salary range for ML Infrastructure Engineer jobs?

Compensation varies with location, company size, experience level, and specific technical expertise. Listings on this page that disclose pay range from roughly $150,000 to $370,000 per year (USD), reflecting the specialized nature of ML infrastructure skills and current market demand.

How long does it take to get hired as an ML Infrastructure Engineer?

Timelines vary by company, but the process typically includes technical interviews covering systems design, ML fundamentals, and programming. Given the specialized nature of the role, companies often evaluate candidates' experience with production ML systems, distributed computing, and relevant technologies in depth, which can extend the process compared to more general engineering roles.

Are ML Infrastructure Engineer jobs in demand?

Yes. Demand is strong, as shown by active openings at companies like Together AI, Decagon, Reflection, Deepgram, and Zoox. The field is growing in specialized areas such as LLM serving infrastructure, on-device ML optimization, and safety-critical ML systems. Openings span major tech hubs, from mid-level to senior roles, reflecting the industry's increasing need for engineers who can build reliable ML systems at scale.