ML Infrastructure Engineer Jobs

Discover the latest remote and onsite ML Infrastructure Engineer roles at top AI companies that are actively hiring. Updated hourly.

Check out 23 new ML Infrastructure Engineer opportunities posted on AI Chopping Block

Staff Analytics Engineer — Data Warehouse

New
Top rated
Together AI
Full-time
Posted

Advance inference efficiency end-to-end by designing and prototyping algorithms, architectures, and scheduling strategies for low-latency, high-throughput inference. Implement and maintain changes in high-performance inference engines, including kernel backends, speculative decoding, and quantization. Profile and optimize performance across GPU, networking, and memory layers to improve latency, throughput, and cost.

Design and operate RL and post-training pipelines where most cost is inference, jointly optimizing algorithms and systems. Make RL and post-training workloads more efficient with inference-aware training loops, async RL rollouts, and speculative decoding, and use these pipelines to train, evaluate, and iterate on frontier models. Co-design algorithms and infrastructure to tightly couple objectives, rollout collection, and evaluation with efficient inference, identifying bottlenecks across the training engine, inference engine, data pipeline, and user-facing layers. Run ablations and scale-up experiments to understand trade-offs between model quality, latency, throughput, and cost, feeding insights into model, RL, and system design.

Profile, debug, and optimize inference and post-training services under real production workloads. Drive roadmap items requiring engine modification, such as changing kernels, memory layouts, scheduling logic, and APIs. Establish metrics, benchmarks, and experimentation frameworks to validate improvements rigorously.

Provide technical leadership by setting direction for cross-team efforts in inference, RL, and post-training, and mentor engineers and researchers on full-stack ML systems work and performance engineering.

$200,000 – $280,000 per year (USD)

San Francisco
Maybe global
Onsite

Sr. Partnerships Manager, Model Ecosystem

New
Top rated
Together AI
Full-time
Posted

Advance inference efficiency end-to-end by designing and prototyping algorithms, architectures, and scheduling strategies for low-latency, high-throughput inference. Implement and maintain changes in high-performance inference engines such as SGLang- or vLLM-style systems and Together’s inference stack, including kernel backends, speculative decoding, and quantization. Profile and optimize performance across GPU, networking, and memory layers to improve latency, throughput, and cost.

Unify inference with RL/post-training by designing and operating RL and post-training pipelines such as RLHF, RLAIF, GRPO, and DPO-style methods, where most cost is inference, jointly optimizing algorithms and systems. Make workloads more efficient with inference-aware training loops, async RL rollouts, and speculative decoding, and use these pipelines to train, evaluate, and iterate on frontier models. Co-design algorithms and infrastructure to tightly couple objectives, rollout collection, and evaluation to efficient inference. Run ablations and scale-up experiments to understand trade-offs and feed insights into model, RL, and system design.

Own critical systems at production scale by profiling, debugging, and optimizing inference and post-training services. Drive roadmap items involving engine modifications like changing kernels, memory layouts, scheduling logic, and APIs. Establish metrics, benchmarks, and experimentation frameworks to validate improvements rigorously.

Provide technical leadership by setting technical direction for cross-team efforts at the intersection of inference, RL, and post-training, and by mentoring engineers and researchers on full-stack ML systems work and performance engineering.

$200,000 – $280,000 per year (USD)

San Francisco
Maybe global
Onsite

Lead/Manager Site Reliability Engineering Team (Amsterdam)

New
Top rated
Together AI
Full-time
Posted

Advance inference efficiency end-to-end by designing and prototyping algorithms, architectures, and scheduling strategies for low-latency, high-throughput inference. Implement and maintain changes in high-performance inference engines such as SGLang- or vLLM-style systems and Together's inference stack, including kernel backends, speculative decoding methods like ATLAS, and quantization. Profile and optimize performance across GPU, networking, and memory layers to improve latency, throughput, and cost.

Unify inference with RL/post-training by designing and operating RL and post-training pipelines where inference constitutes the majority of the cost, optimizing algorithms and systems jointly. Enhance RL and post-training workloads with inference-aware training loops, including asynchronous RL rollouts and speculative decoding techniques, making large-scale rollout collection and evaluation more efficient. Use these pipelines to train, evaluate, and iterate on cutting-edge models on the inference stack. Co-design algorithms and infrastructure to tightly couple objectives, rollout collection, and evaluation to efficient inference, and quickly identify bottlenecks across training engines, inference engines, data pipelines, and user-facing layers. Run ablation and scale-up experiments to analyze trade-offs between model quality, latency, throughput, and cost, feeding insights back into model, RL, and system design.

Own critical production-scale systems by profiling, debugging, and optimizing inference and post-training services under real production workloads. Lead roadmap initiatives requiring engine modifications such as changes to kernels, memory layouts, scheduling logic, and APIs. Establish metrics, benchmarks, and experimentation frameworks to rigorously validate improvements.

Provide technical leadership by setting direction for cross-team efforts at the intersection of inference, RL, and post-training, and by mentoring engineers and researchers on full-stack ML systems work and performance engineering.

$200,000 – $280,000 per year (USD)

Amsterdam
Maybe global
Onsite

Senior Machine Learning Engineer, Voice AI

New
Top rated
Together AI
Full-time
Posted

Advance inference efficiency end-to-end by designing and prototyping algorithms, architectures, and scheduling strategies for low-latency, high-throughput inference. Implement and maintain changes in high-performance inference engines, including kernel backends, speculative decoding, and quantization. Profile and optimize performance across GPU, networking, and memory layers to improve latency, throughput, and cost.

Unify inference with RL/post-training by designing and operating RL and post-training pipelines, optimizing algorithms and systems where most of the cost is inference, and making RL and post-training workloads more efficient with inference-aware training loops. Use these pipelines to train, evaluate, and iterate on frontier models on the inference stack. Co-design algorithms and infrastructure to tightly couple objectives, rollout collection, and evaluation to efficient inference, identifying bottlenecks in the training engine, inference engine, data pipeline, and user-facing layers. Run ablations and scale-up experiments to understand trade-offs between model quality, latency, throughput, and cost, feeding insights back into model, RL, and system design.

Own critical systems at production scale by profiling, debugging, and optimizing inference and post-training services under real production workloads, and by driving roadmap items requiring engine modifications to kernels, memory layouts, and scheduling logic. Establish metrics, benchmarks, and experimentation frameworks to rigorously validate improvements.

Provide technical leadership by setting technical direction for cross-team efforts at the intersection of inference, RL, and post-training, and by mentoring engineers and researchers in full-stack ML systems work and performance engineering.

$200,000 – $280,000 per year (USD)

San Francisco
Maybe global
Onsite

Senior Software Engineer, ML Infrastructure

New
Top rated
Decagon
Full-time
Posted

Design and build distributed training platforms for large language models and multimodal fine-tuning and post-training at scale. Integrate state-of-the-art training algorithms into production pipelines. Own inference architecture and multi-provider routing, including failover and optimization. Lead initiatives to improve latency and cost efficiency across the training and serving stack. Build evaluation and experimentation infrastructure that enables rapid and reliable iteration. Drive technical direction, mentor engineers, and establish best practices for ML infrastructure.

$250,000 – $330,000 per year (USD)

San Francisco, United States
Maybe global
Onsite

Member of Technical Staff - Mid-Training Infra

New
Top rated
Reflection
Full-time
Posted

Design, build, and operate large-scale GPU infrastructure for high-throughput model inference and mid-training workloads. Develop systems that power synthetic data generation and reinforcement learning pipelines at scale. Build high-performance inference platforms capable of serving and evaluating models across thousands of GPUs. Optimize throughput, latency, and GPU utilization for large language model inference and rollout workloads. Build infrastructure that supports reinforcement learning pipelines, including large-scale rollout generation, evaluation, and policy improvement loops. Work closely with research teams to support distributed RL workloads and large-scale model evaluation infrastructure. Improve performance of model execution through kernel-level optimization, model parallelism strategies, and GPU runtime improvements. Develop distributed systems that enable large-scale synthetic data generation and RL-driven training workflows. Diagnose and resolve performance bottlenecks across inference runtimes, GPU kernels, networking, and distributed compute systems.

Undisclosed

San Francisco, United States
Maybe global
Onsite

Member of Technical Staff - ML Engineering

New
Top rated
Talent Labs
Full-time
Posted

Deploy, maintain, and optimize production and research compute clusters. Design and implement scalable and efficient ML inference solutions. Develop dynamic and heterogeneous compute solutions for balancing research and production needs. Contribute to productizing model APIs for external use. Develop infrastructure observability and monitoring solutions.

Undisclosed

London, United Kingdom
Maybe global
Remote

Research Engineer, Machine Learning Systems

New
Top rated
Deepgram
Full-time
Posted

Architect and manage horizontally scalable systems that accelerate the end-to-end training lifecycle for Speech-to-Text (STT) and Text-to-Speech (TTS) models, with a focus on optimized data preparation, high-throughput training pipelines, distributed infrastructure, and automated evaluation tooling. Design and implement internal UIs and tools that make ML systems and workflows accessible and transparent to non-technical stakeholders. Oversee and manage training tooling, job orchestration, experiment tracking, and data storage.

$150,000 – $250,000 per year (USD)

United States
Maybe global
Remote

Member of Technical Staff - Research Software Engineer

New
Top rated
Reflection
Full-time
Posted

Bridge the gap between research and production by transforming cutting-edge algorithms into scalable training systems. Design and optimize large-scale training loops and data pipelines, and implement state-of-the-art techniques while ensuring numerical stability and computational efficiency. Build internal tooling for launching, monitoring, and reproducing complex experiments. Diagnose deep bottlenecks across the training stack, such as GPU memory issues, communication overhead, and dataloader stalls, and translate research prototypes into reusable, production-grade infrastructure. Architect and optimize the core training infrastructure, including RL training loops, distributed GPU systems, and large-scale data pipelines, working closely with researchers to build reliable, scalable systems.

Undisclosed

New York City, United States
Maybe global
Onsite

Senior Engineering Manager, ML Platform

New
Top rated
Zoox
Full-time
Posted

The Senior Engineering Manager, ML Platform at Zoox is responsible for developing and executing a strategic vision for the ML training platform to ensure scalability, reliability, and performance for large-scale Foundation and RL models. They lead the design, implementation, and operation of a robust and efficient ML training platform supporting training, experimentation, validation, and monitoring of ML models. They attract, hire, and inspire a diverse world-class engineering team, fostering a culture of innovation, collaboration, and excellence. The role involves close collaboration with cross-functional teams including ML researchers, software engineers, data engineers, and hardware engineers to define requirements and align architectural decisions. The manager also mentors engineers, providing opportunities for career growth through clear and timely feedback.

$317,000 – $370,000 per year (USD)

Foster City, United States
Maybe global
Onsite

Want to see more ML Infrastructure Engineer jobs?

View all jobs

Access all 4,256 remote & onsite AI jobs.

Join our private AI community to unlock full job access and connect with founders, hiring managers, and top AI professionals.
(Yes, it's still free; your best contributions are the price of admission.)

Frequently Asked Questions

Have questions about roles, locations, or requirements for ML Infrastructure Engineer jobs?


What does an ML Infrastructure Engineer do?

ML Infrastructure Engineers design, build, and maintain the systems that support machine learning workflows from development to production. They create scalable platforms for model training and serving, implement distributed training systems, and develop monitoring solutions to track model performance. They also build data pipelines, optimize ML systems for performance, and implement automated testing and deployment processes, collaborating with data scientists and researchers to productionize ML models.

What skills are required for an ML Infrastructure Engineer?

ML Infrastructure Engineers need strong programming skills in Python and sometimes Go, Rust, or C++. Proficiency with ML frameworks like PyTorch and TensorFlow is essential, alongside expertise in cloud platforms (AWS, GCP), containers (Docker), and orchestration (Kubernetes). They should understand distributed systems, data engineering concepts, and model serving techniques. Experience with infrastructure-as-code tools and monitoring systems rounds out the technical requirements, complemented by problem-solving and collaboration skills.

What qualifications are needed for an ML Infrastructure Engineer role?

Most positions require a Bachelor's or Master's degree in Computer Science or a related field, plus 4–5+ years of experience building production ML systems. Employers typically expect demonstrable experience with cloud platforms, containerization tools, and ML frameworks, along with a strong understanding of system-level software, machine learning concepts, and resource utilization. Experience with distributed systems and high-throughput workloads is highly valued, especially for senior positions.

What is the salary range for ML Infrastructure Engineer jobs?

Compensation varies with location, company size, experience level, and specific technical expertise. Listings on this page that disclose pay range from roughly $150,000 to $370,000 per year (USD), reflecting the specialized nature of ML infrastructure skills and current market demand.

How long does it take to get hired as an ML Infrastructure Engineer?

Timelines vary by company, but the process typically includes technical interviews covering systems design, ML fundamentals, and programming. Given the specialized nature of the role, companies often evaluate candidates' experience with production ML systems, distributed computing, and relevant technologies in depth, which can extend the process compared to more general engineering roles.

Are ML Infrastructure Engineer jobs in demand?

Yes. Demand is strong, as shown by active openings at companies like Together AI, Decagon, Reflection, Deepgram, and Zoox. The field is growing in specialized areas such as LLM serving infrastructure, on-device ML optimization, and safety-critical ML systems. Openings span major tech hubs, from mid-level to senior roles, reflecting the industry's increasing need for engineers who can build reliable ML systems at scale.