AI DevOps Engineer Jobs

Discover the latest remote and onsite AI DevOps Engineer roles across top active AI companies. Updated hourly.


Check out 932 new AI DevOps Engineer opportunities posted on AI Chopping Block


VP of Engineering

New

Top rated

Hyperbolic – Full-time – Posted Jun 13, 2026 3:14

Lead the design and evolution of the AI cloud platform including GPU orchestration, compute scheduling, networking, storage, and distributed systems. Make critical decisions regarding cloud infrastructure, bare-metal deployments, and platform scalability. Participate personally in architecture reviews and key technical initiatives. Build and scale large GPU clusters supporting customer workloads and design systems for GPU provisioning, scheduling, utilization optimization, and capacity management. Drive platform reliability and performance for AI training and inference workloads, partnering closely with engineering teams on infrastructure requirements for next-generation AI systems. Remain deeply involved in engineering decisions and technical direction, contribute directly to infrastructure design and implementation efforts, review architecture proposals, system designs, and major infrastructure changes, and act as the technical escalation point for complex infrastructure challenges. Establish best practices for Kubernetes, observability, CI/CD, security, and operational excellence. Build SRE and Platform Engineering functions from the ground up. Define reliability standards including SLOs, SLIs, incident response processes, and capacity planning. Drive automation across infrastructure operations. Recruit and develop Infrastructure, Platform, and SRE teams. Build a high-performance engineering culture focused on ownership and execution. Partner with executive leadership on company strategy and infrastructure investments. Manage infrastructure budgets, vendor relationships, and capacity planning.

Undisclosed


San Francisco, United States

Maybe global

Remote


Systems Research Engineer Intern - GPU Programming (Fall 2026)

New

Top rated

Together AI – Full-time – Posted Jun 13, 2026 0:13

Participate in an on-call rotation (PagerDuty) to respond to production incidents. Build and run infrastructure with Ansible, Terraform, and Kubernetes to enable scaling to a large number of concurrent users. Build monitoring systems to ensure the highest quality service for customers. Design and implement operational processes such as deployments and upgrades. Debug production issues across all services and levels of the stack. Identify improvements for the product architecture from the perspectives of reliability, performance, and availability. Plan the growth of Together AI's infrastructure.

$190,000 – $270,000 per year (USD)

San Francisco

Maybe global

Onsite


Frontier Agents Intern (Fall 2026)

New

Top rated

Together AI – Full-time – Posted Jun 13, 2026 0:13

An AI Infrastructure Engineer at Together AI is responsible for participating in an on-call rotation (PagerDuty) to respond to production incidents; building and running infrastructure with Ansible, Terraform, and Kubernetes to enable scaling for a massive number of concurrent users; building monitoring systems to ensure the highest quality service for customers; designing and implementing operational processes such as deployments and upgrades; debugging production issues across all services and levels of the stack; identifying improvements for the product architecture from reliability, performance, and availability perspectives; and planning the growth of Together AI's infrastructure.

$190,000 – $270,000 per year (USD)

San Francisco, United States

Maybe global

Onsite


IT Systems Engineer

New

Top rated

Scale AI – Full-time – Posted Jun 11, 2026 1:42

As a Production AI Ops Lead, you will design and develop the production lifecycle of full-stack AI applications, support end-to-end system reliability, real-time inference observability, sovereign data orchestration, high-security software integration, and the resilient cloud infrastructure for international government partners. You will own the production outcome, taking full accountability for the long-term performance and reliability of AI use cases deployed across international government agencies. You will ensure full-stack integrity by overseeing the end-to-end health of the platform and maintaining a responsive and production-ready environment. You will build automated systems to monitor model performance and data drift across geographically dispersed environments to ensure appropriate reliability. You will manage the technical lifecycle within diverse regulatory frameworks, lead the response for production issues in mission-critical environments, ensure rapid resolution, and build guardrails to prevent recurrence. You will translate deep technical performance metrics into clear insights for senior international government officials. Additionally, you will partner with Engineering and ML teams to incorporate lessons learned in the field into the technical architecture and decisions for future use cases.

Undisclosed


Washington, United States

Maybe global

Onsite


Staff Engineer, Distributed Storage and HPC & AI Infrastructure

New

Top rated

Together AI – Full-time – Posted Jun 4, 2026 23:37

An AI Infrastructure Engineer is responsible for participating in an on-call rotation to respond to production incidents, building and running infrastructure using Ansible, Terraform, and Kubernetes to enable scaling for many concurrent users, building monitoring systems to ensure high-quality service, designing and implementing operational processes such as deployments and upgrades, debugging production issues across all services and stack levels, identifying improvements for product architecture concerning reliability, performance, and availability, and planning the growth of Together AI's infrastructure.

$190,000 – $270,000 per year (USD)

San Francisco

Maybe global

Onsite


Manager, Infrastructure Strategy & Operations

New

Top rated

Together AI – Full-time – Posted Jun 3, 2026 2:35

As an AI Infrastructure Engineer at Together, you are responsible for keeping all user-facing services and production systems running smoothly. You participate in an on-call rotation (PagerDuty) to respond to production incidents. You build and run infrastructure with Ansible, Terraform, and Kubernetes to enable scaling to a massive number of concurrent users. You build monitoring systems to ensure the highest quality service for customers. You design and implement operational processes such as deployments and upgrades. You debug production issues across all services and levels of the stack. You identify improvements for the product architecture from the reliability, performance, and availability perspectives. You plan the growth of Together AI's infrastructure.

$190,000 – $270,000 per year (USD)

San Francisco

Maybe global

Onsite


Lead/Manager Together Cloud Infrastructure Engineer

New

Top rated

Together AI – Full-time – Posted Jun 2, 2026 17:05

As an AI Infrastructure Engineer at Together, you are responsible for keeping all user-facing services and production systems running smoothly. You participate in on-call rotation to respond to production incidents, build and run infrastructure using Ansible, Terraform, and Kubernetes to enable scaling to a massive number of concurrent users, build monitoring systems to ensure the highest quality service for customers, design and implement operational processes such as deployments and upgrades, debug production issues across all services and levels of the stack, identify improvements for product architecture from reliability, performance, and availability perspectives, and plan the growth of Together AI's infrastructure.

$190,000 – $270,000 per year (USD)

Amsterdam

Maybe global

Onsite


Staff Platform Engineer, Voice AI

New

Top rated

Together AI – Full-time – Posted May 20, 2026 2:32

As an AI Infrastructure Engineer at Together, you are responsible for keeping all user-facing services and production systems running smoothly by participating in on-call rotation to respond to production incidents, building and running infrastructure with Ansible, Terraform, and Kubernetes to enable scaling for a massive number of concurrent users, building monitoring systems to ensure the highest quality service, designing and implementing operational processes such as deployments and upgrades, debugging production issues across all services and levels of the stack, identifying improvements for product architecture from reliability, performance, and availability perspectives, and planning the growth of Together AI's infrastructure.

$190,000 – $270,000 per year (USD)

San Francisco

Maybe global

Onsite


Infrastructure Design Engineer

New

Top rated

Together AI – Full-time – Posted May 14, 2026 5:11

As an AI Infrastructure Engineer at Together, you are responsible for keeping all user-facing services and production systems running smoothly. Your tasks include participating in an on-call rotation to respond to production incidents, building and running infrastructure with Ansible, Terraform, and Kubernetes to enable scaling to a massive number of concurrent users, building monitoring systems to ensure the highest quality service, designing and implementing operational processes such as deployments and upgrades, debugging production issues across all services and levels of the stack, identifying improvements for the product architecture from reliability, performance, and availability perspectives, and planning the growth of Together AI's infrastructure.

$190,000 – $270,000 per year (USD)

San Francisco

Maybe global

Onsite


Business Development Intern

New

Top rated

PathAI – Full-time – Posted May 2, 2026 0:06

Lead the team responsible for the AI/ML infrastructure that connects machine learning research with large-scale production. Develop and execute the long-term vision and roadmap for the MLOps team to support ML development and deployment needs across business units, balancing short-term tactical deliveries and long-term architectural transformation. Manage and mentor a team of 6-7+ engineers, allocating resources strategically for existing service support and key initiatives. Collaborate cross-functionally with leaders in machine learning, data science, product engineering, and infrastructure to identify issues, address bottlenecks, and facilitate new solution deployment. Architect compute and storage pipelines for managing large datasets without data fragmentation or latency. Modernize inference stack for AI product growth. Work with Site Reliability Engineering to establish comprehensive system metrics. Conduct build vs. buy assessments and audits to benchmark proprietary tools against commercial and open-source alternatives.

$181,500 – $278,300 per year (USD)

Boston

Maybe global

Remote

Want to see more AI DevOps Engineer jobs?


Access all 4,256 remote & onsite AI jobs.

Join our private AI community to unlock full job access, and connect with founders, hiring managers, and top AI professionals.


(Yes, it’s still free—your best contributions are the price of admission.)

Frequently Asked Questions

Have questions about roles, locations, or requirements for AI DevOps Engineer jobs?


What does an AI DevOps Engineer do?

AI DevOps Engineers build and maintain ML pipelines in cloud environments, implementing CI/CD workflows specifically for AI applications. They create monitoring solutions that track not just system health but also data quality and model performance. Their daily work includes developing cloud infrastructure code using tools like Terraform and Ansible, ensuring AI applications scale effectively. They collaborate with data scientists to deploy models, troubleshoot production issues, and implement security protocols. Unlike traditional developers, they bridge the gap between data science and operations, ensuring ML models transition smoothly from development to production environments.

What skills are required for AI DevOps Engineer jobs?

AI DevOps Engineers need strong cloud platform expertise, particularly in AWS, Azure, or GCP. Proficiency with infrastructure-as-code tools like Terraform and Ansible is essential. Container orchestration skills using Docker and Kubernetes help manage AI workloads. Experience with CI/CD pipelines through Jenkins or GitLab CI enables automated model deployment. Python scripting ability supports both automation and ML pipeline integration. Monitoring skills using Prometheus and Grafana help track model performance. Beyond technical abilities, these roles require collaboration skills to work effectively with data scientists and developers, plus problem-solving aptitude to troubleshoot complex AI system issues.

What qualifications are needed for AI DevOps Engineer jobs?

Most AI DevOps Engineer positions require a minimum of 3 years of software development experience and 2+ years of cloud deployment experience, with Azure often preferred. A computer science or related degree is typically expected, though equivalent experience may substitute. Employers look for candidates with hands-on experience using development and deployment tools like GitLab and Atlassian suite products. While not always mandatory, certifications in cloud platforms (AWS Solutions Architect, Azure DevOps Engineer) and container orchestration (CKA) strengthen applications. Experience building CI/CD pipelines specifically for ML workflows gives candidates a significant advantage in the hiring process.

What is the salary range for AI DevOps Engineer jobs?

AI DevOps Engineer salaries vary based on several key factors. Geographic location significantly impacts compensation, with tech hubs like San Francisco and New York offering higher wages. Experience level creates substantial differences, with senior engineers earning considerably more. Specialized expertise in high-demand tools like Kubernetes or specific cloud platforms (AWS/Azure/GCP) can boost earnings. Industry sector also matters: financial services and healthcare organizations often pay premium rates for AI infrastructure expertise. Company size influences packages too, with large enterprises typically offering better benefits but startups potentially providing equity. Security clearances for sensitive projects may command additional compensation.

How long does it take to get hired as an AI DevOps Engineer?

The hiring timeline for AI DevOps Engineers typically ranges from 4-8 weeks. The process usually begins with a screening call, followed by technical assessments testing cloud infrastructure skills and coding abilities. Candidates often face 2-3 rounds of interviews, including sessions with engineering managers and team members. Many employers include practical challenges related to containerization, CI/CD pipeline setup, or infrastructure-as-code implementations. Companies hiring for specialized AI infrastructure may extend the process with additional technical evaluations. Candidates with demonstrated experience in both DevOps and machine learning environments generally move through the pipeline faster than those from only traditional DevOps backgrounds.

Are AI DevOps Engineer jobs in demand?

AI DevOps Engineer roles show strong demand as organizations integrate machine learning into their product offerings. Major companies like Boeing actively recruit for these positions to support AI applications in secure cloud environments. The specialized skillset, combining traditional DevOps practices with ML pipeline expertise, creates a smaller talent pool than for general DevOps roles. Organizations increasingly recognize that successful AI deployment requires specialized infrastructure and monitoring beyond conventional applications. This demand spans industries from technology and finance to manufacturing and healthcare, as each sector adopts AI capabilities requiring robust deployment pipelines, monitoring solutions, and infrastructure that traditional DevOps approaches don't fully address.

What is the difference between AI DevOps Engineer and Traditional DevOps Engineer?

Traditional DevOps Engineers focus on application delivery pipelines, infrastructure automation, and system monitoring for conventional software. AI DevOps Engineers extend these skills to handle machine learning workflows, requiring specialized knowledge of model deployment, training pipelines, and experiment tracking. While both roles use similar tools (Docker, Kubernetes, CI/CD platforms), AI DevOps Engineers must understand data quality monitoring and model performance metrics that don't exist in traditional applications. They work more closely with data scientists and ML engineers, bridging the gap between data science and operations. AI DevOps requires additional considerations around computational resources, GPU scheduling, and optimizing infrastructure for machine learning workloads.