AI DevOps Engineer Jobs

Discover the latest remote and onsite AI DevOps Engineer roles across top active AI companies. Updated hourly.

Check out 1001 new AI DevOps Engineer opportunities posted on The Homebase

DevOps Engineer, Infrastructure & Security

New
Top rated
Scale AI
Full-time

The role carries full accountability for the long-term performance and reliability of AI use cases deployed across international government agencies. Responsibilities include overseeing the end-to-end health of the platform to ensure seamless integration between the AI core and all full-stack components, from APIs to UI, and maintaining a responsive, production-ready environment. The job also requires building automated systems to monitor model performance and data drift across geographically dispersed environments, managing the technical lifecycle within diverse regulatory frameworks, and leading the response to production issues in mission-critical environments so that they are resolved quickly and do not recur. Additionally, the role requires translating deep technical performance metrics into clear insights for senior international government officials and partnering with Engineering and ML teams to ensure lessons learned in the field influence the technical architecture and decisions of future use cases.

Undisclosed

San Francisco or New York, United States
Maybe global
Onsite

Senior Pathologist

New
Top rated
PathAI
Full-time

Lead the team responsible for the infrastructure supporting the AI/ML stack, focusing on the scalability and efficiency of the Machine Learning Operations platform. Develop and execute the long-term vision and roadmap for the MLOps team to support ML development and deployment across business units, balancing short-term tactical deliveries with long-term architectural transformation. Manage and mentor a team of 6-7+ engineers, allocating resources strategically to support existing services and execute key strategic initiatives. Collaborate cross-functionally with leaders in machine learning, data science, product engineering, and infrastructure to identify pain points, remove bottlenecks, and facilitate new solution deployment. Architect compute and storage pipelines that let ML Engineers manage large datasets and artifacts efficiently. Modernize the AI product inference stack to support significant growth in global deployments. Work with Site Reliability Engineering to establish comprehensive system observability metrics. Conduct assessments for technology refresh and benchmark proprietary tools against commercial and open-source alternatives to meet future needs.

$181,500 – $278,300 per year (USD)

Boston or Memphis, United States
Maybe global
Hybrid

Infrastructure Engineer

New
Top rated
Faculty
Full-time

The Infrastructure Engineer is responsible for building robust, secure, and scalable cloud infrastructure to support AI and machine learning workflows. This includes designing, building, and deploying cloud infrastructure; partnering with technical and non-technical stakeholders from idea generation through implementation and shipping; enabling Machine Learning Engineers and Data Scientists by contributing to internal best practices, standards, and reusable code repositories; proactively identifying and recommending ways customers can leverage cloud infrastructure to solve key challenges; creating and maintaining reusable, company-wide libraries and infrastructure-as-code; and researching and integrating the best open-source technologies to enhance Faculty's infrastructure capabilities.

Undisclosed

London, United Kingdom
Maybe global
Hybrid

Staff DevOps Engineer

New
Top rated
webAI
Full-time

The Staff DevOps Engineer will design and architect secure, scalable cloud and edge infrastructure for deploying AI workloads across multi-cloud and hybrid environments. They will build and maintain production-grade Infrastructure as Code using tools like Terraform, Ansible, or Pulumi, managing over 100 resources with GitOps workflows and automated validation. The role includes designing and operating production Kubernetes clusters optimized for AI/ML workloads with GPU support, implementing container security, multi-tenancy, and resource optimization. They will implement secure CI/CD pipelines with integrated security controls and automated deployment workflows for containerized AI models. The engineer will lead MLOps infrastructure initiatives including model deployment pipelines, versioning, feature stores, experiment tracking, and monitoring for model performance and drift. Responsibilities also include designing comprehensive observability and monitoring solutions using tools like Prometheus, Grafana, ELK, or Datadog with distributed tracing, application performance monitoring, and real-time alerting. They will implement security best practices such as least-privilege access, encryption at rest and in transit, network segmentation, and automated compliance validation. The engineer will lead incident response and reliability initiatives, participate in on-call rotation, conduct post-mortems, and drive continuous improvement for system reliability. Architecting disaster recovery and business continuity strategies with automated backup, failover, and recovery processes is required. They will develop reusable infrastructure modules and templates to accelerate environment provisioning and standardize deployment patterns. Mentoring mid-level and senior engineers on cloud architecture, DevOps best practices, and platform reliability through design reviews and technical guidance is part of the role. They will also drive technical documentation and knowledge sharing including runbooks, architecture decision records, and infrastructure standards.

Undisclosed

Austin, United States
Maybe global
Onsite

Site Reliability Engineer, Inference Infrastructure

New
Top rated
Cohere
Full-time

As a Site Reliability Engineer on the Model Serving team, you will build self-service systems that automate managing, deploying, and operating services, including custom Kubernetes operators supporting language model deployments. You will automate environment observability and resilience, enabling all developers to troubleshoot and resolve problems, and take steps to ensure defined SLOs are met, including participating in an on-call rotation. Additionally, you will build strong relationships with internal developers and influence the Infrastructure team’s roadmap based on their feedback, as well as develop the team through knowledge sharing and an active review process.

Undisclosed

Toronto, Canada
Maybe global
Remote

DevOps Engineer

New
Top rated
Obviant
Full-time

The DevSecOps / Platform Engineer will design, implement, and operate secure, cloud-native infrastructure powering core data and application platforms for a defense-focused company. They will develop CI/CD pipelines, automate deployments, uphold security practices, and collaborate across teams to ensure reliability, scalability, and compliance for government users.

Undisclosed

Maybe global
Hybrid

Staff Software Engineer, Infrastructure

New
Top rated
Decagon
Full-time

You will design, build, and operate production infrastructure for high-scale, low-latency systems, owning critical services end-to-end to improve reliability and performance. The role also involves partnering with research and product teams, optimizing service latencies, evolving CI/CD and self-service tooling, and leading infrastructure-as-code and GitOps practices.

Undisclosed (annual, USD)

Maybe global
Onsite

Staff Infrastructure Security Engineer

New
Top rated
Crusoe
Full-time

The engineer will architect, deploy, and operationalize foundational security services to support Crusoe's move toward Zero Trust, serving as a technical leader for secrets management and identity architecture. Responsibilities span from driving enterprise-wide platforms like HashiCorp Vault to defining trust patterns and secure onboarding in a hybrid, multi-cloud environment.

Undisclosed

Maybe global
Onsite

Enterprise Security Engineer

New
Top rated
PhysicsX
Full-time

You will be responsible for building and operationalizing the company's compliance program, implementing controls, and supporting audits in a fast-paced SaaS environment. Key tasks include managing GRC tools, automating workflows for compliance standards such as SOC 2 and ISO 27001, and supporting responses to customer security assessments.

Undisclosed (annual, USD)

Maybe global

Freelance AI Red Team Engineer

New
Top rated
Mindrift
Part-time
Full-time

As a Freelance AI Red Team Engineer, you will evaluate and red team AI models, agents, and machine learning systems for safety risks and vulnerabilities. You will also develop automation tools, create rigorous test scenarios, and contribute to security research initiatives in the AI domain.

Undisclosed (hourly, USD)

Maybe global
Remote only

Frequently Asked Questions

Have questions about roles, locations, or requirements for AI DevOps Engineer jobs?

What does an AI DevOps Engineer do?

AI DevOps Engineers build and maintain ML pipelines in cloud environments, implementing CI/CD workflows specifically for AI applications. They create monitoring solutions that track not just system health but also data quality and model performance. Their daily work includes developing cloud infrastructure code using tools like Terraform and Ansible, ensuring AI applications scale effectively. They collaborate with data scientists to deploy models, troubleshoot production issues, and implement security protocols. Unlike traditional developers, they bridge the gap between data science and operations, ensuring ML models transition smoothly from development to production environments.

What skills are required for AI DevOps Engineer jobs?

AI DevOps Engineers need strong cloud platform expertise, particularly in AWS, Azure, or GCP. Proficiency with infrastructure-as-code tools like Terraform and Ansible is essential. Container orchestration skills using Docker and Kubernetes help manage AI workloads. Experience with CI/CD pipelines through Jenkins or GitLab CI enables automated model deployment. Python scripting ability supports both automation and ML pipeline integration. Monitoring skills using Prometheus and Grafana help track model performance. Beyond technical abilities, these roles require collaboration skills to work effectively with data scientists and developers, plus problem-solving aptitude to troubleshoot complex AI system issues.

What qualifications are needed for AI DevOps Engineer jobs?

Most AI DevOps Engineer positions require a minimum of 3 years of software development experience and 2+ years of cloud deployment experience, with Azure often preferred. A computer science or related degree is typically expected, though equivalent experience may substitute. Employers look for candidates with hands-on experience using development and deployment tools like GitLab and Atlassian suite products. While not always mandatory, certifications in cloud platforms (AWS Solutions Architect, Azure DevOps Engineer) and container orchestration (CKA) strengthen applications. Experience building CI/CD pipelines specifically for ML workflows gives candidates a significant advantage in the hiring process.

What is the salary range for AI DevOps Engineer jobs?

AI DevOps Engineer salaries vary based on several key factors. Geographic location significantly impacts compensation, with tech hubs like San Francisco and New York offering higher wages. Experience level creates substantial differences, with senior engineers earning considerably more. Specialized expertise in high-demand tools like Kubernetes or specific cloud platforms (AWS/Azure/GCP) can boost earnings. Industry sector also matters: financial services and healthcare organizations often pay premium rates for AI infrastructure expertise. Company size influences packages too, with large enterprises typically offering better benefits but startups potentially providing equity. Security clearances for sensitive projects may command additional compensation.

How long does it take to get hired as an AI DevOps Engineer?

The hiring timeline for AI DevOps Engineers typically ranges from 4-8 weeks. The process usually begins with a screening call, followed by technical assessments testing cloud infrastructure skills and coding abilities. Candidates often face 2-3 rounds of interviews, including sessions with engineering managers and team members. Many employers include practical challenges related to containerization, CI/CD pipeline setup, or infrastructure-as-code implementations. Companies hiring for specialized AI infrastructure may extend the process with additional technical evaluations. Candidates with demonstrated experience in both DevOps and machine learning environments generally move through the pipeline faster than those from only traditional DevOps backgrounds.

Are AI DevOps Engineer jobs in demand?

AI DevOps Engineer roles show strong demand as organizations integrate machine learning into their product offerings. Major companies like Boeing actively recruit for these positions to support AI applications in secure cloud environments. The specialized skillset, which combines traditional DevOps practices with ML pipeline expertise, creates a smaller talent pool than for general DevOps roles. Organizations increasingly recognize that successful AI deployment requires specialized infrastructure and monitoring beyond conventional applications. This demand spans industries from technology and finance to manufacturing and healthcare, as each sector adopts AI capabilities requiring robust deployment pipelines, monitoring solutions, and infrastructure that traditional DevOps approaches don't fully address.

What is the difference between AI DevOps Engineer and Traditional DevOps Engineer?

Traditional DevOps Engineers focus on application delivery pipelines, infrastructure automation, and system monitoring for conventional software. AI DevOps Engineers extend these skills to handle machine learning workflows, requiring specialized knowledge of model deployment, training pipelines, and experiment tracking. While both roles use similar tools (Docker, Kubernetes, CI/CD platforms), AI DevOps Engineers must understand data quality monitoring and model performance metrics that don't exist in traditional applications. They work more closely with data scientists and ML engineers, bridging the gap between data science and operations. AI DevOps requires additional considerations around computational resources, GPU scheduling, and optimizing infrastructure for machine learning workloads.