AI MLOps Engineer Jobs

Discover the latest remote and onsite AI MLOps Engineer roles across top active AI companies. Updated hourly.

Check out 12 new AI MLOps Engineer opportunities posted on The Homebase

Member of Technical Staff - ML Engineering

New
Top rated
Talent Labs
Full-time
Full-time

Deploy, maintain, and optimize production and research compute clusters. Design and implement scalable and efficient ML inference solutions. Develop dynamic and heterogeneous compute solutions for balancing research and production needs. Contribute to productizing model APIs for external use. Develop infrastructure observability and monitoring solutions.

Undisclosed

London, United Kingdom
Maybe global
Remote

AI Evaluation Engineer

New
Top rated
Ryz Labs
Contractor
Full-time

Design and implement evaluation pipelines to measure the performance and reliability of AI models. Develop automated testing frameworks to assess model outputs at scale. Analyze model performance using both traditional statistical metrics and AI-specific evaluation methods. Evaluate AI systems built on modern architectures such as LLM-based applications and Retrieval-Augmented Generation (RAG). Identify potential issues related to accuracy, hallucinations, bias, safety, and model drift. Conduct adversarial testing to uncover vulnerabilities and ensure safe model behavior. Collaborate with engineering and AI teams to improve prompt design, model outputs, and system performance. Monitor model performance in production, and help define best practices for AI evaluation and observability.

Undisclosed

Argentina
Maybe global
Remote

Engineering Manager, Active Learning

New
Top rated
Deepgram
Full-time
Full-time

The Engineering Manager role at Deepgram involves leading the design and implementation of internal data and ML training systems. Responsibilities include recruiting, hiring, training, and supporting top engineering talent to build a world-class team; transforming cross-functional visions into detailed project plans with clarity on commitments, risks, and timelines; defining and owning technical strategy to accelerate ML training pipelines; promoting a strong team engineering culture focused on rigorous engineering standards and continuous improvement; partnering with DataOps and Research teams to design and implement new services, features, or products end to end; and coaching and mentoring engineers to support personal growth while achieving ambitious team goals.

$180,000 – $220,000 per year (USD)

United States
Maybe global
Remote

Senior Machine Learning Engineer

New
Top rated
Replicant
Full-time
Full-time

Lead the exploration and application of Large Language Models and Generative AI, focusing on new areas within these fields. Translate the latest research into high-performing systems and models that can enhance user experiences. Help set the team's strategic direction, fostering an environment that encourages innovation and professional growth. Actively engage in all aspects of development including ideation, experimentation, implementation, and deployment. Collaborate with various teams and product managers to develop and implement ML-based solutions, ensuring performance optimization and alignment with broader business goals.

$170,000 – $210,000 per year (USD)

United States
Maybe global
Remote

Member of Technical Staff - Research Software Engineer

New
Top rated
Reflection
Full-time
Full-time

The role involves bridging the gap between research and production by transforming cutting-edge algorithms into scalable training systems. Responsibilities include designing and optimizing large-scale training loops and data pipelines, implementing state-of-the-art techniques ensuring numerical stability and computational efficiency, building internal tooling for launching, monitoring, and reproducing complex experiments, diagnosing deep bottlenecks across the training stack such as GPU memory issues, communication overhead, and dataloader stalls, and translating research prototypes into reusable, production-grade infrastructure. The engineer will architect and optimize the core training infrastructure including RL training loops, distributed GPU systems, and large-scale data pipelines, working closely with researchers to build reliable, scalable systems.

Undisclosed

New York City, United States
Maybe global
Onsite

Senior Brand Events Manager

New
Top rated
Grammarly
Full-time
Full-time

Own the observability and lifecycle management of AI features across the organization. Build tools and infrastructure to enable teams to develop, monitor, and optimize LLM-powered features. Design and implement closed-loop evaluation pipelines that automatically validate prompt changes. Develop comprehensive metrics and dashboards to track LLM usage: cost per feature, token patterns, and latency. Create systems that tie user feedback to specific prompts and LLM calls. Establish best practices and processes for the full lifecycle of prompts: development, testing, deployment, and monitoring. Collaborate with engineering teams across the organization to ensure they have the tools and visibility needed to build high-quality AI features.

$103,000 – $174,000 per year (USD)

United States
Maybe global
Onsite

Principal AI Ops Architect, IPS

New
Top rated
Scale AI
Full-time
Full-time

As a Production AI Ops Lead, you will design and develop the production lifecycle of full-stack AI applications, support end-to-end system reliability, real-time inference observability, sovereign data orchestration, high-security software integration, and resilient cloud infrastructure for international government partners. You will take full accountability for the long-term performance and reliability of AI use cases deployed across international government agencies. You will oversee the end-to-end health of the platform ensuring seamless integration between the AI core and all full-stack components, from APIs to UI, to maintain a responsive and production-ready environment. Build automated systems to monitor model performance and data drift across geographically dispersed environments ensuring reliability. Manage the technical lifecycle within diverse regulatory frameworks. Lead the response for production issues in mission-critical environments, ensuring rapid resolution and building guardrails to prevent recurrence. Translate deep technical performance metrics into clear insights for senior international government officials and partner with Engineering and ML teams to ensure field lessons influence the technical architecture and future use cases.

Undisclosed

Doha, Qatar or London, United Kingdom
Maybe global
Onsite

Senior Pathologist

New
Top rated
PathAI
Full-time
Full-time

Lead the team responsible for the infrastructure supporting the AI/ML stack, focusing on the scalability and efficiency of the Machine Learning Operations platform. Develop and execute the long-term vision and roadmap for the MLOps team to support ML development and deployment across business units, balancing short-term tactical deliveries with long-term architectural transformation. Manage and mentor a team of 6-7+ engineers, allocating resources strategically to support existing services and execute key strategic initiatives. Collaborate cross-functionally with leaders in machine learning, data science, product engineering, and infrastructure to identify pain points, remove bottlenecks, and facilitate the deployment of new solutions. Architect compute and storage pipelines for ML Engineers to manage large datasets and artifacts efficiently. Modernize the AI product inference stack for significant growth in global deployments. Work with Site Reliability Engineering to establish comprehensive system observability metrics. Conduct technology-refresh assessments and benchmark proprietary tools against commercial and open-source alternatives to meet future needs.

$181,500 – $278,300 per year (USD)

Boston or Memphis, United States
Maybe global
Hybrid

Machine Learning Operations Engineer

New
Top rated
Hayden AI
Full-time
Full-time

Optimize orchestration processes to ensure efficient deployment and management of AI models. Implement cost-saving strategies to minimize infrastructure expenses while maximizing performance. Upgrade throughput to enhance scalability and responsiveness of AI systems. Collaborate with cross-functional teams to identify bottlenecks and implement solutions to improve workflow efficiency. Ship new features and updates rapidly while maintaining high levels of quality and reliability. Deploy and monitor machine learning models produced by deep learning engineers. Design, deploy, and maintain performant and scalable processes for data acquisition and manipulation to enhance dataset accessibility. Participate actively in the team's software development process, including design reviews, code reviews, and brainstorming sessions. Maintain accurate and updated software development documentation.

$135,699 – $190,000 per year (USD)

San Francisco, United States
Maybe global
Remote

AI Software Engineer (Model Training)

New
Top rated
Maincode
Full-time
Full-time

Build and maintain systems that support large scale model training including designing and maintaining distributed training pipelines for large language models, building data ingestion and preprocessing systems for large training datasets, developing tooling for experiment management, checkpointing, and reproducibility, monitoring and debugging long running training jobs across clusters, improving reliability and observability across the training stack, optimizing training throughput across compute, memory, and data pipelines, working closely with researchers to translate experimental ideas into training runs, and diagnosing failures across infrastructure, training loops, and data pipelines. The work involves working inside code, logs, dashboards, and experiment outputs to make large scale training reliable.

Undisclosed

Melbourne, Australia
Maybe global
Onsite

Want to see more AI MLOps Engineer jobs?

View all jobs

Access all 4,256 remote & onsite AI jobs.

Join our private AI community to unlock full job access, and connect with founders, hiring managers, and top AI professionals.
(Yes, it’s still free—your best contributions are the price of admission.)

Frequently Asked Questions

Have questions about roles, locations, or requirements for AI MLOps Engineer jobs?

What does an AI MLOps Engineer do?

AI MLOps Engineers design and implement CI/CD pipelines for machine learning models, focusing on deployment, monitoring, and maintenance. They containerize models using Docker and Kubernetes, implement automated testing frameworks, and build scalable infrastructure for ML workflows. These engineers monitor models for performance degradation and data drift while ensuring security compliance throughout the pipeline. They bridge the gap between data science and production environments, automating model versioning, retraining, and optimization.

What skills are required for an AI MLOps Engineer?

AI MLOps Engineers need strong programming skills in Python and experience with containerization tools like Docker and Kubernetes. Proficiency with cloud platforms (AWS, GCP, Azure) is essential, alongside expertise in CI/CD pipelines, version control, and infrastructure as code. They should understand ML algorithms, model serving patterns, and monitoring systems to track performance metrics. Experience with vector databases, RAG systems, and fine-tuning pipelines for LLMs is increasingly valuable in today's market.

What qualifications are needed for an AI MLOps Engineer role?

Most AI MLOps Engineer positions require a bachelor's degree in Computer Science, Data Science, Engineering, or a related field. Employers typically seek candidates with 4+ years of technical engineering experience, particularly in DevOps, software engineering, or data engineering. Demonstrable expertise with ML deployment, containerization, and cloud platforms is crucial. Strong coding skills in Python and other languages, combined with practical experience implementing and maintaining ML systems in production environments, are highly valued.

What is the salary range for an AI MLOps Engineer job?

Compensation typically varies based on location, experience level, company size, and industry. As this role requires specialized expertise in both ML and DevOps, salaries generally align with other senior technical positions in the AI field. For accurate figures, consult current compensation surveys or job listings for AI MLOps Engineer positions in your target location.

How long does it take to get hired as an AI MLOps Engineer?

The process typically involves technical interviews assessing both ML knowledge and operational skills. With employers commonly requiring 4+ years of technical experience and specific expertise in ML algorithms, DevOps, and workflow automation, candidates meeting these qualifications may move through the process more quickly. The hiring timeline can vary significantly depending on the company's urgency, the candidate pool, and the specific technical requirements of the position.

Are AI MLOps Engineer jobs in demand?

Demand for AI MLOps Engineers is growing, evidenced by recruitment at major companies like Microsoft. As organizations increasingly deploy ML models to production, the need for specialists who can bridge data science and operations has expanded. This role is crucial for companies looking to scale AI initiatives reliably and efficiently. The specialized skill set combining ML knowledge with DevOps expertise makes qualified candidates particularly valuable as more businesses implement machine learning in production environments.