AI Platform Engineer Jobs

Discover the latest remote and onsite AI Platform Engineer roles at top, actively hiring AI companies. Updated hourly.

Check out 43 new AI Platform Engineer opportunities posted on AI Chopping Block

Software Engineer, Platform

New
Top rated
Scale AI
Full-time

As a Production AI Ops Lead, you will design and develop the production lifecycle of full-stack AI applications, supporting end-to-end system reliability, real-time inference observability, sovereign data orchestration, high-security software integration, and resilient cloud infrastructure for international government partners. You will own the production outcome, taking full accountability for the long-term performance and reliability of AI use cases deployed across international government agencies. You will safeguard full-stack integrity by overseeing the health of the platform and ensuring seamless integration between the AI core and all full-stack components, from APIs to UI. Additionally, you will build automated systems to monitor model performance and data drift across geographically dispersed environments, manage the technical lifecycle within diverse regulatory frameworks, lead the response to production issues in mission-critical environments, translate deep technical performance metrics into clear insights for senior international government officials, and partner with Engineering and ML teams to ensure field lessons influence future technical architecture and decisions.

Salary undisclosed

London, United Kingdom
Maybe global
Onsite

Relocate to SF: Software Engineer (AI Infra)

New
Top rated
Pylon
Full-time

Build the platforms that power Pylon's AI features, such as prompt execution and search infrastructure. Improve LLM observability, including online and offline AI evaluations and scorers, and prepare Pylon's AI for future scale. Enhance the quality and performance of AI features.

$180,000 – $300,000 USD per year

San Francisco, United States
Maybe global
Onsite

Software Engineer, ML Data Infrastructure

New
Top rated
Ideogram
Full-time

The Software Engineer, ML Data Infrastructure will collaborate with engineers to build advanced AI design experiences and tackle complex technical challenges, including scaling distributed systems and enabling generative media experiences. The role involves building robust data infrastructure at petabyte scale that stays reliable and performant across multi-modal training pipelines; optimizing data processing workflows for high throughput across distributed systems, TPU infrastructure, and large-scale storage; and partnering with research scientists to understand data requirements and translate them into production-grade systems that accelerate model development cycles.

Salary undisclosed

Toronto, Canada
Maybe global
Onsite

Senior Engineering Manager, Management Plane Systems

New
Top rated
Crusoe
Full-time

Lead the team responsible for the automation, observability, configuration management, and policy enforcement layer that runs across the entire network fleet. Own the architecture, development, and production operation of the SDN Management Plane, including the automation and observability platform for managing the network fleet across all regions. Build and operate CI/CD pipelines for network configuration, including automated testing, policy validation, and push-on-green delivery of network changes. Design and implement software systems that reconcile declared and actual network state, detect configuration drift, and trigger automated remediation workflows. Define provisioning and onboarding automation for new nodes, regions, and customer environments. Drive the design of network observability systems such as streaming telemetry, synthetic probing, anomaly detection, and real-time traffic monitoring across GPU clusters. Design and implement self-healing network capabilities using closed-loop automation to detect, diagnose, and resolve network faults without human intervention. Set the technical vision for applying GenAI and machine learning to network operations. Partner with Control Plane and Data Plane teams to ensure well-defined software interfaces between layers, and collaborate with infrastructure and compute teams to support GPU cluster networking requirements. Act as the internal platform owner for network automation, treating engineering teams as customers with real product requirements. Lead, mentor, and grow a team of senior and staff-level software and network automation engineers; set technical standards, review architecture and design decisions, and own team performance and development. Foster a high-ownership engineering culture focused on shipping production software.

$237,000 – $288,000 USD per year

San Francisco, United States
Maybe global
Onsite

Software Engineer, Monetization ML Infrastructure

New
Top rated
OpenAI
Full-time

Design and build the machine learning infrastructure that powers OpenAI's monetization and ads systems. Develop large-scale data pipelines processing impressions, clicks, conversions, advertiser data, marketplace signals, and other inputs used to train and improve ML models. Create scalable model training platforms for ranking, conversion prediction, quality prediction, bidding, targeting, measurement, and optimization workloads. Develop systems to safely and reliably move models from experimentation into production environments. Build and improve real-time inference and serving infrastructure with strict requirements for latency, throughput, reliability, and availability. Design experimentation frameworks enabling A/B testing, holdouts, model comparisons, ramping strategies, and measurement at scale. Improve platform performance by optimizing training efficiency, inference latency, model throughput, infrastructure reliability, and cost effectiveness. Collaborate closely with ML engineers, product engineers, data scientists, and monetization teams to accelerate development and deployment of advertising systems.

$293,000 – $441,000 USD per year

San Francisco, United States
Maybe global
Remote

Client Engineering Lead

New
Top rated
Invisible Technologies
Full-time

As a Staff/Principal-level Technical Lead, you will be responsible for driving the end-to-end technical execution of multiple concurrent enterprise engagements in close partnership with the Project Lead, from technical discovery to production deployment. You will architect and implement secure, highly scalable integrations between the AI platform and clients' existing data pipelines, APIs, and infrastructure. You will lead technical discovery sessions, architecture workshops, and data readiness assessments with customer IT, data, and engineering leadership teams. You will build and customize AI-enabled solutions, scripts, and workflows that address complex business problems identified in the sales process. You will serve as the primary technical liaison and escalation point between customer engineering teams and internal product, engineering, and data science teams to unblock deployments quickly. You will ensure that all deployed solutions meet enterprise-grade standards for performance, security, data privacy, and scalability. You will debug complex integration issues, manage technical risks across overlapping projects, and provide hands-on troubleshooting during implementation. Additionally, you will contribute to the internal codebase by documenting technical blueprints, developing reusable integration components, and providing product feedback based on real-world edge cases.

$171,000 – $300,000 USD per year

United States
Maybe global
Remote

Senior Software Engineer

New
Top rated
Hiya
Full-time

Own the complete development lifecycle for spam and scam detection infrastructure including research, proposing solutions, implementation, testing, deployment, production maintenance, and monitoring. Participate in on-call rotation for rapid recognition and resolution of production issues while improving system reliability. Design and build frameworks that enable data scientists to develop, test, and deploy complex scam detection models with access to call data in a privacy-aware and regulation-compliant manner. Make independent implementation decisions while driving collaborative design discussions to improve system quality, maintainability, and cost-effectiveness. Evaluate critical tradeoffs between immediate fixes and durable solutions prioritizing service quality and system resilience. Collaborate with product managers, data scientists, and engineering teams to align technical decisions with business impact and user needs. Recognize and promote engineering patterns, design principles, and architectural decisions across teams to raise quality and execution speed. Influence team operations by pushing back on non-aligned solutions, surfacing issues early in project planning, and reasoning about business impact versus cost.

Salary undisclosed

Budapest, Hungary
Maybe global
Hybrid

Staff Software Engineer - Managed Kubernetes

New
Top rated
Lambda AI
Full-time

As a Staff Engineer on the Orchestration team, you will drive the technical vision for Lambda's Managed Kubernetes bare-metal platform, including control plane scalability, multi-tenancy, cluster lifecycle management, and high availability. You will integrate and extend NVIDIA's open-source ecosystem, design GPU-aware orchestration systems, and lead the development of the services that power Lambda's managed offerings. Your responsibilities include providing input and hands-on help with networking solutions, such as CNI integration and high-performance fabrics, and with storage architecture requirements for AI workloads. You will build the foundation for Managed Slurm on Kubernetes, design higher-level platform services for inference (including model serving infrastructure and autoscaling), and design self-healing systems and automation for incident response and platform resilience. You will lead chaos engineering efforts to validate system behavior under failure conditions and establish operational excellence, including upgrade automation and zero-downtime maintenance. Additionally, you will serve as a technical bridge between Orchestration and other infrastructure teams, drive infrastructure-wide decisions, provide input on bare-metal provisioning, network topology, and storage systems, champion consistency and standardization, and work directly with customers and internal teams to understand their deployments and shape the roadmap for managed platforms. Your role includes setting technical direction for Kubernetes services, driving reviews and design sessions, mentoring engineers, collaborating cross-functionally, engaging with NVIDIA and open-source communities, representing Lambda externally, and shaping the AIOps vision for automated capacity planning, anomaly detection, and predictive maintenance of cloud infrastructure.

$314,000 – $465,000 USD per year

Bellevue or Seattle, United States
Maybe global
Hybrid

Software Engineer, Productivity - Inference Runtime

New
Top rated
OpenAI
Full-time

The responsibilities include improving systems that ensure inference engine releases are correct, performant, and regression-free by evolving tooling and infrastructure for deploy gate validation; bringing rigor to release, validation, branching, and deployment processes across the inference stack; improving canary, async, and large-scale validation workflows for inference systems; hardening CI, testing, and validation infrastructure so failures are actionable and trustworthy; reducing noisy or flaky failures caused by infrastructure instability, GPU scheduling, or test environment issues; building automation for failure triage, ownership detection, debugging, and escalation; partnering closely with inference teams, research developer productivity, engine acceleration, and infrastructure teams to improve release quality and rollout safety; and reducing developer friction in testing, debugging, and release workflows so engineers can move faster with confidence.

$230,000 – $385,000 USD per year

San Francisco, United States
Maybe global
Onsite

Software Engineer, ML Systems

New
Top rated
Harmonic
Full-time

Build and manage end-to-end machine learning (ML) pipelines, including ETL and automated evaluation, that support reinforcement learning research. Identify and refactor inefficient research code so that promising ideas can scale. Establish best practices for versioning, experiment tracking, and continuous integration/continuous deployment (CI/CD) for ML models to ensure reliability. Manage the deployment and scaling of workloads on Kubernetes, and implement tooling and telemetry to monitor agent behavior and training health.

Salary undisclosed

Palo Alto, United States
Maybe global
Onsite

Want to see more AI Platform Engineer jobs?

Access all 4,256 remote & onsite AI jobs.

Join our private AI community to unlock full job access, and connect with founders, hiring managers, and top AI professionals.
(Yes, it’s still free; your best contributions are the price of admission.)

Frequently Asked Questions

Have questions about roles, locations, or requirements for AI Platform Engineer jobs?

What does an AI Platform Engineer do?

AI Platform Engineers develop and maintain the infrastructure that supports machine learning workloads. They collaborate with data scientists and software teams to deploy, manage, and optimize AI models, and they implement automation for deployment and scaling. Their responsibilities include ensuring high availability, designing scalable data pipelines, integrating models, and resolving platform issues. They also monitor system performance and stay current with advances in AI infrastructure technologies.

What skills are required for an AI Platform Engineer?

Successful AI Platform Engineers need proficiency in cloud platforms (AWS, GCP, Azure), containerization and orchestration technologies such as Kubernetes, and infrastructure automation tools such as Terraform. Programming skills in Python, Java, or R are essential, along with experience in AI frameworks like TensorFlow and PyTorch. Knowledge of CI/CD pipelines, DevOps practices, and data engineering is crucial. Strong problem-solving abilities and collaboration skills are also important for working across technical teams.

What qualifications are needed for the AI Platform Engineer role?

Most employers require a bachelor's degree in computer science, software engineering, or a related technical field. Companies typically seek candidates with at least 5 years of experience in DevOps and CI/CD projects. Cloud computing certifications for AWS, GCP, or Azure are highly valued. Demonstrated expertise in machine learning technologies, containerization, infrastructure automation, and security best practices is essential for success in this specialized role.

What is the salary range for AI Platform Engineer jobs?

Compensation varies with location, company size, years of experience, and specific technical expertise. Among the listings on this page that disclose pay, annual ranges run from $171,000 – $300,000 USD (Invisible Technologies, Client Engineering Lead) up to $314,000 – $465,000 USD (Lambda AI, Staff Software Engineer - Managed Kubernetes), while several employers do not disclose salary. Given the specialized nature of this role, which combines AI, ML, and platform engineering skills, salaries are likely competitive with other advanced technical positions in the AI industry.

How long does it take to get hired as an AI Platform Engineer?

Hiring timelines vary by employer. The process typically involves technical interviews that assess cloud platform knowledge, containerization experience, and AI infrastructure skills. Because some employers, such as EY, require a minimum of 5 years of DevOps experience, building the necessary qualifications takes significant time. The specialized nature of these roles, which require both AI and platform engineering expertise, may also lengthen the hiring process.

Are AI Platform Engineer jobs in demand?

Yes, AI Platform Engineer jobs show strong demand signals. Currently, major organizations such as General Motors and Millennium have open positions specifically for AI infrastructure development. As companies increasingly integrate AI into their operations, the need for specialists who can build robust platforms to support machine learning workloads continues to grow. The combination of AI knowledge and platform engineering skills makes these professionals particularly valuable in today's technology landscape.