AI Infrastructure Engineer Jobs

Discover the latest remote and onsite AI Infrastructure Engineer roles across top active AI companies. Updated hourly.

Check out 16 new AI Infrastructure Engineer opportunities posted on AI Chopping Block

TL, Research Inference

New

Top rated

OpenAI – Full-time – Posted Mar 20, 2026 2:51

Design and build high-performance inference runtimes for large-scale AI models focusing on efficiency, reliability, and scalability. Own and optimize core execution paths including model execution, memory management, batching, and scheduling. Develop and improve distributed inference across multiple GPUs with attention to parallelism strategies, communication patterns, and runtime coordination. Implement and optimize inference-critical operators and kernels based on real-world workloads. Partner with research teams to ensure new model architectures are supported accurately and efficiently in inference systems. Diagnose and resolve performance bottlenecks using profiling, benchmarking, and low-level debugging. Contribute to observability, correctness, and reliability of large-scale AI systems.

$380,000 – $555,000 / year (USD)

San Francisco, United States

Maybe global

Onsite

Inference Technical Lead, On-Device Transformers

New

Top rated

OpenAI – Full-time – Posted Mar 14, 2026 2:42

As a Technical Lead on the Future of Computing Research team, you will evaluate and select silicon platforms such as GPUs, NPUs, and specialized accelerators for on-device and edge deployment of OpenAI models. You will work closely with research teams to co-design model architectures that meet real-world deployment constraints including latency, memory, power, and bandwidth. You will analyze and model system performance, identifying tradeoffs between model design, memory hierarchy, compute throughput, and hardware capabilities. You will partner with hardware vendors and internal infrastructure teams to bring up new accelerators and ensure efficient execution of transformer workloads. Additionally, you will build and lead a team of engineers responsible for implementing the low-level inference stack, including kernel development and runtime systems. You will also take nascent research capabilities and develop them into usable capabilities.

$445,000 / year (USD)

San Francisco, United States

Maybe global

Hybrid

AI & IT Systems Engineer

New

Top rated

Jasper – Full-time – Posted Mar 13, 2026 3:21

As Jasper undergoes an agentic AI shift, the AI & IT Systems Engineer role involves ensuring the IT infrastructure is robust, secure, and fine-tuned for advanced AI workflows, spending 70-80% of time on AI enablement deployments. Responsibilities include modernizing and improving IT systems to support autonomous AI workflows, building scalable automation infrastructure to enhance efficiency and reduce manual tasks, and operationalizing AI initiatives using tools like Claude, ChatGPT, and Zapier to create intelligent, cross-platform workflows involving platforms like Google Workspace and Slack. The role also requires managing core IT systems such as Identity Providers and Mobile Device Management, streamlining identity and access operations using features like Okta Workflows, and providing cross-functional technical support across departments to implement AI enablement projects. Additionally, the engineer manages a broad SaaS ecosystem, including Google Workspace and Linear, and assists in developing training resources and playbooks to facilitate team adoption of new AI tools.

$135,000 – $155,000 / year (USD)

United States

Maybe global

Remote

Customer Support Engineer (Inference), India

New

Top rated

Together AI – Full-time – Posted Mar 11, 2026 3:37

Advance inference efficiency end-to-end by designing and prototyping algorithms, architectures, and scheduling strategies for low-latency, high-throughput inference. Implement and maintain changes in high-performance inference engines, including kernel backends, speculative decoding, and quantization. Profile and optimize performance across GPU, networking, and memory layers to improve latency, throughput, and cost. Design and operate RL and post-training pipelines, jointly optimizing algorithms and systems to make inference and post-training workloads more efficient. Train, evaluate, and iterate on frontier models using these pipelines. Co-design algorithms and infrastructure so that objectives, rollout collection, and evaluation are tightly coupled to efficient inference. Identify bottlenecks across the training engine, inference engine, data pipeline, and user-facing layers. Run ablations and scale-up experiments to understand trade-offs between model quality, latency, throughput, and cost, feeding insights back into model, RL, and system design. Profile, debug, and optimize inference and post-training services under real production workloads. Drive roadmap items requiring engine modification such as changing kernels, memory layouts, scheduling logic, and APIs. Establish metrics, benchmarks, and experimentation frameworks to rigorously validate improvements. Provide technical leadership by setting technical direction for cross-team efforts at the intersection of inference, RL, and post-training, and mentoring other engineers and researchers on full-stack ML systems work and performance engineering.

$200,000 – $280,000 / year (USD)

India

Maybe global

Onsite

Helix AI Engineer, Agentic Systems

New

Top rated

Figure AI – Full-time – Posted Mar 3, 2026 3:28

Design, deploy, and maintain Figure's training clusters. Architect and maintain scalable deep learning frameworks for training on massive robot datasets. Work together with AI researchers to implement training of new model architectures at a large scale. Implement distributed training and parallelization strategies to reduce model development cycles. Implement tooling for data processing, model experimentation, and continuous integration.

$150,000 – $350,000 / year (USD)

San Jose, United States

Maybe global

Onsite

Manual Quality Assurance Engineer, Web Core Product

New

Top rated

Speechify – Full-time – Posted Feb 16, 2026 16:10

Work alongside machine learning researchers, engineers, and product managers to bring AI Voices to customers for diverse use cases. Deploy and operate the core ML inference workloads for the AI Voices serving pipeline. Introduce new techniques, tools, and architecture that improve performance, latency, throughput, and efficiency of deployed models. Build tools to identify bottlenecks and sources of instability and design and implement solutions to address the highest priority issues.

$140,000 – $200,000 / year (USD)

Maybe global

Remote

AI Infrastructure Engineer

New

Top rated

42dot – Full-time – Posted Feb 9, 2026 15:13

Operate and maintain a large-scale GPU cluster consisting of thousands of GPUs across multiple data centers using Kubernetes and Slurm. Monitor and diagnose failures across the GPU hardware and software stacks to ensure high availability and rapid recovery. Develop automation tools and scripts using Python or Shell to streamline repetitive infrastructure management tasks and improve operational efficiency. Manage GPU resource quotas and provide technical support to ML researchers to ensure optimal utilization of computing resources. Participate in the architectural design and performance tuning of distributed training environments for large-scale autonomous driving models.

Undisclosed

Pangyo, South Korea

Maybe global

Remote

Director of Governance & Risk Compliance

New

Top rated

Scale AI – Full-time – Posted Feb 3, 2026 2:28

The role involves designing and developing the production lifecycle of full-stack AI applications and supporting end-to-end system reliability, real-time inference observability, sovereign data orchestration, high-security software integration, and resilient cloud infrastructure for international government partners. Responsibilities include taking full accountability for the long-term performance and reliability of AI use cases deployed across international government agencies, overseeing the end-to-end health of the platform ensuring seamless integration between AI core and full-stack components, building automated systems to monitor model performance and data drift across dispersed environments, managing the technical lifecycle within diverse regulatory frameworks, leading incident response for production issues in mission-critical environments, translating technical performance metrics into clear insights for senior government officials, and partnering with Engineering and ML teams to drive product evolution based on field lessons.

Undisclosed

San Francisco, United States

Maybe global

Onsite

Staff Software Engineer, GPU Infrastructure (HPC)

New

Top rated

Cohere – Full-time – Posted Jan 16, 2026 2:44

As a Staff Software Engineer, you will build and scale ML-optimized HPC infrastructure by deploying and managing Kubernetes-based GPU/TPU superclusters across multiple clouds ensuring high throughput and low-latency performance for AI workloads. You will optimize for AI/ML training by collaborating with cloud providers to fine-tune infrastructure for cost efficiency, reliability, and performance, using technologies like RDMA, NCCL, and high-speed interconnects. You will troubleshoot and resolve complex issues by identifying and resolving infrastructure bottlenecks, performance degradation, and system failures to minimize disruption to AI/ML workflows. You will enable researchers with self-service tools by designing intuitive interfaces and workflows that allow researchers to monitor, debug, and optimize their training jobs independently. You will drive innovation in ML infrastructure by working closely with AI researchers to understand emerging needs such as JAX, PyTorch, and distributed training and translating them into robust, scalable infrastructure solutions. You will champion best practices by advocating for observability, automation, and infrastructure-as-code (IaC) across the organization to ensure systems are maintainable and resilient. Additionally, you will provide mentorship and collaborate through code reviews, documentation, and cross-team efforts to foster a culture of knowledge transfer and engineering excellence.

Undisclosed

Canada

Maybe global

Remote

Mechanical Engineer, Packaging Systems

New

Top rated

Figure AI – Full-time – Posted Dec 19, 2025 7:42

$150,000 – $350,000 / year (USD)

San Jose, United States

Maybe global

Onsite

Want to see more AI Infrastructure Engineer jobs?

Access all 4,256 remote & onsite AI jobs.

Join our private AI community to unlock full job access, and connect with founders, hiring managers, and top AI professionals.

(Yes, it’s still free—your best contributions are the price of admission.)

Frequently Asked Questions

Have questions about roles, locations, or requirements for AI Infrastructure Engineer jobs?

What does an AI Infrastructure Engineer do?

AI Infrastructure Engineers design and build the systems that power machine learning workloads. They optimize performance by resolving bottlenecks, implement scaling solutions through load balancing and redundancy, and deploy cloud infrastructure specifically for AI applications. These specialists build fault-tolerant systems for serving large language models, maintain continuous integration pipelines, and collaborate with AI teams to translate research needs into production-ready infrastructure.

What skills are required for an AI Infrastructure Engineer?

Key skills for this role include proficiency with cloud platforms (AWS SageMaker, Azure ML, Vertex AI), infrastructure-as-code tools like Terraform, and containerization technologies such as Docker and Kubernetes. Strong programming abilities in Python, Go, or C++ are essential, with CUDA knowledge for GPU optimization. Experience with monitoring tools (Prometheus, Grafana), distributed systems, deep learning frameworks, and Linux/UNIX environments is highly valued in candidates.

What qualifications are needed for an AI Infrastructure Engineer role?

Employers typically require a bachelor's degree in Computer Science, AI, Machine Learning, or a related technical field. Most positions demand 4+ years of experience in cloud infrastructure, large-scale systems, or software engineering with an infrastructure focus. Practical expertise in cloud computing, Linux administration, network architecture, and container technologies is essential. Specialized knowledge in GPU programming, distributed systems, and LLM serving strengthens applications considerably.

What is the salary range for an AI Infrastructure Engineer job?

Among the listings on this page that disclose pay, annual salaries range from $135,000 to $555,000 (USD). Compensation typically varies based on location, experience level, company size, and the specific technical skills required. As this role combines specialized AI knowledge with infrastructure expertise, salaries generally reflect the high demand for professionals who can effectively build and optimize systems for machine learning workloads at scale.

How long does it take to get hired as an AI Infrastructure Engineer?

The length of the hiring process varies by company, and it often includes technical assessments of cloud architecture knowledge, infrastructure-as-code experience, and machine learning operations skills. Given the specialized nature of AI infrastructure roles and their typical requirement of 4+ years of relevant experience, candidates should expect thorough evaluation of their technical capabilities and problem-solving abilities.

Are AI Infrastructure Engineer jobs in demand?

Yes, AI Infrastructure Engineer positions show strong demand signals. Major companies like Accenture, Scale AI, and Zoom are actively recruiting for these specialized roles. The increasing deployment of large language models and AI applications across industries creates consistent need for professionals who can build optimized infrastructure. The specialized skill intersection of cloud platforms, containerization, GPU optimization, and machine learning operations makes qualified candidates particularly valuable in today's job market.

Find AI jobs by country

AI Jobs in Argentina

AI Jobs in Australia

AI Jobs in Brazil

AI Jobs in Canada

AI Jobs in China

AI Jobs in France

AI Jobs in Germany

AI Jobs in Hong Kong

AI Jobs in India

AI Jobs in Japan

AI Jobs in Mexico

AI Jobs in Poland

AI Jobs in Singapore

AI Jobs in South Korea

AI Jobs in Spain

AI Jobs in Sweden

AI Jobs in Taiwan

AI Jobs in United Kingdom

AI Jobs in United States

Find AI jobs for similar categories

AI Technical Trainer Jobs

Applied ML Engineer Jobs

AI Full-Stack Engineer Jobs

AI MLOps Engineer Jobs

AI Systems Engineer Jobs

AI Applied Data Scientist Jobs

AI Robotics Software Engineer Jobs

AI Developer Educator Jobs

Applied AI Engineer Jobs

AI Applied Research Scientist Jobs

ML Researcher Jobs

AI Backend Engineer Jobs

AI Training Data Specialist Jobs

ML Research Engineer Jobs

ML Infrastructure Engineer Jobs

AI Autonomous Systems Engineer Jobs

AI Platform Engineer Jobs

AI Perception Engineer Jobs

AI Robotics Researcher Jobs