AI Systems Engineer Jobs

Discover the latest remote and onsite AI Systems Engineer roles at top AI companies. Updated hourly.

Check out 35 new AI Systems Engineer opportunities posted on The Homebase

Member of Technical Staff, Pre-training Systems

New
Top rated
Magic
Full-time

Design and operate the distributed infrastructure that trains Magic's long-context models at scale across massive GPU clusters. Scale distributed training using data, tensor, and pipeline parallelism. Optimize communication patterns and gradient synchronization. Improve checkpointing, fault tolerance, and job recovery systems. Profile and eliminate performance bottlenecks across compute, networking, and storage. Improve experiment reproducibility and orchestration workflows. Increase hardware utilization and training throughput. Collaborate with the Kernels and Research teams to align model architecture with systems realities.

$225,000 – $550,000 per year (USD)

San Francisco, United States
Maybe global
Onsite

Member of Technical Staff, Inference & RL Systems

New
Top rated
Magic
Full-time

Design and operate distributed systems that serve models in production and power large-scale post-training workflows. This work affects inference latency and throughput, as well as the stability and reliability of reinforcement learning (RL) and post-training loops. Own infrastructure for production inference and large-scale RL iteration, handling KV-cache scaling, memory pressure from long sequences, batching trade-offs, long-horizon trajectory rollouts, and sustained throughput. Responsibilities include designing and scaling high-performance inference serving systems; optimizing KV-cache management, batching strategies, and scheduling; improving throughput and latency for long-context workloads; building and maintaining distributed RL and post-training infrastructure; improving the reliability of rollout, evaluation, and reward pipelines; automating fault detection and recovery for serving and RL systems; profiling and eliminating performance bottlenecks across GPU, networking, and storage layers; and collaborating with the Kernels and Research teams to align execution systems with model architecture.

$225,000 – $550,000 per year (USD)

San Francisco, United States
Maybe global
Onsite

Member of Technical Staff, Tech Lead

New
Top rated
Listen Labs
Full-time

The role involves tackling complex problems end-to-end, owning parts of the product, and making decisions across the LLM pipeline, infrastructure, backend, and UX. The candidate is expected to make critical decisions on a greenfield stack that will define its architecture for years to come. The work includes pushing the most advanced AI models to their limits, communicating tradeoffs, problems, and blockers directly, and building a product that works, with close attention to detail. The job focuses on developing AI-powered research capabilities, such as building a research agent, creating a database of millions of people for precise targeting, advancing real-time video interviews with emotional understanding, building a distributed information-mining agent, and developing a customer preference model that uses synthetic personas to extrapolate new insights.

$150,000 – $300,000 per year (USD)

San Francisco, United States
Maybe global
Onsite

Staff Software Engineer - Product Fundamentals

New
Top rated
Multiverse
Full-time

The Staff Software Engineer in Multiverse's Product Fundamentals Group architects systems that allow AI-powered features to be deployed at scale, acts as a force multiplier guiding direction across multiple teams, and owns complex cross-functional problems to ensure stability, security, and architectural integrity as AI capabilities scale. The role involves auditing technical direction and aligning it with business objectives; leading highly complex AI-related projects with a focus on predictable delivery, operational excellence, and impactful user experience; and defining frameworks that guide architectural strategy across teams while coaching others. The engineer solves ambiguous engineering challenges, makes critical decisions thoughtfully and decisively, drives architectural strategy for major platforms so that AI systems are reliable and performant, coordinates broader initiatives spanning multiple workstreams to adopt strategies for managing technical debt and scalability, and leverages emerging AI technologies to solve complex problems and build foundational components for the engineering organization.

Salary undisclosed

London, United Kingdom
Maybe global
Remote

Software Engineer, AI Compiler

New
Top rated
Normal Computing
Full-time

Work across the full stack with software, systems, and hardware teams to ensure correctness, performance, and deployment readiness for real workloads. Contribute to shaping the long-term compiler architecture and tooling strategy. Design and implement parts of the compiler stack targeting Normal Computing's novel AI accelerator, including front-end lowering, IR transformations, optimization passes, and backend code generation. Build and evolve MLIR/LLVM-based infrastructure to support graph lowering, hardware-aware optimizations, and performance-centric code emission. Collaborate closely with hardware architects, microarchitects, and research teams to co-design compiler strategies that align with the evolving ISA and hardware constraints. Develop profiling and analysis tools to identify performance bottlenecks, validate generated code, and ensure high-throughput, low-latency execution of AI workloads. Enable efficient mapping of high-level ML models to hardware by working with model frameworks and graph representations such as ONNX, JAX, and PyTorch. Drive performance tuning strategies, including kernel authoring, schedule generation, and hardware-specific optimization passes.

$190,000 – $215,000 per year (USD)

New York City, United States
Maybe global
Onsite

Software Intern

New
Top rated
TensorWave
Intern
Full-time

As a Software Engineering Intern at TensorWave, responsibilities include collaborating with senior engineers on features for cloud control plane, orchestration layer, user-facing APIs, or internal tooling; working on automation, monitoring, and observability for GPU clusters (Slurm + Kubernetes-native environments); participating in debugging performance bottlenecks in high-throughput inference or distributed training pipelines; writing clean, well-tested code and participating in code reviews; and learning how bare-metal AI clouds operate at scale, including hardware partitioning, high-speed networking, and storage.

Salary undisclosed

Las Vegas, United States
Maybe global
Onsite

Senior Backend / Systems Engineer (AI) - San Mateo, CA

New
Top rated
Trustlab
Full-time

Design and build extensible backend systems that support flexible configurations for different customers and content types. Develop infrastructure that interfaces cleanly with large language models (LLMs), enabling prompt engineering, context injection, and modular evaluation workflows. Build tooling and platforms that enable fast iteration by AI engineers and analysts, including declarative pipelines, parameterized jobs, and reproducible experiments. Prioritize ease of deployment, integration, and testing, both for internal teams and external partners. Collaborate closely with product, data, and policy teams to translate nuanced safety needs into scalable, maintainable software systems.

$150,000 – $220,000 per year (USD)

San Mateo, United States
Maybe global
Remote

Software Engineer - Sensing, Consumer Products

New
Top rated
OpenAI
Full-time

As a Software Engineer on Consumer Products Research, responsibilities include building and shipping production software for sensing algorithms by translating algorithm prototypes into reliable end-to-end systems, and implementing and owning key parts of the Python shipping pipeline, including integration surfaces, evaluation hooks, and quality/performance guardrails. The role also involves developing embedded/on-device software in an RTOS environment (such as Zephyr) and deploying models to device runtimes and hardware accelerators. Additional responsibilities include optimizing real-time on-device perception loops for stability, latency, power, and memory constraints; creating data collection and instrumentation tooling to bring up new sensing modalities and speed iteration from prototype to dataset to model to device; and partnering cross-functionally with the algorithms, human data, and firmware/hardware teams to debug, profile, and harden systems against real-world variability.

$325,000 per year (USD)

San Francisco, United States
Maybe global
Hybrid

Senior Software Engineer, ML Core

New
Top rated
Zoox
Full-time

Design, develop, and deploy custom and off-the-shelf ML libraries and tooling to improve ML development, training, and deployment, and to reduce on-vehicle model inference latency. Build tooling and establish development best practices for managing and upgrading foundational libraries such as the NVIDIA driver, PyTorch, and TensorRT, improving the ML developer experience and speeding up debugging. Collaborate closely with cross-functional teams, including applied ML research, high-performance compute, advanced hardware engineering, and data science, to define requirements and align on architectural decisions. Work across multiple ML teams within Zoox, supporting on- and off-vehicle ML use cases and coordinating with vehicle and ML teams to shorten the time from ideation to productionization of AI innovations.

$214,000 – $290,000 per year (USD)

Foster City, United States
Maybe global
Onsite

Software Engineer, Platform Systems

New
Top rated
OpenAI
Full-time

Design and build distributed failure detection, tracing, and profiling systems for large-scale AI training jobs. Develop tooling to identify slow, faulty, or misbehaving nodes and provide actionable visibility into system behavior. Improve observability, reliability, and performance across OpenAI's training platform. Debug and resolve issues in complex, high-throughput distributed systems. Collaborate with systems, infrastructure, and research teams to evolve platform capabilities. Extend and adapt failure detection and tracing systems to support new training paradigms and workloads.

Salary undisclosed

London, United Kingdom
Maybe global
Onsite

Want to see more AI Systems Engineer jobs?


Access all 4,256 remote & onsite AI jobs.

Join our private AI community to unlock full job access, and connect with founders, hiring managers, and top AI professionals.
(Yes, it’s still free—your best contributions are the price of admission.)

Frequently Asked Questions

Have questions about roles, locations, or requirements for AI Systems Engineer jobs?


What does an AI Systems Engineer do?

AI Systems Engineers design architectures that let AI models interact with other system components. They develop, test, and deploy machine learning models and integrate AI modules with existing software, hardware, and business systems. Their responsibilities include monitoring system performance, maintaining and retraining models, managing data flow, ensuring security, addressing ethical concerns, and collaborating with cross-functional teams, including data scientists and stakeholders.

What skills are required for an AI Systems Engineer?

AI Systems Engineers need strong programming skills in languages such as Python, Java, or R, and proficiency with AI frameworks like TensorFlow and PyTorch. Experience with machine learning, deep learning, NLP, and computer vision is essential. They must understand MLOps, infrastructure management, and cloud computing platforms. Problem-solving, communication, and teamwork skills are equally important for integrating AI systems across an organization.

What qualifications are needed for an AI Systems Engineer role?

Most employers require a degree in Computer Science, Engineering, or a related technical field. Practical experience with machine learning algorithms, AI frameworks, and systems integration is highly valued. A formal education provides a foundation, but a demonstrated ability to design AI architectures, implement ML models, and manage complex systems often matters more than specific credentials.

What is the salary range for AI Systems Engineer jobs?

Salaries vary with location, company size, experience level, and specific technical skills. Among the listings on this page that disclose pay, annual ranges run from $150,000 to $550,000 (USD). Compensation reflects the specialized nature of the role, which requires both AI expertise and systems engineering knowledge, and AI roles generally command premium pay because of high demand for professionals who can build and integrate intelligent systems.

How long does it take to get hired as an AI Systems Engineer?

Hiring timelines vary significantly from company to company. The process typically includes technical assessments of machine learning knowledge and systems design ability, and companies may run multiple interview rounds to evaluate both technical expertise and cross-functional collaboration. The specialized nature of these roles can lengthen the process, as organizations look for candidates with the right combination of AI and systems engineering skills.

Are AI Systems Engineer jobs in demand?

Yes. Demand is growing as businesses integrate advanced AI technologies into their operations. The rise of generative AI and related innovations is driving the need for specialists who can design complete AI-enabled architectures. Organizations increasingly need professionals who can bridge AI development and systems engineering, handling both the technical implementation and the broader infrastructure of AI deployments.