Software Engineer - C++ GPU Performance
As a GPU performance software engineer within the Software Performance team, you will instrument, monitor, analyze and optimize GPU-based algorithms that are performance-critical for the Zoox self-driving solution. You will build real-time instrumentation for performance monitoring of CPU, GPU, latency, and memory, and develop offline benchmarking frameworks, tools, and scripts to evaluate and analyze performance at scale in continuous integration and vehicle environments. You will establish performance budgets for next-generation architectures, analyze performance metrics to identify GPU hotspots and root causes, and propose and co-implement actionable solutions with component teams. You will support teams in bringing serial algorithms to the GPU to maximize compute utilization and improve overall latency. Additionally, you will work as part of the Core team to design a middleware framework that promotes efficient and performant code development by maximizing CPU and GPU usage by default.
Senior Software Engineer - C++ GPU Performance
As a GPU performance software engineer within the Software Performance team at Zoox, you will instrument, monitor, analyze, and optimize performance-critical GPU-based algorithms. You will build real-time instrumentation for performance monitoring (CPU, GPU, latency, memory), develop offline benchmarking frameworks, tools, and scripts to evaluate and analyze performance at scale in CI and vehicle environments, and establish performance budgets for next-generation architectures. You will analyze performance metrics to identify GPU hotspots and root causes, and propose and co-implement actionable solutions with component teams. You will support teams in bringing serial algorithms to the GPU to maximize compute utilization and improve overall latency, and work as part of the Core team to design a middleware framework that promotes efficient and performant code development by maximizing CPU and GPU use.
Staff Software Engineer, CAPE
The Staff Software Engineer is responsible for architecting, designing, and developing the intelligence layer that controls GPU node assignment, monetization, and management within Crusoe's fleet. They will be among the first engineers on the Virtual Pool Service and Capacity Management Intelligence systems, shaping implementation, making key design decisions, and building foundational cloud platform infrastructure. Responsibilities include building the Virtual Pool Service as the source of truth for GPU node states and history; designing the Capacity Management Intelligence automation layer to handle allocation and automated node lifecycle transitions; collaborating cross-functionally to architect and implement physical infrastructure management systems; championing system reliability, scalability, and security; streamlining cloud deployment and operations using technologies such as Go, gRPC, NATS event streaming, PostgreSQL on Kubernetes, and Netbox; and mentoring engineers while contributing to team growth in collaboration with engineering management.
Senior Backend / Systems Engineer (AI) - San Mateo, CA
Design and build extensible backend systems supporting flexible configurations for different customers and content types. Develop infrastructure interfacing with LLMs to enable prompt engineering, context injection, and modular evaluation workflows. Build tooling and platforms for fast iteration by AI engineers and analysts, including declarative pipelines, parameterized jobs, and reproducible experiments. Prioritize ease of deployment, integration, and testing for internal teams and external partners. Collaborate closely with product, data, and policy teams to translate nuanced safety needs into scalable, maintainable software systems.
Staff Software Engineer, Foundations (Managed AI)
As a Staff Software Engineer in the Foundations department, responsibilities include leading the design and implementation of highly scalable systems for the Managed AI offerings, driving the long-term technical roadmap for the Foundations team to support growth and evolving AI workloads, working cross-functionally with Cloud Engineering to align technical goals and solve integration challenges, leading by example through high-quality code contributions and mentoring Senior and Staff-level engineers, championing reliability, observability, and performance by identifying and resolving systemic bottlenecks, and staying current with AI infrastructure trends to ensure efficient and powerful tools are utilized.
Director, Engineering, Proactive Offense
Lead and scale Horizon3.ai's Offensive Engineering organization, overseeing teams responsible for exploit development, offensive content, and attack automation within the NodeZero platform. Set clear technical and product direction for how NodeZero identifies, exploits, and validates vulnerabilities across large, complex environments. Partner with Product, Precision Defense, and Platform teams to define and deliver offensive capabilities that influence the roadmap and enhance customer outcomes. Drive execution from proof-of-concept through production to transform cutting-edge attack research into scalable, productized features. Stay hands-on to guide architectural decisions and evaluate exploit and automation approaches, mentoring technical leads in building resilient, modular systems. Build, mentor, and scale diverse teams of software engineers, exploit developers, and offensive researchers, fostering a culture of collaboration, creativity, and engineering excellence that bridges offensive and product software development. Collaborate across engineering, product, and GTM teams to align offensive innovation with business priorities and ensure delivery of impactful capabilities for customers. This role is central to the mission of delivering continuous, autonomous security testing at scale.
Software Engineer, Workload Enablement
Port and validate key inference and training workloads on new platforms/SKUs as they arrive, driving correctness, performance, and stability to an internal readiness bar. Build a suite of benchmarks and stress tests that capture real end-to-end behavior of workloads by exercising all aspects of a system, including CPU, GPU, memory subsystem, frontend, scale-up, and scale-out networking, storage, thermals, and other relevant parts. Conduct deep-dive performance analysis on distributed training and inference focusing on collective performance and tuning, overlap of compute/communication, kernel-level bottlenecks, memory bandwidth, and scheduling effects. Create repeatable test harnesses that run in continuous integration and lab environments producing actionable outputs such as pass/fail, performance scores, and regression detection. Partner with systems and fleet bring-up engineers to ensure the platform is stable, performant, operationally usable, and scalable through containerization, Kubernetes integration, telemetry hooks, and failure triage loops. Work cross-functionally with vendors and internal stakeholders by producing clear bug reports, minimal reproductions, and prioritized issue lists.
Software Engineer, Codex Core Agents
The role involves designing and building execution environments for AI agents, including sandboxing, isolation, and reproducibility. It includes developing systems for agent orchestration across multi-step, tool-using workflows and building infrastructure for running, testing, and debugging code generated by models. Responsibilities also include creating state and memory systems that allow agents to persist context across long-running tasks, optimizing tokens, latency, reliability, and cost across Codex’s production fleet, and supporting model rollouts, capacity planning, and managing tradeoffs between quality, speed, and economics to maintain a fleet of frontier agents at scale. Additionally, the job entails building shared platform capabilities that unblock product teams, partner teams, and open source Codex.
Software Engineer
The Software Engineer in the Defence team will build and extend critical components of client deliverables across diverse software domains, deliver robust technical artefacts in both compiled and non-compiled languages, implement defined engineering patterns and practices tailored for the Defence sector, collaborate closely with Machine Learning Engineering and Data Science teams to integrate and refine technical solutions, apply rigorous software engineering best practices to enhance scalability and quality of codebases, and execute CI/CD processes while managing application deployments on Kubernetes and bare metal environments.
Engineering Manager, Distillation & Detection Platform
Lead a team of software engineers building detection and mitigation systems for frontier model misuse, focusing on model IP protection, distillation detection, and emerging risks from autonomous agents. Set the technical roadmap and execution strategy including prioritization, design, shipping, iteration, and impact measurement. Build production systems such as services, pipelines, tooling, instrumentation, and automation that can scale with frontier model usage. Partner with Research and Product teams to translate evolving model capabilities into scalable tests, signals, and mitigations. Drive strong engineering fundamentals including architecture, reliability, monitoring, performance, and operational excellence. Hire and grow a team across backend, data systems, and applied ML engineering domains. Anticipate and address scalability challenges as agentic workflows advance.
