Performance & Systems Engineer, Codex
As a Performance & Systems Engineer on the Codex team, you will be responsible for whole-system optimization across a complex, evolving stack including LLM inference, cloud orchestration, agentic work management, and multiple product surfaces. Your job will be to identify and implement high-leverage changes across infrastructure, modeling, and product layers that make Codex agents significantly faster and cheaper to serve. Responsibilities include hunting down and addressing inefficiencies throughout the Codex system stack, from agent behavior to LLM inference to container orchestration and beyond. You are expected to build tooling to measure, profile, and optimize system performance at scale, and collaborate with researchers and engineers to implement high-ROI changes that improve latency and cost. This role involves direct impact on the experience of millions of users and requires a high level of ownership.
Staff Software Engineer, Core Infrastructure
As a Staff Software Engineer on the Core Infrastructure team at Harvey, you will design and build scalable, fault-tolerant infrastructure systems that power Harvey's AI platform across multiple cloud regions. You will own and evolve the multi-cloud infrastructure (Azure, GCP), including Kubernetes orchestration, networking, and container management. You will lead technical initiatives focused on observability, incident response, and operational excellence, building systems for rapid detection and resolution of issues. You will also architect and optimize distributed systems for reliability, including load balancing, quota management, and failover mechanisms. You will partner with Product Engineering and Security teams to ensure infrastructure accelerates product development, drive infrastructure-as-code practices using tools like Terraform and Pulumi for reproducible deployments, and mentor engineers through code reviews, design reviews, and technical leadership. Representative projects include designing model proxy architecture for handling inference requests, building distributed rate limiting and quota management systems, architecting multi-region deployment strategies for data residency compliance, developing observability infrastructure with SLA monitoring and cost tracking, and leading CI/CD pipeline evolution to improve velocity and stability.
Software Engineer, Model Serving Infrastructure
The role involves contributing to the development of next-generation, high-performance machine learning serving systems. Responsibilities include building infrastructure that powers AI applications, working on problems at the intersection of distributed systems, machine learning, and high-performance computing, and solving fundamental computer science problems that affect AI deployment. Specific projects include: implementing asynchronous inference for non-blocking client requests; designing intelligent request routing systems that balance load across thousands of model replicas under strict latency SLAs; building traffic management systems for zero-downtime model updates that handle terabytes of inference requests; improving state management to scale from thousands to tens of thousands of replicas; architecting frameworks for multi-model orchestration in complex ML pipelines with end-to-end latency guarantees; and developing observability and debugging tools for distributed ML applications at scale. The work involves writing performance-critical code in Python (with Cython optimizations) and potentially C++; working with distributed systems at scale using Ray Core's actor system, gRPC, and custom networking protocols; extending cloud-native infrastructure such as Kubernetes and service meshes; gaining system-level knowledge of ML/AI frameworks such as TensorFlow, PyTorch, JAX, and transformers; and ensuring production reliability with tools and practices such as OpenTelemetry, Prometheus, distributed tracing, and chaos engineering to maintain 99.99% uptime. The role also involves leveraging AI coding agents to enhance team productivity while maintaining high code quality standards.
Software Engineer, Inference - Performance Optimization
Build and refine performance models that translate microbenchmark results into cost-to-serve estimates. Analyze inference workloads end to end across applications, models, and fleet infrastructure. Enhance tooling to identify bottlenecks across layers for latency and throughput. Partner with other teams to turn performance insights into concrete improvements and project how future changes affect inference.
Staff Software Engineer (Builders)
Design, build, and operate scalable back-end systems that power AI agent and workflow builders. Own mission-critical services and infrastructure, delivering impactful features from ideation through to production. Push the boundaries of applied AI by enabling new agent capabilities, workflow orchestration, and system behaviours. Shape how the engineering team builds by influencing engineering standards, architecture, and processes as the company scales. Mentor and support engineers across the team to raise the bar for technical quality and ownership. Set and uphold high standards for code quality, performance, reliability, and security. Collaborate closely with product, design, and leadership teams to align technical direction with business outcomes.
Software Engineer
Design and build the backend systems and services that power Sesame's product, including data models, APIs, and distributed systems. Write durable software focusing on scalability, reliability, and correctness rather than prototyping. Build and evolve frameworks and libraries for other engineers to use, emphasizing good software design. Own the full lifecycle of services, including schema design, implementation, deployment, performance tuning, and on-call responsibilities. Work with various data stores such as relational databases, NoSQL, queues, caches, and search indexes. Identify and resolve performance bottlenecks while considering cost, throughput, and latency. Architect systems where machine learning models are a key component but not the sole aspect, such as real-time audio pipelines, agentic orchestration, and stateful conversation systems. Identify opportunities to improve developer efficiency through prototyping tools or workflow improvements and collaborate with the infrastructure team to productionize them.
Performance Modeling Engineer
Support the development and maintenance of performance modeling tools and frameworks. Assist in building models to evaluate system behavior across compute, memory, networking, and interconnect subsystems. Help analyze distributed system scaling behavior and identify performance bottlenecks. Run simulations and analytical models to support architecture and infrastructure decisions. Partner with senior engineers to evaluate design tradeoffs across hardware and system components. Interpret modeling outputs and help translate findings into clear recommendations. Validate models using benchmarking data and real system performance measurements. Improve modeling workflows, documentation, and usability for broader team adoption. Collaborate cross-functionally with hardware, infrastructure, and architecture teams. Continuously build technical depth across AI infrastructure, system architecture, and performance analysis.
Performance Modeling Engineer
Develop and maintain performance modeling tools and frameworks. Build models to evaluate system behavior across compute, memory, and interconnect subsystems as well as distributed system scaling and bottlenecks. Run simulations and analytical models to support architectural tradeoff analysis. Collaborate with performance modeling lead and system architects to answer forward-looking design questions. Analyze and interpret modeling outputs, translating results into actionable insights. Validate models against real system measurements and workload behavior. Contribute to improving modeling fidelity, usability, and scalability.
Software Engineer, Kernel Performance & AI Tooling
Build developer tooling and workflows to accelerate kernel development and performance optimization. Develop observability, diagnostics, and validation infrastructure for AI-assisted optimization systems. Optimize production kernels through techniques including search loops and bottleneck analysis. Design abstractions, interfaces, and automation systems for kernel optimization and hardware-software co-design. Improve AI-assisted optimization systems with better datasets and benchmarking. Partner across research and engineering teams to translate ideas into practical systems that span production and long-term infrastructure strategy.
Software Engineer - C++ GPU Performance
As a GPU performance software engineer within the Software Performance team, you will instrument, monitor, analyze and optimize GPU-based algorithms that are performance-critical for the Zoox self-driving solution. You will build real-time instrumentation for performance monitoring of CPU, GPU, latency, and memory, and develop offline benchmarking frameworks, tools, and scripts to evaluate and analyze performance at scale in continuous integration and vehicle environments. You will establish performance budgets for next-generation architectures, analyze performance metrics to identify GPU hotspots and root causes, and propose and co-implement actionable solutions with component teams. You will support teams in bringing serial algorithms to the GPU to maximize compute utilization and improve overall latency. Additionally, you will work as part of the Core team to design a middleware framework that promotes efficient and performant code development by maximizing CPU and GPU usage by default.
