Member of Engineering (Reinforcement Learning Infrastructure)
Keep up with the latest research and stay familiar with the state of the art in LLMs, RL, and code generation. Develop methods for tuning training and inference end-to-end for high throughput. Design data control systems in an RL pipeline that govern what the model sees and when. Debug cases where infrastructure decisions are silently degrading learning dynamics. Build observability tooling that surfaces when a system-level issue is the root cause of a training regression. Help build robust, flexible, and scalable RL pipelines. Optimize performance across the stack — networking, memory, compute scheduling, and I/O. Write high-quality, pragmatic code. Collaborate closely with the team: plan next steps, discuss openly, and stay in regular contact.
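To give the data-control duties above some concrete flavor: a minimal sketch of a staleness gate in an RL pipeline, assuming each rollout records the policy version that generated it (all names here are hypothetical, not any team's actual code):

```python
from dataclasses import dataclass

@dataclass
class Rollout:
    policy_version: int  # version of the policy that generated this sample
    reward: float

def staleness_gate(rollouts, current_version, max_lag=2):
    """Keep only rollouts generated within `max_lag` policy updates.

    Stale off-policy data can silently degrade learning dynamics;
    gating it and logging the drop rate makes the effect observable.
    """
    kept = [r for r in rollouts if current_version - r.policy_version <= max_lag]
    drop_rate = 1 - len(kept) / len(rollouts) if rollouts else 0.0
    return kept, drop_rate
```

Emitting the drop rate as a metric is what ties the gate back to observability: a sudden spike is an infrastructure signal, not a modeling one.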
Staff Software Engineer, Core Infrastructure
As a Staff Software Engineer on the Core Infrastructure team at Harvey, your responsibilities include designing and building scalable, fault-tolerant infrastructure systems that power Harvey's AI platform across multiple cloud regions. You will own and evolve the multi-cloud infrastructure (Azure, GCP), including Kubernetes orchestration, networking, and container management. You will lead technical initiatives focused on observability, incident response, and operational excellence, building systems for rapid detection and resolution of issues. You will also architect and optimize distributed systems for reliability, including load balancing, quota management, and failover mechanisms. You will partner with Product Engineering and Security teams to ensure infrastructure accelerates product development, drive infrastructure-as-code practices using tools like Terraform and Pulumi for reproducible deployments, and mentor engineers through code reviews, design reviews, and technical leadership. Representative projects include designing model proxy architecture for handling inference requests, building distributed rate limiting and quota management systems, architecting multi-region deployment strategies for data residency compliance, developing observability infrastructure with SLA monitoring and cost tracking, and leading CI/CD pipeline evolution to improve velocity and stability.
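As one illustration of the rate-limiting work this role describes — a sketch, not Harvey's actual implementation — a single-node token bucket is the usual building block that a distributed quota system composes, typically with bucket state moved into a shared store:

```python
import time

class TokenBucket:
    """Classic token-bucket rate limiter: `rate` tokens/sec, bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.updated = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, clamped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

In a multi-region deployment the interesting problems start where this sketch ends: sharing bucket state across replicas without making every request pay a round trip.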
Tokens-as-a-Service (TaaS) Software Engineer
Develop systems and tooling to measure, monitor, and improve token throughput across first-party and partner-owned compute environments. Support performance benchmarking, tokenomics analysis, and model porting across heterogeneous infrastructure environments. Build tooling to integrate external or partner infrastructure into OpenAI’s internal compute, observability, and workload management systems. Develop and monitor operational metrics including billing, usage, SLAs, utilization, reliability, and throughput. Identify bottlenecks across hardware, networking, software, and workload enablement that prevent capacity from becoming productive tokens. Partner with compute, infrastructure, networking, finance, and operations teams to translate raw capacity into usable workload-serving capacity. Build dashboards, automation, and reporting systems that provide clear visibility into TaaS capacity, performance, and business outcomes.
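The "capacity into productive tokens" framing above lends itself to a simple headline metric. A minimal sketch, with metric names that are illustrative rather than OpenAI's actual definitions:

```python
def taas_utilization(tokens_served: int, window_sec: float,
                     peak_tokens_per_sec: float) -> dict:
    """Summarize how much theoretical capacity became productive tokens.

    `peak_tokens_per_sec` is the benchmarked best-case throughput for the
    fleet over the window; utilization below 1.0 points at bottlenecks in
    hardware, networking, software, or workload enablement.
    """
    achieved = tokens_served / window_sec
    return {
        "achieved_tokens_per_sec": achieved,
        "utilization": achieved / peak_tokens_per_sec,
    }
```

A dashboard built on numbers like these is what lets compute, finance, and operations teams argue about the same gap instead of different ones.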
Software Engineer I, Coding Pod
As a Software Engineer on the Coding Pod, you will build the data infrastructure and pipelines that power frontier AI coding models. Responsibilities include designing and building scalable data pipelines for generating, transforming, and validating large-scale coding datasets; developing systems for task generation, dataset curation, and quality assurance, including automated and human-in-the-loop evaluation workflows; integrating with developer ecosystems such as GitHub and building tooling to support real-world coding environments; working with containerized environments like Docker to safely execute and evaluate code at scale; building backend systems and APIs that power dataset delivery and model evaluation pipelines; collaborating closely with ML researchers, product managers, and other engineers to define evaluation methodologies and improve dataset quality; implementing automated grading, benchmarking, and assessment systems for coding tasks; debugging and optimizing pipeline performance, reliability, and scalability across distributed systems; and contributing to architectural decisions around data infrastructure, evaluation systems, and pipeline orchestration.
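The automated grading described above often reduces to running a candidate solution against reference cases and scoring the pass rate. A minimal sketch — function names are hypothetical, and real pipelines execute the candidate inside a Docker sandbox with resource limits rather than in-process:

```python
def grade(solution, test_cases):
    """Score a candidate solution as the fraction of cases it passes.

    `test_cases` is a list of (args, expected) pairs. Any exception counts
    as a failure, so broken code simply scores zero on that case.
    """
    passed = 0
    for args, expected in test_cases:
        try:
            if solution(*args) == expected:
                passed += 1
        except Exception:
            pass
    return passed / len(test_cases)
```

Human-in-the-loop review then concentrates on the cases this kind of grader can't decide: ambiguous specs, partial credit, and stylistic quality.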
Software Engineer, Compute Infrastructure
In this role, you will spin up and scale large Kubernetes clusters, including automating provisioning, bootstrapping, and cluster lifecycle management; build software abstractions that unify multiple clusters and provide a seamless interface to training workloads; own node bring-up from bare metal through firmware upgrades ensuring fast and repeatable deployment at massive scale; improve operational metrics such as reducing cluster restart times and accelerating firmware or OS upgrade cycles; integrate networking and hardware health systems to deliver end-to-end reliability across servers, switches, and data center infrastructure; develop monitoring and observability systems to detect issues early and maintain cluster stability under extreme load; solve real-time operational challenges, diagnose and fix issues quickly, and continuously improve automation, resilience, performance, and uptime across the systems powering frontier AI model training.
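The monitoring and observability duties above can be sketched with a tiny heartbeat check. Assuming each node reports a last-seen timestamp (field names hypothetical), flagging stragglers early is what keeps clusters stable under extreme load:

```python
def unhealthy_nodes(heartbeats: dict, now: float, timeout: float = 30.0) -> list:
    """Return node IDs whose last heartbeat is older than `timeout` seconds.

    In a real fleet this feeds remediation: cordon the node, drain its
    workloads, and run firmware/OS diagnostics before readmitting it.
    """
    return sorted(node for node, last_seen in heartbeats.items()
                  if now - last_seen > timeout)
```

At training scale the hard part is not this predicate but the automation around it: avoiding remediation storms when a switch failure makes a whole rack look dead at once.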
VP Engineering - London
The VP Engineering is responsible for defining and executing a scalable, defensible technology strategy; building a world-class engineering organization and platform; partnering with the CEO on product direction, investor communication, and long-term vision; and ensuring the successful bridging of frontier AI research with enterprise-grade deployment. Responsibilities include architecting and scaling H's AI platform, making build-versus-buy decisions, ensuring performance, reliability, and cost efficiency, and establishing technical moats. The role involves translating AI capabilities into enterprise-ready products, standardizing bespoke systems, and balancing iteration speed with robustness. The VP Engineering builds and leads engineering teams, scales the organizational structure, and implements quality processes. They act as a key counterpart to the CEO in board and investor discussions, articulate the technology and product roadmap, and provide technical due diligence. They also operate cross-functionally across Research, Product, and Go-to-Market, align engineering with customer and revenue goals, and help define long-term company positioning.
VP Engineering - Paris
The VP Engineering is responsible for defining and executing a scalable, defensible technology strategy, including architecting and scaling the AI platform with a focus on agents, orchestration, model integration, and infrastructure. They make critical build versus buy decisions across the technology stack, ensure performance, reliability, and cost efficiency at scale, and establish durable technical moats in a rapidly evolving AI landscape. They translate cutting-edge AI capabilities into repeatable, enterprise-ready products, standardize systems that are currently bespoke or forward-deployed, and balance speed of iteration with platform robustness and maintainability. They build and lead a high-caliber engineering organization, scaling from a startup structure to multi-layered, high-output teams and implement processes to enable speed without sacrificing quality. The VP Engineering acts as a key counterpart to the CEO in board and investor discussions, clearly articulates the company's technology and product roadmap, and provides credibility and depth in technical due diligence and fundraising contexts. They operate at the intersection of Research, Product, and Go-to-Market, align engineering execution with customer outcomes and revenue growth, and help define the company’s long-term product and platform positioning.
Engineering Manager, Cooperative Systems
Lead and grow a small team building applied AI systems for internal operations. Design and build AI-powered automation systems in close proximity to customers. Stay hands-on in architecture and implementation across the full stack. Develop evolving systems spanning developer tools, automation platforms, knowledge graphs, and data systems. Deploy systems directly to internal users and close customers to iterate rapidly based on real-world feedback. Engage frequently with scaled workforces to understand needs and validate solutions. Create systems for visibility and learning in hybrid workforces. Partner with product, research, and ops teams daily.
Parcel Contract Intelligence Consultant
Ship critical infrastructure by managing real-world logistics and financial data for the largest enterprise in the world. Own the why by building deep context through customer calls and understanding Loop's value to customers, pushing back on requirements if a better, faster solution exists. Work across system boundaries with full-stack proficiency, including frontend UX, LLM agents, database schemas, and event infrastructure. Leverage AI tools to automate boilerplate work, focusing on quality, architecture, and product taste. Constantly optimize development loops, refactor legacy patterns, automate workflows, and fix broken processes to raise the velocity bar.
Staff Software Engineer, Security Controls Telemetry & Detection
The Staff Software Engineer is responsible for owning the end-to-end technical vision for the EDR telemetry and detection workstream, rallying the team from concept through shipping, iterating, and deprecating. This includes producing production code contributions in a modern backend language such as Go, Rust, or Python within a service-oriented environment and setting technical standards through design reviews, code quality, and operational discipline by example. The role involves mentoring engineers, building frameworks and architecture to enable high performance, partnering with the hiring team on recruiting and leveling engineers, and holding the team accountable for outcomes by managing risks and tradeoffs early and in writing. The engineer translates ambiguous product goals into concrete technical roadmaps, makes build-versus-buy or integration decisions with business context, partners closely with product management in PRD reviews and sprint planning, and sequences MVP development effectively. Domain expertise is required in EDR platforms, including telemetry, API-level integration, detection logic, alert triage, and SOC team workflows. The engineer builds ground truth datasets, manages false positive and false negative tradeoffs and confidence scoring, and owns the detection and measurement methodology, including ground truth methodology, confidence scoring, calibration, and defining what constitutes correct tuning recommendations. The position requires collaboration and contributions at both the leadership level and in hands-on coding, and may include up to 10% travel.
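The false-positive/false-negative tradeoff and confidence-scoring work described above comes down to sweeping a threshold over scored detections labeled against a ground-truth set. A minimal sketch, not tied to any particular EDR product:

```python
def precision_recall(scored, threshold):
    """Compute precision and recall for detections at a confidence threshold.

    `scored` is a list of (confidence, is_true_positive) pairs labeled
    against a ground-truth dataset. Raising the threshold trades recall
    (missed detections) for precision (fewer false alerts in the SOC queue).
    """
    fired = [label for conf, label in scored if conf >= threshold]
    tp = sum(fired)
    fn = sum(label for conf, label in scored if conf < threshold)
    precision = tp / len(fired) if fired else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall
```

A "correct tuning recommendation" in this framing is a threshold choice justified by the shape of this curve, not by a single operating point.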