Researcher, Alignment Science
As a Research Engineer / Research Scientist on the Alignment team, you will design and implement alignment experiments focused on intent following, honesty, calibration, and robustness. You will train and evaluate models using reinforcement learning and other empirical machine learning methods. Your role includes developing evaluations for failure modes such as hallucination, instruction-following failures, reward hacking, covert actions, and scheming. You will study methods that encourage models to verify their behavior and report shortcomings honestly, including confession-style training objectives. You will build monitoring and inference-time interventions that ensure compliant behavior or surface model issues to users or downstream systems. Additionally, you will investigate how alignment methods scale with model capability, compute, data, context length, action length, and adversarial pressure. You will integrate successful techniques into model training and deployment workflows, produce externally publishable research when results advance the broader science of alignment, and collaborate with researchers and engineers across post-training, reinforcement learning, evaluations, safety, and product-facing teams.
Machine Learning Engineer, Core Evaluations
Responsibilities include designing model evaluation pipelines for models in both development and production environments, designing user studies for subjective model evaluations, converting requirements into measurable metrics, and building automated evaluation dashboards to monitor and compare model performance. The role also involves training new models to capture various evaluation metrics, working with the model team to design improved models based on evaluation results, coordinating with the data team to determine the data needed to enhance model performance, and collaborating with the product manager to ensure product requirements are accurately measured. As the founding member of the evaluation team, you will help grow the team and lead it in the future.
Lead Member of Technical Staff, Inference Infrastructure
The Lead Member of Technical Staff, Inference Infrastructure, is responsible for providing technical leadership across multiple teams, driving the architecture and strategy for deploying optimized NLP models to production in low-latency, high-throughput, high-availability environments. They lead the design of customized deployments to meet specific customer needs and mentor engineers to raise the technical bar across the team. The role involves contributing to the development, deployment, and operation of the AI platform delivering large language models through easy-to-use API endpoints, and serving as a key point of contact for customers.
AI Agent Engineer – Marketing
The AI Agent Engineer is responsible for embedding within the marketing team to understand business goals, processes, and bottlenecks, and building systems that improve efficiency and accuracy. They identify high-leverage workflows in content production, ICP testing, campaign setup, reporting, creative review, and partner coordination, prioritizing by business impact. They build AI-powered agents, automations, and tools that permanently transform workflows into working systems used daily by the marketing team. They optimize the marketing engine infrastructure to automate the generation of targeted ads, landing pages, UGC briefs, and creative content ready for human review and publishing. They drive adoption of important new systems by ensuring the entire team knows how to use them. Additionally, they build evaluation frameworks to measure time savings, output quality, and throughput, helping to continuously improve marketing operations.
Staff Software Engineer, Core Infrastructure
As a Staff Software Engineer on the Core Infrastructure team at Harvey, your responsibilities include designing and building scalable, fault-tolerant infrastructure systems that power Harvey's AI platform across multiple cloud regions. You will own and evolve the multi-cloud infrastructure (Azure, GCP), including Kubernetes orchestration, networking, and container management. You will lead technical initiatives focused on observability, incident response, and operational excellence, building systems for rapid detection and resolution of issues. You will architect and optimize distributed systems for reliability, including load balancing, quota management, and failover mechanisms. You will partner with Product Engineering and Security teams to ensure infrastructure accelerates product development, drive infrastructure-as-code practices using tools like Terraform and Pulumi for reproducible deployments, and mentor engineers through code reviews, design reviews, and technical leadership. Representative projects include designing model proxy architecture for handling inference requests, building distributed rate limiting and quota management systems, architecting multi-region deployment strategies for data residency compliance, developing observability infrastructure with SLA monitoring and cost tracking, and leading CI/CD pipeline evolution to improve velocity and stability.
Tokens-as-a-Service (TaaS) Software Engineer
Develop systems and tooling to measure, monitor, and improve token throughput across first-party and partner-owned compute environments. Support performance benchmarking, tokenomics analysis, and model porting across heterogeneous infrastructure environments. Build tooling to integrate external or partner infrastructure into OpenAI’s internal compute, observability, and workload management systems. Develop and monitor operational metrics including billing, usage, SLAs, utilization, reliability, and throughput. Identify bottlenecks across hardware, networking, software, and workload enablement that prevent capacity from becoming productive tokens. Partner with compute, infrastructure, networking, finance, and operations teams to translate raw capacity into usable workload-serving capacity. Build dashboards, automation, and reporting systems that provide clear visibility into TaaS capacity, performance, and business outcomes.
Software Engineer I, Coding Pod
As a Software Engineer on the Coding Pod, you will build the data infrastructure and pipelines that power frontier AI coding models. Responsibilities include designing and building scalable data pipelines for generating, transforming, and validating large-scale coding datasets; developing systems for task generation, dataset curation, and quality assurance, including automated and human-in-the-loop evaluation workflows; integrating with developer ecosystems such as GitHub and building tooling to support real-world coding environments; working with containerized environments like Docker to safely execute and evaluate code at scale; building backend systems and APIs that power dataset delivery and model evaluation pipelines; collaborating closely with ML researchers, product managers, and other engineers to define evaluation methodologies and improve dataset quality; implementing automated grading, benchmarking, and assessment systems for coding tasks; debugging and optimizing pipeline performance, reliability, and scalability across distributed systems; and contributing to architectural decisions around data infrastructure, evaluation systems, and pipeline orchestration.
Software Engineer, Compute Infrastructure
In this role, you will spin up and scale large Kubernetes clusters, including automating provisioning, bootstrapping, and cluster lifecycle management; build software abstractions that unify multiple clusters and provide a seamless interface to training workloads; own node bring-up from bare metal through firmware upgrades ensuring fast and repeatable deployment at massive scale; improve operational metrics such as reducing cluster restart times and accelerating firmware or OS upgrade cycles; integrate networking and hardware health systems to deliver end-to-end reliability across servers, switches, and data center infrastructure; develop monitoring and observability systems to detect issues early and maintain cluster stability under extreme load; solve real-time operational challenges, diagnose and fix issues quickly, and continuously improve automation, resilience, performance, and uptime across the systems powering frontier AI model training.
Research Infrastructure Engineer, Training Systems
Build and maintain infrastructure for large-scale model training and experimentation. Design APIs and interfaces to simplify complex training workflows and prevent misuse. Improve reliability, debuggability, and performance of training and data pipelines. Debug issues across technologies including Python, PyTorch, distributed systems, GPUs, networking, and storage. Write tests, benchmarks, and diagnostics to detect significant regressions.
Parcel Contract Intelligence Consultant
Ship critical infrastructure by managing real-world logistics and financial data for some of the largest enterprises in the world. Own the why by building deep context through customer calls and understanding Loop's value to customers, pushing back on requirements if a better, faster solution exists. Work across system boundaries with full-stack proficiency, including frontend UX, LLM agents, database schemas, and event infrastructure. Leverage AI tools to automate boilerplate work, focusing on quality, architecture, and product taste. Constantly optimize development loops, refactor legacy patterns, automate workflows, and fix broken processes to raise the velocity bar.