AI Agent Engineer, Client Facing
The AI Agent Engineer is responsible for building and deploying AI agents including prompt design, workflow configuration, integrations, telephony setup, and evaluation frameworks. The role involves acting as the primary technical partner for customers by leading regular demos, communicating progress, gathering feedback, and guiding solutions from concept to production. It includes configuring and connecting systems using APIs, handling authentication, data mapping, error handling, and integrating with CRMs, knowledge bases, and other enterprise tools. The engineer sets up telephony systems including SIP/CCaaS/PSTN routing, passes metadata, configures fallbacks, and troubleshoots call quality. Responsibilities also include writing and refining prompts for LLM-driven agents, monitoring their performance, conducting iterative testing, and ensuring agents meet automation and containment targets. The role requires translating customer requirements into actionable solutions and working consultatively to resolve challenges related to security, connectivity, or knowledge ingestion. Additionally, the engineer collaborates with product and engineering teams to escalate platform gaps, resolve technical issues, and independently drive leading client implementations.
GTM Engineer
Design and deploy production-grade agents using LangGraph and LangSmith that handle technical support queries, troubleshoot integrations, and guide users through complex onboarding flows. Analyze customer friction points to build self-service AI systems that reduce support volume and improve customer experience. Act as the product owner and technical lead to proactively identify opportunities for improvement, propose architectures, and own the full lifecycle of the systems built. Participate in the feedback loop for the product team by identifying gaps in frameworks and contributing to the LangChain and LangGraph open-source ecosystem. Develop AI-native onboarding workflows that automate documentation retrieval and code generation to help enterprise customers move from prototypes to production faster.
AI Deployment Engineer- Codex
Serve as the primary technical subject matter expert on OpenAI Codex for a portfolio of customers, embedding deeply with them to enable their engineering teams and build coding workflows. Partner directly with customers to design and implement AI-enhanced development workflows, from rapid prototyping through scalable production rollout. Build high-quality demos, reference implementations, and workflow automations, using Codex itself as part of the development process. Lead large-format workshops, technical deep dives, and hands-on enablement sessions that help engineering organizations adopt AI coding tools effectively and safely. Contribute technical content including examples, guides, patterns, and best practices to the OpenAI Cookbook to help the broader developer community accelerate their work with Codex. Gather high-fidelity product insights from real customer deployments and translate them into clear product proposals and model feedback for internal teams. Influence customer strategy and decision-making by framing how AI coding tools fit into their software development lifecycle, technical roadmap, and organizational workflows. Serve as a trusted advisor on solution architecture, operational readiness, model configuration, security considerations, and best-practice adoption.
Software Engineer (SF)
Work on a small, high-caliber team building AI products for clients, from requirements gathering and prototyping through system design, development, testing, and deployment. Own features end-to-end and develop domain expertise across a range of AI use cases. Spend most of the time coding and frequently interact with clients to ensure the solutions meet their needs.
Senior / Staff Software Engineer (SF/NY)
You will work on a small, high-caliber team building AI products for clients, setting technical direction, writing code, and serving as the go-to person when challenges arise. Spend approximately 75% of your time coding and 25% interacting with clients, including CTOs, to understand problems, evaluate tradeoffs, and ensure solutions meet their needs.
Staff Software Engineer, Bots
As a member of the Bots team, design, build, and scale systems that enhance user engagement with the AI-powered platform, including bot chat orchestration, AI image generation, AI video generation, and tooling for managing these features. Collaborate with cross-functional teams like product managers, designers, and data specialists to deliver high-quality, performant, and maintainable features. Experiment with and integrate new AI image, video, and voice generation technologies. Build tooling and infrastructure around various AI technologies. Gain exposure to the architecture and operations of a fast-growing social AI product. Contribute expertise to evolve team processes and technical infrastructure, ensuring scalability and reliability.
Researcher, Misalignment Research
Design and implement worst-case demonstrations that concretely reveal AGI alignment risks for stakeholders, focusing on high-stakes use cases; develop adversarial and system-level evaluations based on these demonstrations and promote their adoption across OpenAI; create automated tools and infrastructure to scale automated red-teaming and stress testing; conduct research on failure modes of alignment techniques and propose improvements; publish influential internal or external papers that impact safety strategy or industry practice; collaborate with engineering, research, policy, and legal teams to integrate findings into product safeguards and governance; and mentor engineers and researchers to foster a culture of rigorous, impact-oriented safety work.
Data Science - AI
The AI Evaluation Engineer will analyze training and evaluation datasets to identify distributional gaps, labeling inconsistencies, and long-tail opportunities; design and execute labeling campaigns including the development of golden datasets and annotation guidelines; build and maintain dashboards to track model accuracy, regression trends, and product-specific KPIs; investigate failure modes through prompt clustering, error taxonomy development, and user intent classification; operationalize feedback loops by mining product telemetry and human-in-the-loop reviews for signal and translate these into data-driven model improvement strategies; partner with engineers and product managers to run structured A/B tests and human evaluations for new models or features; support the development of scalable data and evaluation infrastructure for LLMs and agents; and work with product, engineering, and legal teams to create clear and transparent processes for handling customer data in AI training, fine-tuning, and evaluation.
Researcher, Alignment Science
As a Research Engineer / Research Scientist on the Alignment team, you will design and implement alignment experiments focused on intent following, honesty, calibration, and robustness. You will train and evaluate models using reinforcement learning and other empirical machine learning methods. Your role includes developing evaluations for failure modes such as hallucination, instruction-following failures, reward hacking, covert actions, and scheming. You will study methods that encourage models to verify their behavior and report shortcomings honestly, including confession-style training objectives. You will build monitoring and inference-time interventions that ensure compliant behavior or surface model issues to users or downstream systems. Additionally, you will investigate how alignment methods scale with model capability, compute, data, context length, action length, and adversarial pressure. You will integrate successful techniques into model training and deployment workflows, produce externally publishable research when results advance the broader science of alignment, and collaborate with researchers and engineers across post-training, reinforcement learning, evaluations, safety, and product-facing teams.
Machine Learning Enginer, Core Evaluations
The responsibilities include designing model evaluation pipelines for models in both development and production environments, designing user studies for subjective model evaluations, converting requirements into measurable metrics, and designing and developing automated evaluation dashboards to monitor and compare model performance. It also involves training new models to capture various evaluation metrics, communicating with the model team to help design improved models based on evaluation results, coordinating with the data team to determine necessary data for enhancing model performance, collaborating with the product manager to ensure product requirements are accurately measured, helping to grow the evaluation team as the founding member, and leading the evaluation team in the future.
Access all 4,256 remote & onsite AI jobs.
Frequently Asked Questions
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Suspendisse varius enim in eros elementum tristique. Duis cursus, mi quis viverra ornare, eros dolor interdum nulla, ut commodo diam libero vitae erat. Aenean faucibus nibh et justo cursus id rutrum lorem imperdiet. Nunc ut sem vitae risus tristique posuere.
