TLM, Embedded Experiences
Lead the technical direction, architecture, and execution of critical Cooperative Systems initiatives. Manage and mentor a team of engineers while maintaining meaningful hands-on technical involvement. Partner closely with stakeholders across Support, Operations, Finance, IT, Sales, Legal, and other functions to identify opportunities for AI-driven improvements. Design and build production systems that leverage large language models and other AI technologies. Drive engineering excellence through strong technical decision-making, code quality, operational rigor, and thoughtful system design. Balance rapid experimentation with long-term platform investments. Establish technical roadmaps and execution plans for projects spanning multiple teams. Coach engineers through technical challenges, career growth, and project execution. Help shape the culture, processes, and engineering practices of a growing organization.
Software Engineer, Knowledge Systems
As a Software Engineer on Knowledge Systems, you will help build systems that understand what is true about the world by extracting, connecting, retrieving, and reasoning over knowledge from the web and beyond to enable AI agents to answer questions with unprecedented precision and completeness.
Senior Product Operations Manager, Evaluation
Build and scale the systems that power model and product evaluations across Harvey; run intake, triage, and prioritization for the evaluation request queue, routing capacity to the highest-value coverage gaps; embed evaluation workflows and readiness checkpoints into the product development lifecycle; create the single source of truth for evaluation status, results, history, and launch readiness; turn Expert-designed evaluation methodologies into scalable, repeatable operational processes; manage human data providers and stand up the internal contract-attorney pipeline, ensuring evaluation quality meets legal standards; work with Engineering and Research to improve evaluation tooling, automation, and dashboards; drive evaluation readiness for major product and model launches across geographies and jurisdictions; document and operationalize evaluation governance as complexity increases; help define how Harvey ensures model accuracy, reliability, and trust at global scale.
VP of Engineering
Lead the design and evolution of the AI cloud platform including GPU orchestration, compute scheduling, networking, storage, and distributed systems. Make critical decisions regarding cloud infrastructure, bare-metal deployments, and platform scalability. Participate personally in architecture reviews and key technical initiatives. Build and scale large GPU clusters supporting customer workloads and design systems for GPU provisioning, scheduling, utilization optimization, and capacity management. Drive platform reliability and performance for AI training and inference workloads, partnering closely with engineering teams on infrastructure requirements for next-generation AI systems. Remain deeply involved in engineering decisions and technical direction, contribute directly to infrastructure design and implementation efforts, review architecture proposals, system designs, and major infrastructure changes, and act as the technical escalation point for complex infrastructure challenges. Establish best practices for Kubernetes, observability, CI/CD, security, and operational excellence. Build SRE and Platform Engineering functions from the ground up. Define reliability standards including SLOs, SLIs, incident response processes, and capacity planning. Drive automation across infrastructure operations. Recruit and develop Infrastructure, Platform, and SRE teams. Build a high-performance engineering culture focused on ownership and execution. Partner with executive leadership on company strategy and infrastructure investments. Manage infrastructure budgets, vendor relationships, and capacity planning.
Forward Deployed Engineer I/II
Support customer engagements from start to finish by running discovery calls and demos, building and maintaining world-class agents, participating in customer calls, and serving as the primary point of contact in a fast-paced environment. Own the full agent development lifecycle, including building and prototyping quickly, setting up CI/CD, monitoring live usage, iterating to targets, debugging live issues, communicating with customers, and documenting best practices to accelerate future projects. Close the feedback loop with product and platform teams by capturing unmet needs, prototyping new features, contributing directly to the codebase, and collaborating with core teams to strengthen the platform for all customers.
Systems Research Engineer Intern - GPU Programming (Fall 2026)
Participate in the on-call rotation (PagerDuty) to respond to production incidents. Build and run infrastructure with Ansible, Terraform, and Kubernetes to enable scaling to a large number of concurrent users. Build monitoring systems to ensure the highest quality service for customers. Design and implement operational processes such as deployments and upgrades. Debug production issues across all services and levels of the stack. Identify improvements for the product architecture from the perspectives of reliability, performance, and availability. Plan the growth of Together AI's infrastructure.
Research Intern, Inference (Fall 2026)
As an AI Infrastructure Engineer at Together, you will participate in the on-call rotation to respond to production incidents; build and run infrastructure using Ansible, Terraform, and Kubernetes to support scaling to a large number of concurrent users; build monitoring systems to ensure high-quality service; design and implement operational processes such as deployments and upgrades; debug production issues across all services and stack levels; identify improvements to the product architecture in terms of reliability, performance, and availability; and plan the growth of Together AI's infrastructure.
Frontier Agents Intern (Fall 2026)
As an AI Infrastructure Engineer at Together AI, you will participate in the on-call rotation (PagerDuty) to respond to production incidents; build and run infrastructure with Ansible, Terraform, and Kubernetes to enable scaling to a massive number of concurrent users; build monitoring systems to ensure the highest quality service for customers; design and implement operational processes such as deployments and upgrades; debug production issues across all services and levels of the stack; identify improvements to the product architecture from reliability, performance, and availability perspectives; and plan the growth of Together AI's infrastructure.
Sr. Manager, Integrated Campaigns and ABX
Build and deploy AI Agents including prompt design, workflow configuration, integrations, telephony setup, and evaluation frameworks. Act as the primary technical partner for customers by leading demos, communicating progress, gathering feedback, and guiding solutions from concept to production. Configure and connect systems using APIs, handling authentication, data mapping, error handling, and integrations with CRMs, knowledge bases, and other enterprise tools. Set up telephony systems including SIP/CCaaS/PSTN routing, pass metadata, configure fallbacks, and troubleshoot call quality. Write and refine prompts for LLM-driven agents, monitor performance, and ensure agents meet automation and containment targets. Translate customer requirements into actionable solutions and work consultatively to unblock challenges in security, connectivity, or knowledge ingestion. Collaborate with product and engineering teams to address platform gaps and resolve technical issues, independently driving leading client implementations.
Senior Backend Engineer - AI Agents (Remote)
Design and build scalable backend systems powering AI Agents that operate in real-time enterprise environments. Develop agent orchestration frameworks involving multi-step reasoning, tool usage, and decisioning workflows. Build systems for agent memory, context management, and state persistence across interactions. Architect low-latency inference pipelines integrating Large Language Models, Small Language Models, and external tools/services. Implement evaluation frameworks to measure agent performance, accuracy, and reliability. Enable continuous improvement loops for AI agents in production including feedback, retraining, and deployment. Design and manage event-driven, asynchronous workflows for complex agent tasks. Optimize systems for high throughput, low latency, and cost-efficient inference at scale. Build and maintain robust APIs and service layers (REST/gRPC) for agent capabilities. Partner closely with Applied AI/ML teams to productionize models and agent behaviors. Collaborate with Product and Solutions teams to translate real customer workflows into agentic systems. Drive best practices in observability, monitoring, safety, and guardrails for AI systems. Contribute to architecture decisions for scaling multi-tenant, enterprise-grade AI platforms.
Access all 4,256 remote & onsite AI jobs.
