TLM, Embedded Experiences
Lead the technical direction, architecture, and execution of critical Cooperative Systems initiatives. Manage and mentor a team of engineers while maintaining meaningful hands-on technical involvement. Partner closely with stakeholders across Support, Operations, Finance, IT, Sales, Legal, and other functions to identify opportunities for AI-driven improvements. Design and build production systems that leverage large language models and other AI technologies. Drive engineering excellence through strong technical decision-making, code quality, operational rigor, and thoughtful system design. Balance rapid experimentation with long-term platform investments. Establish technical roadmaps and execution plans for projects spanning multiple teams. Coach engineers through technical challenges, career growth, and project execution. Help shape the culture, processes, and engineering practices of a growing organization.
VP of Engineering
Lead the design and evolution of the AI cloud platform including GPU orchestration, compute scheduling, networking, storage, and distributed systems. Make critical decisions regarding cloud infrastructure, bare-metal deployments, and platform scalability. Participate personally in architecture reviews and key technical initiatives. Build and scale large GPU clusters supporting customer workloads and design systems for GPU provisioning, scheduling, utilization optimization, and capacity management. Drive platform reliability and performance for AI training and inference workloads, partnering closely with engineering teams on infrastructure requirements for next-generation AI systems. Remain deeply involved in engineering decisions and technical direction, contribute directly to infrastructure design and implementation efforts, review architecture proposals, system designs, and major infrastructure changes, and act as the technical escalation point for complex infrastructure challenges. Establish best practices for Kubernetes, observability, CI/CD, security, and operational excellence. Build SRE and Platform Engineering functions from the ground up. Define reliability standards including SLOs, SLIs, incident response processes, and capacity planning. Drive automation across infrastructure operations. Recruit and develop Infrastructure, Platform, and SRE teams. Build a high-performance engineering culture focused on ownership and execution. Partner with executive leadership on company strategy and infrastructure investments. Manage infrastructure budgets, vendor relationships, and capacity planning.
Systems Research Engineer Intern - GPU Programming (Fall 2026)
Participate in on-call rotation (PagerDuty) to respond to production incidents. Build and run infrastructure with Ansible, Terraform, and Kubernetes to enable scaling to a large number of concurrent users. Build monitoring systems to ensure the highest quality service for customers. Design and implement operational processes such as deployments and upgrades. Debug production issues across all services and levels of the stack. Identify improvements for the product architecture from the perspectives of reliability, performance, and availability. Plan the growth of Together AI's infrastructure.
Research Intern, Inference (Fall 2026)
As an AI Infrastructure Engineer at Together, your responsibilities include participating in on-call rotation to respond to production incidents, building and running infrastructure using Ansible, Terraform, and Kubernetes to support scaling to a large number of concurrent users, building monitoring systems to ensure high-quality service, designing and implementing operational processes such as deployments and upgrades, debugging production issues across all services and stack levels, identifying improvements for product architecture in terms of reliability, performance, and availability, and planning the growth of Together AI's infrastructure.
Frontier Agents Intern (Fall 2026)
As an AI Infrastructure Engineer at Together AI, your responsibilities include participating in on-call rotation (PagerDuty) to respond to production incidents; building and running infrastructure with Ansible, Terraform, and Kubernetes to enable scaling for a massive number of concurrent users; building monitoring systems to ensure the highest quality service for customers; designing and implementing operational processes such as deployments and upgrades; debugging production issues across all services and levels of the stack; identifying improvements for the product architecture from reliability, performance, and availability perspectives; and planning the growth of Together AI's infrastructure.
Agentic Product Analyst
As an Agentic Product Analyst at Netomi, you will be responsible for designing, architecting, and deploying large-scale Agentic AI solutions for enterprise customers. This includes leading discovery sessions with customers to understand business processes, identifying automation opportunities, and designing agentic orchestration strategies using Netomi's AI platform. You will build detailed solution blueprints covering workflows, data exchanges, escalation logic, analytics, and agent lifecycle design. You will also define end-to-end Agentic AI architectures and work with customer technical teams to map integration dependencies. You will own the creation of integration design documents, support Integration Engineers during implementation, and ensure agent workflows comply with enterprise standards. You will collaborate with Product & Engineering to translate requirements into features, serve as product owner during deployment, validate solution behavior with QA, conduct user-experience reviews, train customer teams, and ensure projects deliver on time with measurable impact. Additionally, you are expected to act as a trusted advisor to customer stakeholders, present architectural recommendations, drive continuous improvement, and maintain deep expertise in agentic AI, LLMs, workflow orchestration, and enterprise systems.
Senior Backend Engineer - AI Agents (Remote)
Design and build scalable backend systems powering AI Agents that operate in real-time enterprise environments. Develop agent orchestration frameworks involving multi-step reasoning, tool usage, and decisioning workflows. Build systems for agent memory, context management, and state persistence across interactions. Architect low-latency inference pipelines integrating Large Language Models, Small Language Models, and external tools/services. Implement evaluation frameworks to measure agent performance, accuracy, and reliability. Enable continuous improvement loops for AI agents in production including feedback, retraining, and deployment. Design and manage event-driven, asynchronous workflows for complex agent tasks. Optimize systems for high throughput, low latency, and cost-efficient inference at scale. Build and maintain robust APIs and service layers (REST/gRPC) for agent capabilities. Partner closely with Applied AI/ML teams to productionize models and agent behaviors. Collaborate with Product and Solutions teams to translate real customer workflows into agentic systems. Drive best practices in observability, monitoring, safety, and guardrails for AI systems. Contribute to architecture decisions for scaling multi-tenant, enterprise-grade AI platforms.
Full Stack Software Engineer, Codex
Build end-to-end product experiences that span frontend applications, backend services, agent workflows, cloud infrastructure, and developer tooling. Design AI-powered workflows that generalize across a wide variety of software engineering teams, languages, codebases, and development practices. Discover and implement novel ways to apply AI to eliminate friction throughout the software development lifecycle. Partner closely with product, design, and research to understand developer needs and rapidly translate insights into shipped product improvements. Work directly with users—including developers at OpenAI, open-source contributors, startups, and large enterprises—to understand pain points and validate solutions. Improve the reliability, observability, scalability, and performance of the systems and workflows you build.
Member of Technical Staff (Machine Learning Engineer)
Translate cutting-edge research into production-ready machine learning systems. Design, build, and deploy end-to-end ML models and pipelines. Develop and optimize models for image and video processing. Own the full ML lifecycle including experimentation, training/fine-tuning, evaluation, and deployment. Rapidly prototype using open-source models and adapt them for product needs. Conduct experiments, analyze results, and iterate to improve performance. Collaborate with researchers and cross-functional teams (product, engineering, design) to deliver ML solutions at scale. Keep up with advancements in machine learning and apply them to continuously improve products.
AI Deployment Engineer
Serve as the primary technical subject matter expert post-sale for a portfolio of customers, embedding deeply with them to design and deploy GenAI solutions. Engage with senior business and technical stakeholders to identify, prioritize, and validate the highest-value GenAI applications in their roadmap. Accelerate customer time to value by providing architectural guidance, building hands-on prototypes, and advising on best practices for scaling solutions in production. Maintain strong relationships with leadership and technical teams to drive adoption, expansion, and successful outcomes. Contribute to open-source resources and enterprise-facing technical documentation to scale best practices across customers. Share learnings and collaborate with internal teams to inform product development and improve customer outcomes. Codify knowledge and operationalize technical success practices to help the Solutions Architecture team scale impact across industries and customer types.
