Inference Technical Lead, On-Device Transformers
As a Technical Lead on the Future of Computing Research team, you will evaluate and select silicon platforms such as GPUs, NPUs, and specialized accelerators for on-device and edge deployment of OpenAI models. You will work closely with research teams to co-design model architectures that meet real-world deployment constraints including latency, memory, power, and bandwidth. You will analyze and model system performance, identifying tradeoffs between model design, memory hierarchy, compute throughput, and hardware capabilities. You will partner with hardware vendors and internal infrastructure teams to bring up new accelerators and ensure efficient execution of transformer workloads. Additionally, you will build and lead a team of engineers responsible for implementing the low-level inference stack, including kernel development and runtime systems. You will also develop nascent research capabilities into robust, usable features.
AI & IT Systems Engineer
As Jasper undergoes an agentic AI shift, the AI & IT Systems Engineer role involves ensuring the IT infrastructure is robust, secure, and fine-tuned for advanced AI workflows, spending 70-80% of time on AI enablement deployments. Responsibilities include modernizing and improving IT systems to support autonomous AI workflows, building scalable automation infrastructure to enhance efficiency and reduce manual tasks, and operationalizing AI initiatives using tools like Claude, ChatGPT, and Zapier to create intelligent, cross-platform workflows involving platforms like Google Workspace and Slack. The role also requires managing core IT systems such as Identity Providers and Mobile Device Management, streamlining identity and access operations using features like Okta Workflows, and providing cross-functional technical support across departments to implement AI enablement projects. Additionally, the engineer manages a broad SaaS ecosystem, including Google Workspace and Linear, and assists in developing training resources and playbooks to facilitate team adoption of new AI tools.
Customer Support Engineer (Inference), India
Advance inference efficiency end-to-end by designing and prototyping algorithms, architectures, and scheduling strategies for low-latency, high-throughput inference. Implement and maintain changes in high-performance inference engines, including kernel backends, speculative decoding, and quantization. Profile and optimize performance across GPU, networking, and memory layers to improve latency, throughput, and cost. Design and operate RL and post-training pipelines, jointly optimizing algorithms and systems to make inference and post-training workloads more efficient. Train, evaluate, and iterate on frontier models using these pipelines. Co-design algorithms and infrastructure so that training objectives, rollout collection, and evaluation are tightly coupled to efficient inference. Identify bottlenecks across training engine, inference engine, data pipeline, and user-facing layers. Run ablations and scale-up experiments to understand trade-offs between model quality, latency, throughput, and cost, feeding insights back into model, RL, and system design. Profile, debug, and optimize inference and post-training services under real production workloads. Drive roadmap items requiring engine modification such as changing kernels, memory layouts, scheduling logic, and APIs. Establish metrics, benchmarks, and experimentation frameworks to rigorously validate improvements. Provide technical leadership by setting direction for cross-team efforts at the intersection of inference, RL, and post-training, and mentoring other engineers and researchers on full-stack ML systems work and performance engineering.
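The quantization work mentioned above can be illustrated with a minimal sketch: symmetric per-tensor int8 quantization, one of the simplest schemes an inference engine might support. The function names and round-trip check here are illustrative, not any specific engine's API.

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: scale by the max
    absolute value so the weight range maps onto [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 values."""
    return [v * scale for v in q]

weights = [0.5, -1.27, 0.03, 1.27]
q, scale = quantize_int8(weights)
recovered = dequantize(q, scale)
# round-trip error is bounded by half a quantization step
assert all(abs(a - b) <= scale / 2 + 1e-9 for a, b in zip(weights, recovered))
```

Real engines typically refine this with per-channel scales and calibration data, but the scale-then-round structure is the same.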
Helix AI Engineer, Agentic Systems
Design, deploy, and maintain Figure's training clusters. Architect and maintain scalable deep learning frameworks for training on massive robot datasets. Work together with AI researchers to implement training of new model architectures at a large scale. Implement distributed training and parallelization strategies to reduce model development cycles. Implement tooling for data processing, model experimentation, and continuous integration.
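The distributed-training strategies above lean on collective operations; as a toy illustration, this is the arithmetic a data-parallel all-reduce performs when averaging gradients across workers before the optimizer step (pure Python, names are illustrative and not Figure's stack):

```python
def allreduce_mean(worker_grads):
    """Average per-parameter gradients across data-parallel workers,
    as an all-reduce does before each optimizer step."""
    n = len(worker_grads)
    return [sum(g) / n for g in zip(*worker_grads)]

# three workers, each holding gradients for two parameters
grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(allreduce_mean(grads))  # [3.0, 4.0]
```

In practice this runs as a ring or tree all-reduce over NCCL rather than a central sum, but the result each worker receives is the same mean.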
Manual Quality Assurance Engineer, Web Core Product
Work alongside machine learning researchers, engineers, and product managers to bring AI Voices to customers for diverse use cases. Deploy and operate the core ML inference workloads for the AI Voices serving pipeline. Introduce new techniques, tools, and architecture that improve performance, latency, throughput, and efficiency of deployed models. Build tools to identify bottlenecks and sources of instability and design and implement solutions to address the highest priority issues.
AI Infrastructure Engineer
Operate and maintain a large-scale GPU cluster consisting of thousands of GPUs across multiple data centers using Kubernetes and Slurm. Monitor and diagnose failures across the GPU hardware and software stacks to ensure high availability and rapid recovery. Develop automation tools and scripts using Python or Shell to streamline repetitive infrastructure management tasks and improve operational efficiency. Manage GPU resource quotas and provide technical support to ML researchers to ensure optimal utilization of computing resources. Participate in the architectural design and performance tuning of distributed training environments for large-scale autonomous driving models.
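Automation of the kind described above often starts with scripts that parse tooling output. A minimal sketch, assuming the CSV form produced by `nvidia-smi --query-gpu=index,temperature.gpu,utilization.gpu --format=csv,noheader,nounits`; the sample readings and temperature threshold are made up:

```python
# Sample of the CSV output described above: index, temperature (C), utilization (%)
SAMPLE = """0, 64, 98
1, 92, 97
2, 55, 0
"""

def flag_hot_gpus(report, max_temp=85):
    """Return the indices of GPUs reporting a temperature above max_temp."""
    hot = []
    for line in report.strip().splitlines():
        idx, temp, util = [int(field.strip()) for field in line.split(",")]
        if temp > max_temp:
            hot.append(idx)
    return hot

print(flag_hot_gpus(SAMPLE))  # [1]
```

A production version would shell out to `nvidia-smi` on each node and feed the flagged indices into alerting or automated node draining.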
Director of Governance & Risk Compliance
The role involves owning the production lifecycle of full-stack AI applications and supporting end-to-end system reliability, real-time inference observability, sovereign data orchestration, high-security software integration, and resilient cloud infrastructure for international government partners. Responsibilities include taking full accountability for the long-term performance and reliability of AI use cases deployed across international government agencies, overseeing the end-to-end health of the platform to ensure seamless integration between the AI core and full-stack components, building automated systems to monitor model performance and data drift across dispersed environments, managing the technical lifecycle within diverse regulatory frameworks, leading incident response for production issues in mission-critical environments, translating technical performance metrics into clear insights for senior government officials, and partnering with Engineering and ML teams to drive product evolution based on field lessons.
Staff Software Engineer, GPU Infrastructure (HPC)
As a Staff Software Engineer, you will build and scale ML-optimized HPC infrastructure by deploying and managing Kubernetes-based GPU/TPU superclusters across multiple clouds, ensuring high throughput and low-latency performance for AI workloads. You will optimize for AI/ML training by collaborating with cloud providers to fine-tune infrastructure for cost efficiency, reliability, and performance, using technologies like RDMA, NCCL, and high-speed interconnects. You will troubleshoot and resolve complex issues by identifying and resolving infrastructure bottlenecks, performance degradation, and system failures to minimize disruption to AI/ML workflows. You will enable researchers with self-service tools by designing intuitive interfaces and workflows that allow researchers to monitor, debug, and optimize their training jobs independently. You will drive innovation in ML infrastructure by working closely with AI researchers to understand emerging needs in areas such as JAX, PyTorch, and distributed training, and translating them into robust, scalable infrastructure solutions. You will champion best practices by advocating for observability, automation, and infrastructure-as-code (IaC) across the organization to ensure systems are maintainable and resilient. Additionally, you will provide mentorship and collaborate through code reviews, documentation, and cross-team efforts to foster a culture of knowledge transfer and engineering excellence.
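NCCL tuning of the kind this role covers is commonly driven by environment variables. A small config sketch using real NCCL knobs; the interface and HCA device names are deployment-specific placeholders:

```shell
# Surface NCCL's internal logging to diagnose interconnect problems.
export NCCL_DEBUG=INFO
# Pin NCCL control traffic to the intended network interface (placeholder name).
export NCCL_SOCKET_IFNAME=eth0
# Select the InfiniBand HCAs used for RDMA traffic (placeholder device names).
export NCCL_IB_HCA=mlx5_0,mlx5_1
```

These are typically set in the job launcher or container spec so every rank sees the same network configuration.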
