Software Engineer, Backend
Design, build, and own backend systems end-to-end, including services, APIs, data pipelines, and infrastructure that power the products. Solve complex technical challenges across distributed systems, scaling, concurrency, and performance. Integrate and operate large generative AI models in production by deploying, serving, and scaling systems that combine internal research and external capabilities to unlock new product experiences. Instrument, experiment, and iterate in production to continuously improve system and product quality. Design and operate core platform infrastructure, including integrations with third-party providers, storage systems, security, and internal APIs.
Software Engineer, Agent
Design and deliver production-grade AI agents that are highly performant, reliable, and intuitive, central to driving revenue and used in production environments across various industries such as finance, healthcare, and commerce. Have complete ownership and autonomy over the Agent Development Life Cycle (ADLC) from initial pilot through deployment and continuous iteration, including building, tuning, and evolving AI agents while defining ADLC best practices. Partner with large enterprises and startups to understand business challenges and build AI agents that transform operations at scale. Build and evolve Sierra's core platform by surfacing unmet needs, prototyping new tools and features, and collaborating with research, product, and platform teams to shape the future of AI agent development and Sierra's products.
Staff Product Designer, Go Enterprise
Own the observability and lifecycle management of AI features across the organization. Build tools and infrastructure to enable teams to develop, monitor, and optimize LLM-powered features. Design and implement closed-loop evaluation pipelines that automatically validate prompt changes. Develop comprehensive metrics and dashboards to track LLM usage including cost per feature, token patterns, and latency. Create systems that tie user feedback to specific prompts and LLM calls. Establish best practices and processes for the full lifecycle of prompts including development, testing, deployment, and monitoring. Collaborate with engineering teams across the organization to ensure they have the tools and visibility needed to build high-quality AI features.
Software Engineer
As a Software Engineer at Magic, you will work on core systems or product surfaces that directly determine model capability and user experience. This role can involve working on Pre-training Data, RL Research & Environments, or Product, depending on your background and strengths. Responsibilities include end-to-end ownership: defining problems, implementing solutions, shipping to production, and iterating based on outcomes. You will address challenges with internet-scale data acquisition, long-horizon post-training loops, and workflows that make complex model behavior understandable and controllable. Tasks may include building and scaling large distributed data pipelines for pre-training; designing filtering, mixture, and dataset versioning systems; developing post-training datasets, evaluation frameworks, and reward pipelines; running ablations to translate capability goals into measurable improvements; building end-to-end product surfaces that integrate deeply with the model; designing APIs, backend services, and frontend workflows for AI-first experiences; and improving the reliability, observability, and performance of production systems.
Staff Software Engineer, Model LifeCycle
The Staff Software Engineer for the Model LifeCycle team is responsible for building a comprehensive managed platform for the application development lifecycle with a focus on Machine Learning models, including Large Language Models (LLMs). Responsibilities include contributing to fine-tuning systems for large foundation models, implementing and maintaining end-to-end training pipelines for Large Language Models, contributing to distillation and reinforcement learning pipelines, developing and maintaining agent execution infrastructure, and implementing features for dataset, model, and experiment management such as versioning, lineage, evaluation, and reproducible fine-tuning at scale. The role also involves working closely with Principal Engineers, product, business, and platform teams to implement core abstractions and APIs, contributing to architectural decisions around training runtimes, scheduling, storage, and model lifecycle management, and engaging with the open-source LLM ecosystem. This position offers significant scope for ownership and contribution to the design of core systems.
Senior Full Stack Engineer, Backend Engineering
Design and implement high-throughput, isolated processing clusters for Enterprise clients, ensuring strict tenant isolation and High Availability without noisy neighbor interference. Drive improvements across Temporal workflow clusters and production Kubernetes environments, applying scaling strategies to support both self-serve consumers and high-touch Enterprise contracts. Collaborate with the AI/ML team to transform experimental models into scalable, production-ready services by owning the infrastructure connecting model outputs to user-facing features with minimal latency. Build and operate high-dimensional vector database infrastructure to power "OpusSearch" for users to find exact moments across thousands of hours of video using natural language. Architect backend systems including bulk workflow orchestration, resource isolation, and multi-tenancy to enable large media houses to manage massive video archives.
Software Engineer, Voice Agents / AI - Deepgram for Restaurants
The responsibilities include designing, developing, and maintaining scalable, high-performance backend systems for an automated order-taking platform. The engineer will collaborate closely with the team to ensure seamless integration of backend systems with machine learning models and client devices. They will monitor and optimize backend system performance in production environments, build and maintain integrations with third-party restaurant software systems such as POS, loyalty, payment gateways, and customer data platforms. Responsibilities also include implementing best practices in system design, code quality, and testing to ensure a reliable, secure, and maintainable system; optimizing the AI pipeline to improve performance in challenging audio environments and handle ambiguous customer requests; pushing the boundaries of large language models (LLMs) and voice AI technology; and running experiments to validate the product impact of new functionality.
Principal Engineer, C++/Integration (R4539)
The Principal Engineer on the Special Projects team is responsible for creating reference implementations for potential future products or product components by integrating new hardware platforms, sensor suites, simulators, and concepts of operation with the Hivemind SDK (C++) for commercial applications, focusing on autonomy and simulation. They iterate rapidly with customer feedback by demonstrating developed architectures as solutions to customers and gathering feedback for iteration. They explore and evaluate future hardware and software technologies relevant to Shield AI's product roadmap and beyond current Direct and IRAD projects. Additionally, they identify areas of technical debt across the stack and analyze and synthesize solutions and paths toward resolving them. They work closely with product teams and contribute directly to Hivemind software ecosystem products, supporting the development and deployment of resilient intelligent teaming for aircraft.
Senior Backend Engineer, LangSmith Deployments
Design distributed queue and worker systems that handle concurrent agent execution, background tasks, and multi-agent coordination across horizontally scalable infrastructure. Own core data infrastructure including state persistence, atomic job claiming, connection management, and schema evolution. Collaborate on architectural decisions to ensure scalable and robust solutions. Ship resumable streaming infrastructure that allows clients to disconnect and reconnect mid-execution without losing state. Instrument and monitor production systems with tracing, metrics, and alerting to maintain platform health. Participate in on-call rotations and own incident response for the runtime. Create and maintain technical documentation including system design and operational runbooks. Contribute to and extend the open-source LangGraph framework.
Senior Software Engineer - New Products
Own and lead projects and product areas end-to-end, including architecture, implementation, rollout, and long-term operations. Design ergonomic, developer-friendly APIs and abstractions for infrastructure capabilities. Build and operate reliable backend services such as rate limiting, auth, quotas, metering, and migrations with clear SLOs. Drive performance and reliability improvements through profiling, tracing, load testing, and capacity planning. Mentor teammates through code reviews, design docs, and technical leadership.
