Software Engineer, Observability (Full-Stack)
The Software Engineer on the Workspace & Observability Team at Anyscale is responsible for building user-facing application features for the Anyscale AI platform, focusing on the backend to implement the core business logic of these features. Responsibilities include interacting with users to understand their requirements, designing and implementing features, maintaining and improving these features over time, and working on observability tools that help users monitor and debug AI applications running on distributed clusters. Specific projects may involve developing the Ray Dashboard observability tool, library-specific observability tools like the Ray Train and Ray Serve dashboards, a unified log viewer for querying logs across a Ray cluster, and anomaly detection features to automatically identify and suggest fixes for performance bottlenecks or bugs. The role also involves collaborating with distributed systems and machine learning experts, communicating work through talks, tutorials, and blog posts, and contributing to building and shaping the company.
Staff Software Engineer, Bots
As a member of the Bots team, design, build, and scale systems that enhance user engagement with the AI-powered platform, including bot chat orchestration, AI image generation, AI video generation, and tooling for managing these features. Collaborate with cross-functional teams like product managers, designers, and data specialists to deliver high-quality, performant, and maintainable features. Experiment with and integrate new AI image, video, and voice generation technologies. Build tooling and infrastructure around various AI technologies. Gain exposure to the architecture and operations of a fast-growing social AI product. Contribute expertise to evolve team processes and technical infrastructure, ensuring scalability and reliability.
Span - Sr Product Engineer
Work on projects such as developing a product that identifies the root causes of KTLO (keep-the-lights-on) work and recommends solutions, building a user-friendly software catalog that works for monoliths, and helping protect engineering focus time by using AI to systemically solve sources of distraction or mental load.
Software Engineer I, Coding Pod
As a Software Engineer on the Coding Pod, you will build the data infrastructure and pipelines that power frontier AI coding models. Responsibilities include designing and building scalable data pipelines for generating, transforming, and validating large-scale coding datasets; developing systems for task generation, dataset curation, and quality assurance, including automated and human-in-the-loop evaluation workflows; integrating with developer ecosystems such as GitHub and building tooling to support real-world coding environments; working with containerized environments like Docker to safely execute and evaluate code at scale; building backend systems and APIs that power dataset delivery and model evaluation pipelines; collaborating closely with ML researchers, product managers, and other engineers to define evaluation methodologies and improve dataset quality; implementing automated grading, benchmarking, and assessment systems for coding tasks; debugging and optimizing pipeline performance, reliability, and scalability across distributed systems; and contributing to architectural decisions around data infrastructure, evaluation systems, and pipeline orchestration.
Software Engineer, Model Serving Infrastructure
The role involves contributing to the development of next-generation, high-performance machine learning serving systems. Responsibilities include building infrastructure that powers AI applications, working on problems at the intersection of distributed systems, machine learning, and high-performance computing, and solving fundamental computer science problems impacting AI deployment. Specific projects include implementing asynchronous inference for non-blocking client requests, designing intelligent request routing systems to balance load across thousands of model replicas with strict latency SLAs, building traffic management systems for zero-downtime model updates handling terabytes of inference requests, improving state management for scale from thousands to tens of thousands of replicas, architecting frameworks for multi-model orchestration in complex ML pipelines ensuring end-to-end latency guarantees, and developing observability and debugging tools for distributed ML applications at scale. The work involves writing performance-critical code in Python (with Cython optimizations) and potentially C++, working with distributed systems at scale using Ray Core's actor system, gRPC, and custom networking protocols, extending cloud-native infrastructure such as Kubernetes and service meshes, gaining system-level knowledge of ML/AI frameworks like TensorFlow, PyTorch, JAX, and transformers, and ensuring production reliability with tools like OpenTelemetry, Prometheus, distributed tracing, and chaos engineering to maintain 99.99% uptime. The role also involves leveraging AI coding agents to enhance team productivity while maintaining high code quality standards.
Software Engineer, Early Career
As a Software Engineer at Mirage, you will work across product engineering, backend/platform engineering, and applied AI teams. Responsibilities include designing and building systems, APIs, and infrastructure that power products; solving challenges involving distributed systems, scaling, and performance; integrating and operating large AI models in production; building core platform components such as storage, billing, observability, and security; shipping end-to-end product experiences for creative workflows; building polished, performant user interfaces (web or native mobile); pushing the boundaries of video, graphics, and AI-powered creation tools; instrumenting, A/B testing, and iterating quickly with real user data; building and shipping AI-powered product experiences end-to-end; working with state-of-the-art models across video, audio, image, and text; designing systems for context, reasoning, and intelligent behavior; and building evals, datasets, and tooling for improving model quality.
Senior Software Engineer (Builders)
Design, build, and operate scalable back-end systems that power AI agent and workflow builders. Own mission-critical services and infrastructure, delivering impactful features from ideation through to production. Push the boundaries of applied AI by enabling new agent capabilities, workflow orchestration, and system behaviours. Shape how engineering is done by influencing standards, architecture, and processes as the company scales. Mentor and support engineers across the team to raise the bar for technical quality and ownership. Set and uphold high standards for code quality, performance, reliability, and security. Collaborate closely with product, design, and leadership to align technical direction with business outcomes.
Senior Software Engineer (Chat)
Design, build, and operate scalable back-end systems that power real-time, AI-driven chat experiences. Own mission-critical services and infrastructure, delivering impactful features from ideation through to production. Push the boundaries of applied AI by enabling new agent capabilities, workflows, and system behaviours. Shape engineering standards, architecture, and processes as the company scales. Mentor and support engineers across the team, raising the bar for technical quality and ownership. Set and uphold high standards for code quality, performance, reliability, and security. Collaborate closely with product, design, and leadership to align technical direction with business outcomes.
Staff Software Engineer (Builders)
Design, build, and operate scalable back-end systems that power AI agent and workflow builders. Own mission-critical services and infrastructure, delivering impactful features from ideation through to production. Push the boundaries of applied AI by enabling new agent capabilities, workflow orchestration, and system behaviours. Shape how the engineering team builds by influencing engineering standards, architecture, and processes as the company scales. Mentor and support engineers across the team to raise the bar for technical quality and ownership. Set and uphold high standards for code quality, performance, reliability, and security. Collaborate closely with product, design, and leadership teams to align technical direction with business outcomes.
Software Engineer
Design and build the backend systems and services that power Sesame's product, including data models, APIs, and distributed systems. Write durable software focusing on scalability, reliability, and correctness rather than prototyping. Build and evolve frameworks and libraries for other engineers to use, emphasizing good software design. Own the full lifecycle of services, including schema design, implementation, deployment, performance tuning, and on-call responsibilities. Work with various data stores such as relational databases, NoSQL, queues, caches, and search indexes. Identify and resolve performance bottlenecks while considering cost, throughput, and latency. Architect systems where machine learning models are a key component but not the sole aspect, such as real-time audio pipelines, agentic orchestration, and stateful conversation systems. Identify opportunities to improve developer efficiency through prototyping tools or workflow improvements and collaborate with the infrastructure team to productionize them.
