Staff Software Engineer, Core Infrastructure
As a Staff Software Engineer on the Core Infrastructure team at Harvey, your responsibilities include designing and building scalable, fault-tolerant infrastructure systems that power Harvey's AI platform across multiple cloud regions. You will own and evolve the multi-cloud infrastructure (Azure, GCP), including Kubernetes orchestration, networking, and container management. You will lead technical initiatives focused on observability, incident response, and operational excellence, building systems for rapid detection and resolution of issues. You will architect and optimize distributed systems for reliability, including load balancing, quota management, and failover mechanisms. You will partner with Product Engineering and Security teams to ensure infrastructure accelerates product development, drive infrastructure-as-code practices using tools like Terraform and Pulumi for reproducible deployments, and mentor engineers through code reviews, design reviews, and technical leadership. Representative projects include designing model proxy architecture for handling inference requests, building distributed rate limiting and quota management systems, architecting multi-region deployment strategies for data residency compliance, developing observability infrastructure with SLA monitoring and cost tracking, and leading CI/CD pipeline evolution to improve velocity and stability.
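The distributed rate limiting and quota management mentioned above is often built on a token bucket. A minimal single-node sketch (illustrative only; a distributed version would keep these counters in a shared store such as Redis, and all names and parameters here are assumptions, not Harvey's design):

```python
import time

class TokenBucket:
    """Single-node token bucket: refills at a fixed rate, allows bursts
    up to `capacity` tokens."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate          # tokens refilled per second
        self.capacity = capacity  # maximum burst size
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self, cost: int = 1) -> bool:
        # Refill proportionally to elapsed time, capped at capacity.
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

bucket = TokenBucket(rate=1.0, capacity=5)
results = [bucket.allow() for _ in range(8)]  # first 5 pass, then the bucket is empty
```

In a quota-management service, each tenant would own one bucket keyed by tenant ID, and the refill rate would encode the purchased quota.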
Software Engineer, Model Serving Infrastructure
The role involves contributing to next-generation, high-performance machine learning serving systems: building infrastructure that powers AI applications, working at the intersection of distributed systems, machine learning, and high-performance computing, and solving fundamental computer science problems that affect AI deployment. Specific projects include implementing asynchronous inference for non-blocking client requests; designing intelligent request routing to balance load across thousands of model replicas under strict latency SLAs; building traffic management for zero-downtime model updates handling terabytes of inference requests; improving state management to scale from thousands to tens of thousands of replicas; architecting frameworks for multi-model orchestration in complex ML pipelines with end-to-end latency guarantees; and developing observability and debugging tools for distributed ML applications at scale. The work includes writing performance-critical code in Python (with Cython optimizations) and potentially C++; working with distributed systems at scale using Ray Core's actor system, gRPC, and custom networking protocols; extending cloud-native infrastructure such as Kubernetes and service meshes; gaining system-level knowledge of ML/AI frameworks such as TensorFlow, PyTorch, JAX, and transformers; and ensuring production reliability with OpenTelemetry, Prometheus, distributed tracing, and chaos engineering to maintain 99.99% uptime. The role also involves leveraging AI coding agents to enhance team productivity while maintaining high code quality standards.
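Two of the projects above, asynchronous inference and intelligent request routing, can be sketched together with a least-outstanding-requests router: the dispatcher bumps a replica's in-flight counter at dispatch time, before the coroutine starts, so concurrent requests spread across replicas. This is a toy model under asyncio, not Ray Serve's actual routing logic; all names are illustrative:

```python
import asyncio

class Replica:
    def __init__(self, name: str):
        self.name = name
        self.inflight = 0  # outstanding requests on this replica

    async def _run(self, prompt: str) -> str:
        try:
            await asyncio.sleep(0.001)  # simulated model latency
            return f"{self.name}:{prompt}"
        finally:
            self.inflight -= 1

def dispatch(replicas, prompt):
    # Pick the replica with the fewest outstanding requests and bump its
    # counter immediately, so the next dispatch sees the updated load.
    replica = min(replicas, key=lambda r: r.inflight)
    replica.inflight += 1
    return asyncio.create_task(replica._run(prompt))

async def main():
    replicas = [Replica(f"r{i}") for i in range(3)]
    tasks = [dispatch(replicas, f"q{i}") for i in range(9)]
    return await asyncio.gather(*tasks)  # non-blocking: all 9 run concurrently

results = asyncio.run(main())
```

Because the counter is updated synchronously at dispatch, nine concurrent requests land three per replica instead of piling onto the first one.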
Software Engineer, Inference - Performance Optimization
Build and refine performance models that translate microbenchmark results into cost-to-serve estimates. Analyze inference workloads end to end across applications, models, and fleet infrastructure. Enhance tooling to identify bottlenecks across layers for latency and throughput. Partner with other teams to turn performance insights into concrete improvements and project how future changes affect inference.
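The first responsibility, translating microbenchmark results into cost-to-serve estimates, can be sketched as a small analytical model. All numbers and names below are invented for illustration; a real model would account for batching, padding, and traffic shape in much more detail:

```python
# Hypothetical microbenchmark results: sustained throughput per replica.
BENCH_TOKENS_PER_SEC = {"model-a": 1200.0, "model-b": 400.0}  # illustrative
GPU_HOURLY_COST = 2.50                                         # $/GPU-hour, illustrative

def cost_per_million_tokens(model: str, utilization: float = 0.6) -> float:
    """Turn a throughput microbenchmark into a cost-to-serve estimate.

    `utilization` discounts ideal benchmark throughput for real traffic
    (bursts, padding, batching inefficiency)."""
    effective_tps = BENCH_TOKENS_PER_SEC[model] * utilization
    tokens_per_hour = effective_tps * 3600
    return GPU_HOURLY_COST / tokens_per_hour * 1_000_000

cost_a = cost_per_million_tokens("model-a")
cost_b = cost_per_million_tokens("model-b")
```

The value of such a model is comparative: it projects how a fleet or model change shifts cost-to-serve before the change ships (here, model-b costs ~3x more per token than model-a).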
Staff Software Engineer (Builders)
Design, build, and operate scalable back-end systems that power AI agent and workflow builders. Own mission-critical services and infrastructure, delivering impactful features from ideation through to production. Push the boundaries of applied AI by enabling new agent capabilities, workflow orchestration, and system behaviours. Shape how the engineering team builds by influencing engineering standards, architecture, and processes as the company scales. Mentor and support engineers across the team to raise the bar for technical quality and ownership. Set and uphold high standards for code quality, performance, reliability, and security. Collaborate closely with product, design, and leadership teams to align technical direction with business outcomes.
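The workflow orchestration mentioned above reduces, at its core, to executing steps in dependency order and feeding each step the outputs of its dependencies. A minimal sketch using the standard library's topological sorter (step names and the `run_workflow` helper are hypothetical, not the company's API):

```python
from graphlib import TopologicalSorter

def run_workflow(steps, deps):
    """Execute workflow steps in dependency order.

    `steps` maps name -> callable(inputs_dict); `deps` maps name -> set of
    upstream step names whose outputs it consumes."""
    outputs = {}
    for name in TopologicalSorter(deps).static_order():
        inputs = {d: outputs[d] for d in deps.get(name, ())}
        outputs[name] = steps[name](inputs)
    return outputs

steps = {
    "fetch": lambda _: "raw",
    "summarize": lambda inp: f"summary({inp['fetch']})",
    "review": lambda inp: f"review({inp['summarize']})",
}
deps = {"summarize": {"fetch"}, "review": {"summarize"}}
result = run_workflow(steps, deps)
```

A production builder adds persistence, retries, and parallel execution of independent branches on top of exactly this dependency-ordering core.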
Software Engineer
Design and build the backend systems and services that power Sesame's product, including data models, APIs, and distributed systems. Write durable software focusing on scalability, reliability, and correctness rather than prototyping. Build and evolve frameworks and libraries for other engineers to use, emphasizing good software design. Own the full lifecycle of services, including schema design, implementation, deployment, performance tuning, and on-call responsibilities. Work with various data stores such as relational databases, NoSQL, queues, caches, and search indexes. Identify and resolve performance bottlenecks while considering cost, throughput, and latency. Architect systems where machine learning models are a key component but not the sole aspect, such as real-time audio pipelines, agentic orchestration, and stateful conversation systems. Identify opportunities to improve developer efficiency through prototyping tools or workflow improvements and collaborate with the infrastructure team to productionize them.
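One of the stateful conversation systems mentioned above needs, at minimum, a bounded per-session buffer with an eviction policy. A deliberately naive sketch (drop-oldest truncation; a real system would summarize or persist evicted turns — all names here are illustrative, not Sesame's API):

```python
from collections import deque

class Conversation:
    """Stateful conversation buffer keeping at most `max_turns` turns,
    dropping the oldest first (a naive context-window policy)."""

    def __init__(self, max_turns: int = 4):
        self.turns = deque(maxlen=max_turns)  # deque handles eviction for us

    def add(self, role: str, text: str) -> None:
        self.turns.append((role, text))

    def context(self) -> list:
        """Turns to feed the model, oldest first."""
        return list(self.turns)

conv = Conversation(max_turns=3)
for i in range(5):
    conv.add("user", f"msg{i}")
ctx = conv.context()  # only the 3 most recent turns survive
```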
Systems Engineer - Physical Products
You will be responsible for defining operational domains and evaluating the reliability of the AI capabilities developed in-house. You will develop and extend the state of the art in uncertainty quantification and uncertainty calibration. This involves understanding the AI systems being built, interfacing with them, and evaluating their robustness in real-world and adversarial scenarios. You will contribute to impactful projects and collaborate with people across several teams and backgrounds.
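The uncertainty calibration mentioned above is commonly measured with expected calibration error (ECE): predictions are binned by confidence, and ECE is the weighted average gap between each bin's mean confidence and its empirical accuracy. A minimal sketch (binning scheme and toy data are illustrative):

```python
def expected_calibration_error(confidences, correct, n_bins=5):
    """ECE: sum over bins of (bin weight) * |accuracy - mean confidence|."""
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        # Half-open bins (lo, hi]; bin 0 also catches confidence exactly 0.
        idx = [i for i, c in enumerate(confidences)
               if lo < c <= hi or (b == 0 and c == 0)]
        if not idx:
            continue
        acc = sum(correct[i] for i in idx) / len(idx)
        conf = sum(confidences[i] for i in idx) / len(idx)
        ece += len(idx) / n * abs(acc - conf)
    return ece

hits = [1] * 8 + [0] * 2                                  # 80% empirically correct
ece_good = expected_calibration_error([0.8] * 10, hits)   # confidence matches accuracy
ece_over = expected_calibration_error([0.9] * 10, hits)   # overconfident by 0.1
```

A perfectly calibrated model scores 0; the overconfident variant scores its confidence-accuracy gap.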
C++ Systems Engineer
Design, build, and optimize the core native runtime behind LM Studio and the C++ libraries that power the app and its APIs. Work across the runtime, LLM engines, llama.cpp/MLX integrations, build infrastructure, and on-device AI software. Focus on system and library integration by wiring the C++ runtime to GPU backends, vendor SDKs, and operating-system services to support user-facing applications. Implement and harden system-level code, including threading, memory, files, IPC, and scheduling. Integrate platform acceleration paths such as Metal, CUDA, and Vulkan across macOS, Windows, and Linux. Profile, debug, and tune execution paths so that local AI is fast and dependable, and maintain well-architected software. Extend LLM engine integrations and build platform-aware performance features for desktop operating systems. Implement resilient IPC, resource management, and scheduling logic to support concurrent model execution. Improve build, packaging, and release infrastructure for native components. Collaborate with the team to deliver cohesive and recognizable user experiences.
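The scheduling logic for concurrent model execution mentioned above often amounts to capping how many models run at once against a shared resource budget. A toy sketch (in Python for brevity, though this role targets C++; a semaphore stands in for the GPU-memory accounting a native runtime would do — all names are illustrative):

```python
import threading

class ModelScheduler:
    """Caps concurrent model executions; a stand-in for the resource
    management a native runtime would do around GPU memory."""

    def __init__(self, max_concurrent: int):
        self.slots = threading.Semaphore(max_concurrent)
        self.lock = threading.Lock()
        self.active = 0
        self.peak = 0  # highest observed concurrency, for verification

    def run(self, model_fn):
        with self.slots:  # blocks until a slot frees up
            with self.lock:
                self.active += 1
                self.peak = max(self.peak, self.active)
            try:
                return model_fn()
            finally:
                with self.lock:
                    self.active -= 1

sched = ModelScheduler(max_concurrent=2)
threads = [threading.Thread(target=sched.run, args=(lambda: None,)) for _ in range(6)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Six requests arrive, but observed concurrency never exceeds the two configured slots.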
Performance Modeling Engineer
Support the development and maintenance of performance modeling tools and frameworks. Assist in building models to evaluate system behavior across compute, memory, networking, and interconnect subsystems. Help analyze distributed system scaling behavior and identify performance bottlenecks. Run simulations and analytical models to support architecture and infrastructure decisions. Partner with senior engineers to evaluate design tradeoffs across hardware and system components. Interpret modeling outputs and help translate findings into clear recommendations. Validate models using benchmarking data and real system performance measurements. Improve modeling workflows, documentation, and usability for broader team adoption. Collaborate cross-functionally with hardware, infrastructure, and architecture teams, and continuously build technical depth across AI infrastructure, system architecture, and performance analysis.
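The scaling-behavior analysis mentioned above usually starts from Amdahl's law: the serial fraction of a workload caps achievable speedup no matter how many workers are added. A one-function sketch (the 5% serial fraction is an illustrative assumption):

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Amdahl's law: speedup = 1 / (serial + parallel/workers)."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / workers)

# Even with 1024 workers, a workload that is 5% serial speeds up < 20x.
s_1024 = amdahl_speedup(0.95, 1024)
s_1 = amdahl_speedup(0.95, 1)
```

This is why bottleneck identification matters more than adding hardware: shrinking the serial fraction moves the ceiling, adding workers only approaches it.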
Performance Modeling Engineer
Develop and maintain performance modeling tools and frameworks. Build models to evaluate system behavior across compute, memory, and interconnect subsystems as well as distributed system scaling and bottlenecks. Run simulations and analytical models to support architectural tradeoff analysis. Collaborate with performance modeling lead and system architects to answer forward-looking design questions. Analyze and interpret modeling outputs, translating results into actionable insights. Validate models against real system measurements and workload behavior. Contribute to improving modeling fidelity, usability, and scalability.
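A common analytical model for the compute/memory tradeoffs described above is the roofline model: attainable throughput is capped either by peak compute or by memory bandwidth times arithmetic intensity, whichever binds. A minimal sketch (the peak numbers are illustrative, not any specific accelerator):

```python
def roofline(peak_flops: float, peak_bw: float, intensity: float) -> float:
    """Attainable FLOP/s at a given arithmetic intensity (FLOPs per byte):
    memory-bound below the ridge point, compute-bound above it."""
    return min(peak_flops, peak_bw * intensity)

PEAK_FLOPS = 100e12   # 100 TFLOP/s, illustrative
PEAK_BW = 2e12        # 2 TB/s, illustrative

ridge = PEAK_FLOPS / PEAK_BW                # intensity where compute starts to bind
low = roofline(PEAK_FLOPS, PEAK_BW, 10)     # intensity 10 -> memory-bound
high = roofline(PEAK_FLOPS, PEAK_BW, 100)   # intensity 100 -> compute-bound
```

Placing measured kernels on this curve is one way to translate modeling outputs into actionable insights: a kernel sitting on the bandwidth slope needs data-movement fixes, not more FLOPs.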
Software Engineer, Kernel Performance & AI Tooling
The role involves building developer tooling and workflows to accelerate kernel development and performance optimization. It includes developing observability, diagnostics, and validation infrastructure for AI-assisted optimization systems; optimizing production kernels through techniques such as search loops and bottleneck analysis; designing abstractions, interfaces, and automation systems for kernel optimization and hardware-software co-design; and improving AI-assisted optimization systems with better datasets and benchmarking. It also involves partnering across research and engineering teams to translate ideas into practical systems that span production and long-term infrastructure strategy.
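The search loops mentioned above follow a simple pattern: enumerate a configuration space (tile sizes, unroll factors, and so on), time each candidate, and keep the fastest. A toy autotuner with a synthetic cost surface standing in for real kernel timings (the config space and cost function are invented for illustration):

```python
import itertools

def benchmark(config):
    """Stand-in for timing a real kernel run; a synthetic cost surface
    whose minimum is at block=128, unroll=4."""
    block, unroll = config
    return abs(block - 128) * 0.01 + abs(unroll - 4) * 0.5 + 1.0

def autotune(space):
    """Exhaustive search loop: evaluate every config, keep the fastest."""
    return min(space, key=benchmark)

space = list(itertools.product([32, 64, 128, 256], [1, 2, 4, 8]))
best = autotune(space)
```

Real search loops replace exhaustive enumeration with guided search (genetic, Bayesian, or model-driven) because kernel config spaces are far too large to sweep, but the evaluate-and-keep-best skeleton is the same.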
