Backend Engineer - Inference Services
The Backend Engineer leads the design and implementation of Deepgram's products, developing secure, robust, and scalable services for speech processing, distributed compute orchestration, and optimized scheduling. Responsibilities include improving Deepgram's core inference services across networking, speech processing, audio transcoding, and latency and memory optimization; developing processes for measuring, building, and optimizing services to maximize system performance; debugging complex system issues involving networking, scheduling, and high-performance computing; rapidly customizing backend services to support customer needs; and partnering with Product to design and implement new services, features, and products end to end.
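The latency optimization work called out here typically starts with percentile measurements. Below is a minimal sketch of such a harness, assuming a hypothetical `transcribe` stub in place of any real Deepgram API:

```python
import random
import statistics
import time

def transcribe(audio_chunk: bytes) -> str:
    """Hypothetical stand-in for a speech-to-text service call."""
    time.sleep(random.uniform(0.01, 0.05))  # simulated variable service latency
    return "transcript"

def measure_latency(n_requests: int = 100) -> None:
    """Collect per-request latencies and report the tail percentiles
    that typically drive inference-service optimization work."""
    latencies = []
    for _ in range(n_requests):
        start = time.perf_counter()
        transcribe(b"\x00" * 3200)  # ~100 ms of 16 kHz 16-bit mono audio
        latencies.append((time.perf_counter() - start) * 1000)  # ms
    qs = statistics.quantiles(latencies, n=100)  # 99 percentile cut points
    print(f"p50={qs[49]:.1f} ms  p95={qs[94]:.1f} ms  p99={qs[98]:.1f} ms")

if __name__ == "__main__":
    measure_latency()
```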
Application Engineer
Run projects together with customers' engineering teams. Analyze and process engineering data using their platform and Python libraries, and develop tailored solutions and workflows. Apply machine- and deep-learning workflows to a range of engineering and physics problems, and deliver proofs of concept demonstrating the value of their technology in CAD, CAE, and manufacturing. Train customer teams to use their methods and platform effectively, ensuring smooth adoption of AI into their workflows. Work closely with developers to translate customer needs and feedback into product improvements.
Senior Analytics Engineer
Design and develop AI applications primarily in Python. Run evaluations to validate models, and package solutions for Kubernetes or AWS, or adapt them to customers' on-premises clusters. Lead discovery sessions, guide pilot projects, and ensure successful deployments. Collaborate mostly remotely, with occasional on-site workshops. Monitor system performance and reliability. Contribute to the logging, billing, and auth services. Build internal tooling to automate repetitive tasks. Provide feedback on patterns, pain points, and reusable modules to the core product team to influence the future direction of the AI platform.
Senior Forward Deployed Engineer
Lead complex AI-driven deployments in production, owning technical delivery across multiple deployments from scoping high-impact Agentic AI use cases to stable production. Apply technical expertise and problem-solving skills to design solution architectures, develop decision logic, deploy production-grade Generative AI agents, and align with key customer stakeholders, ensuring an outstanding experience and rapid time to value. Scope work effectively, sequence delivery, proactively remove blockers, and make trade-offs between scope, speed, and quality for successful and timely project delivery. Partner with product management to convert customer needs into actionable insights that influence the product roadmap. Develop reusable resources, best practices, and tools to scale the forward deployed engineering function across the organization.
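To make "decision logic" concrete: the toy keyword router below stands in for the kind of tool routing an agent deployment needs. The tool names and routing rule are purely illustrative, not any specific customer's setup; a production agent would use an LLM call plus guardrails:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Tool:
    name: str
    run: Callable[[str], str]

# Hypothetical tools; a real deployment would call actual backends.
TOOLS = {
    "lookup_order": Tool("lookup_order", lambda q: f"order status for: {q}"),
    "escalate": Tool("escalate", lambda q: f"ticket opened for: {q}"),
}

def route(query: str) -> str:
    """Toy decision logic: route a customer query to a tool by keyword."""
    if "order" in query.lower():
        return TOOLS["lookup_order"].run(query)
    return TOOLS["escalate"].run(query)

print(route("Where is my order #1234?"))
print(route("The product arrived damaged."))
```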
Backend Engineer
Advance inference efficiency end-to-end by designing and prototyping algorithms, architectures, and scheduling strategies for low-latency, high-throughput inference. Implement and maintain changes in high-performance inference engines including kernel backends, speculative decoding, and quantization. Profile and optimize performance across GPU, networking, and memory layers to improve latency, throughput, and cost. Design and operate RL and post-training pipelines to jointly optimize algorithms and systems where inference dominates cost. Make RL and post-training workloads more efficient with inference-aware training loops such as async RL rollouts and speculative decoding. Use these pipelines to train, evaluate, and iterate on frontier models on top of the inference stack. Co-design algorithms and infrastructure so that objectives, rollout collection, and evaluation are tightly coupled to efficient inference, and identify bottlenecks across the training engine, inference engine, data pipeline, and user-facing layers. Run ablations and scale-up experiments to understand trade-offs between model quality, latency, throughput, and cost, and feed insights back into model, RL, and system design. Profile, debug, and optimize inference and post-training services under production workloads. Drive roadmap items requiring engine modifications including kernels, memory layouts, scheduling logic, and APIs. Establish metrics, benchmarks, and experimentation frameworks to validate improvements rigorously. Provide technical leadership by setting direction for cross-team efforts and mentoring engineers and researchers on full-stack ML systems work and performance engineering.
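One of the techniques named, speculative decoding, can be sketched with stubs: a cheap draft model proposes several tokens and a target model verifies them left to right, falling back to a resample on the first rejection. The vocabulary, acceptance rule, and both models below are illustrative placeholders, not a production engine:

```python
import random

random.seed(0)
VOCAB = ["the", "cat", "sat", "on", "a", "mat"]

def draft_model(context: list[str], k: int = 4) -> list[str]:
    """Cheap stub proposing k tokens; a real draft model is a small LM."""
    return [random.choice(VOCAB) for _ in range(k)]

def target_accepts(context: list[str], token: str) -> bool:
    """Stub verification; a real target model accepts a draft token with
    probability min(1, p_target / p_draft)."""
    return random.random() < 0.7

def speculative_decode(prompt: list[str], max_len: int = 12) -> list[str]:
    out = list(prompt)
    while len(out) < max_len:
        for tok in draft_model(out):          # verify drafts left to right
            if target_accepts(out, tok):
                out.append(tok)               # accepted: keep and continue
            else:
                out.append(random.choice(VOCAB))  # rejected: resample
                break                         # discard the rest of the draft
    return out[:max_len]

print(" ".join(speculative_decode(["the"])))
```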
Lead Field Marketing & Events Manager
Partner with frontier AI research labs to design datasets and environments that improve model performance. Lead technical conversations with customer researchers to understand model capabilities, failure modes, data requirements, and success criteria. Probe model behavior through systematic evaluation to uncover weaknesses and identify high-impact data interventions. Design evaluation frameworks, calibration processes, and quality rubrics that establish measurable project success metrics. Develop technical specifications for data projects that balance research rigor with operational feasibility. Serve as a thought partner to customer research teams throughout the sales cycle, building trust and credibility. Stay current on frontier AI research, RL environment design, post-training techniques, and evaluation methodologies.
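Inter-annotator agreement is one measurable piece of the calibration processes and quality rubrics mentioned here. A minimal sketch computing Cohen's kappa, with illustrative rubric labels and hypothetical annotators:

```python
from collections import Counter

def cohens_kappa(a: list[str], b: list[str]) -> float:
    """Agreement between two annotators, corrected for chance."""
    assert len(a) == len(b)
    n = len(a)
    p_obs = sum(x == y for x, y in zip(a, b)) / n          # observed agreement
    ca, cb = Counter(a), Counter(b)
    p_exp = sum(ca[l] * cb[l] for l in set(a) | set(b)) / n**2  # chance agreement
    return (p_obs - p_exp) / (1 - p_exp)

# Illustrative rubric scores from two hypothetical annotators.
rater_1 = ["good", "good", "bad", "good", "bad", "good"]
rater_2 = ["good", "bad", "bad", "good", "bad", "good"]
print(f"kappa = {cohens_kappa(rater_1, rater_2):.2f}")
```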
Computer Vision Engineer
Conduct research on state-of-the-art Computer Vision methodologies and participate in creating and curating training and validation datasets. Perform statistical analyses and develop visualization tools to ensure data quality. Build and refine training pipelines and metrics to enhance model performance. Develop and optimize Computer Vision algorithms for multiple robotics/aerospace projects. Implement ML/CV models into production-ready environments, ensuring seamless integration with Harmattan AI’s systems and conducting rigorous code reviews. Test algorithms in real-world environments and develop monitoring tools to track model performance and continuously improve deployed solutions. Work closely with software and simulation teams to align development with system requirements and communicate findings effectively to stakeholders.
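Detection metrics such as IoU-based precision are a typical building block of the training metrics mentioned above. A minimal pure-Python sketch, with illustrative boxes and threshold:

```python
def iou(box_a: tuple, box_b: tuple) -> float:
    """Intersection-over-union for (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union else 0.0

def detection_precision(preds, truths, thresh=0.5):
    """Fraction of predicted boxes matching some ground-truth box."""
    hits = sum(any(iou(p, t) >= thresh for t in truths) for p in preds)
    return hits / len(preds) if preds else 0.0

preds = [(0, 0, 10, 10), (20, 20, 30, 30)]   # illustrative predictions
truths = [(1, 1, 11, 11)]                    # illustrative ground truth
print(f"precision@0.5 = {detection_precision(preds, truths):.2f}")
```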
Member of Technical Staff: Agent DX Research
The role involves collaborating with Modal’s SDK team and other product engineers to build a framework and process for evaluating agent productivity. Responsibilities include defining quantitative objectives, designing systems to measure performance, translating results into product improvements, staying current with new developments in tools and workflows, and working with customers to understand their use of coding agents with Modal and identify areas for providing more value.
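A minimal sketch of what an agent-productivity harness could look like, assuming a hypothetical task suite and a stub agent (Modal's actual framework is not described in the listing); pass rate and wall-clock time stand in for richer productivity metrics:

```python
import time

# Hypothetical task suite; a real harness would run coding agents
# against sandboxed repositories and check their diffs or tests.
TASKS = {
    "reverse": (lambda agent: agent("olleh") == "hello"),
    "upper":   (lambda agent: agent("hi") == "HI"),
}

def toy_agent(task_input: str) -> str:
    """Stub agent that only knows how to reverse strings."""
    return task_input[::-1]

def evaluate(agent) -> None:
    """Report pass rate and latency, two simple productivity proxies."""
    passed, start = 0, time.perf_counter()
    for name, check in TASKS.items():
        ok = check(agent)
        passed += ok
        print(f"{name}: {'pass' if ok else 'fail'}")
    elapsed = time.perf_counter() - start
    print(f"pass rate {passed}/{len(TASKS)}, wall clock {elapsed:.3f}s")

evaluate(toy_agent)
```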
ML Research Scientist (Health & Sensing)
The ML Research Scientist will use AI and Machine Learning to transform sensor data into personalized intelligent health and fitness experiences by working closely with a cross-functional R&D and production team to prototype and ship solutions. Projects include advancing the Pod’s adaptive thermoregulation system using reinforcement learning and closed-loop control, developing multimodal health foundation models integrating physiology and environmental context from Pod signals, wearable sensors, and contextual data, and building high-fidelity physiological simulators to model how daily behaviors affect sleep and readiness. The scientist will tackle problems with a systems approach and make data-driven decisions to deliver the best products to users.
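Closed-loop control, mentioned for the Pod's thermoregulation work, can be illustrated with a toy proportional controller on a fake thermal model. The listing describes the real system as RL-based, so this is only a sketch of the sense-act loop, with all constants assumed:

```python
def simulate_night(target_c: float = 31.0, steps: int = 20) -> None:
    """Toy closed-loop temperature control: sense the error, apply a
    proportional heater command, and let simple fake dynamics respond."""
    temp = 25.0   # starting bed temperature (assumed)
    gain = 0.4    # controller gain (assumed)
    for t in range(steps):
        error = target_c - temp                       # sense
        heater_power = gain * error                   # act
        temp += heater_power - 0.05 * (temp - 22.0)   # fake thermal dynamics
        print(f"t={t:02d}  temp={temp:5.2f}C  power={heater_power:+.2f}")

simulate_night()
```

With these constants the loop settles near 30 C, just short of the 31 C target, the steady-state offset typical of pure proportional control and one reason a learned controller is attractive.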
Research Engineer
The Research Engineer will help design, evaluate, and productionize next generation AI inference systems by working at the intersection of applied research and real-world deployment to develop techniques that improve performance, efficiency, reliability, and cost of large-scale inference workloads. They will collaborate with systems engineers, ML engineers, and infrastructure teams to translate research ideas into practical implementations, writing and optimizing performance-critical kernels and improving low-level execution efficiency on modern accelerators. Responsibilities include researching, implementing, and evaluating state-of-the-art techniques for AI inference such as speculative decoding, prefill–decode disaggregation, quantization, and kernel-level optimizations focused on real-world customer use cases; designing and running experiments to understand trade-offs across latency, throughput, cost, and quality; analyzing real-world inference workloads to identify opportunities for improvements; staying current with advances in ML systems and inference research; sharing findings through internal reports and external contributions; and defining and contributing to the company roadmap to impact products and customers.
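Among the techniques listed, quantization is the easiest to show in isolation. A toy symmetric int8 round-trip below illustrates the quality side of the latency/throughput/cost/quality trade-off; the weight distribution and scheme are illustrative, not any particular engine's implementation:

```python
import random

random.seed(1)

def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor int8 quantization: w ~= q * scale."""
    scale = max(abs(w) for w in weights) / 127
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q: list[int], scale: float) -> list[float]:
    return [x * scale for x in q]

# Illustrative weights drawn from a small Gaussian.
weights = [random.gauss(0.0, 0.02) for _ in range(1024)]
q, scale = quantize_int8(weights)
recovered = dequantize(q, scale)
max_err = max(abs(w - r) for w, r in zip(weights, recovered))
print(f"scale={scale:.6f}  max abs error={max_err:.6f}")
```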