Member of Technical Staff, Machine Learning
Responsibilities of the Member of Technical Staff, Machine Learning, include building and improving ML components across data, training, evaluation, and inference; fine-tuning and adapting models as part of larger production systems; implementing evaluation and testing to understand model behavior; helping build and maintain data pipelines for real-world and synthetic data; debugging model issues, performance problems, and production incidents; shipping improvements iteratively and learning from real user feedback; working closely with senior ML engineers and product teams; and working under real production constraints such as latency, cost, reliability, and safety.
Staff ML Systems Engineer, Distributed Systems
Design and build scalable distributed machine learning pipelines across data processing, model training, evaluation, and post-processing workflows. Architect distributed execution systems, including parallelization strategies, workload scheduling, resource allocation, and fault tolerance mechanisms. Develop reusable abstractions, frameworks, and libraries that simplify distributed pipeline development. Optimize performance across distributed CPU and GPU environments, improving throughput, utilization, and reliability. Design systems that effectively manage data partitioning, memory utilization, serialization overhead, and compute efficiency. Partner closely with ML engineers, data engineers, and infrastructure teams to productionize research workflows and enable large-scale model development. Establish best practices and engineering standards for distributed machine learning infrastructure. Evaluate and guide decisions around distributed computing frameworks, infrastructure technologies, and system design trade-offs. Improve observability, debugging, monitoring, and operational tooling for distributed systems at scale.
Field Engineering Intern - Summer 2026
The Field Engineering Intern will learn directly from ML engineers transitioning to customer-facing field engineering, gaining firsthand exposure to how deep ML expertise translates into real-world customer impact. They will work on real customer workloads running on advanced GPU infrastructure, supporting customer onboarding, optimization engagements, and production deployments across demanding ML use cases. They will review prior optimization work, evaluate strategies against current best practices, and recommend improvements. The intern will develop a structured optimization playbook and case studies capturing the team's methodology and quantifying the value of field engineering work in a repeatable, scalable format. Finally, they will present their work to company leadership at the close of the engagement.
Member of Engineering (Pre-training / Data Research)
Follow the latest research related to Large Language Models (LLMs) and data quality, and stay familiar with relevant open-source datasets and models. Design and implement complex pipelines to generate large amounts of diverse data while optimizing available resources. Collaborate closely with teams such as Pretraining, Posttraining, Evals, and Product to ensure short feedback loops on the quality of models delivered. Suggest, conduct, and analyze data ablations or training experiments to improve the quality of generated datasets using quantitative insights.
Director of Technology & Systems
As a Production AI Ops Lead, you will design and develop the production lifecycle of full-stack AI applications, while supporting end-to-end system reliability, real-time inference observability, sovereign data orchestration, high-security software integration, and the resilient cloud infrastructure required for international government partners. You will own the production outcome by taking full accountability for the long-term performance and reliability of AI use cases deployed across international government agencies. You will ensure full-stack integrity by overseeing the end-to-end health of the platform, ensuring seamless integration between the AI core and all full-stack components, from APIs to UI, to maintain a responsive and production-ready environment. You will build automated systems to monitor model performance and data drift across geographically dispersed environments, ensuring the right levels of reliability. You will manage the technical lifecycle within diverse regulatory frameworks. You will lead the response to production issues in mission-critical environments, ensuring rapid resolution and building guardrails to prevent recurrence. You will translate deep technical performance metrics into clear insights for senior international government officials, and partner with Engineering and ML teams to ensure lessons learned in the field directly influence the technical architecture and decisions of future use cases.
Research Engineer – Evals
Build from scratch the evaluation systems that measure whether Firecrawl's outputs are effective across scraping, crawling, extracting, and mapping. This includes designing metrics, building pipelines, curating datasets, and integrating evaluations into continuous integration and deployment to catch regressions before release. Design benchmarks that represent the real customer data distribution, including edge cases, and create the collection and labeling systems. Own LLM-as-judge pipelines by designing and validating automated judges for scoring extraction quality, understanding LLM evaluation failure modes, and building human review tooling. Collaborate with research engineers working on models and reinforcement learning to use evaluation metrics as training signals and feedback loops to improve models. Design, run, and communicate fast experiments that test meaningful hypotheses and enable clear decision-making across the team.
Machine Learning Engineer (Singapore)
Build and scale systems for ingesting, processing, and delivering large-scale video and multimodal data for model training. Own the full pipeline from raw content to curated, filtered, and training-ready datasets, focusing on speed, reliability, reproducibility, and cost-efficiency. Design and scale distributed data pipelines for preprocessing, dataset generation, and repeated dataset refreshes. Own workflow orchestration, job scheduling, monitoring, and failure recovery for large-scale data processing jobs. Implement and maintain containerized pipeline infrastructure using Kubernetes or equivalent orchestration systems. Optimize cloud-based data storage and movement across providers (AWS, GCS, or Azure) for cost, throughput, and operational efficiency. Define and implement best practices for dataset storage layout, versioning, caching, retention, and access patterns. Design and implement curation pipelines for selection, filtering, and retention of video and image content for model training, including image-text pair datasets. Build and improve VLM-based captioning and metadata generation workflows at scale across video and image data. Develop and apply quality and aesthetic scoring models, CLIP-based semantic filtering, and other signal-extraction approaches for data selection. Build tooling to support deduplication workflows at scale, including near-duplicate and exact deduplication pipelines over large video corpora. Analyze dataset composition, identify quality issues, and iterate on curation logic to improve training outcomes. Define and evolve standards for high-quality, training-ready video data across different training regimes.
Research Engineer, Training & Inference
Maintain and optimize the proprietary reinforcement learning (RL) training and serving infrastructure with total stack ownership, from the Python API to CUDA kernels, to achieve peak performance for foundation model workloads. Maximize throughput of the RL system from data generation to model training using sharded multi-node training and inference algorithms. Optimize the inference stack for high-throughput RL and low-latency large language model (LLM) production traffic by tuning the inference engine, router, scheduler, and, if necessary, custom kernels. Identify and resolve performance bottlenecks in distributed clusters to ensure optimal throughput and memory efficiency for multi-billion parameter models, balancing memory constraints with compute-heavy training cycles.
Director, Forward Deployed Engineering
As a Production AI Ops Lead, you will design and develop the production lifecycle of full-stack AI applications, while supporting end-to-end system reliability, real-time inference observability, sovereign data orchestration, high-security software integration, and the resilient cloud infrastructure required for international government partners. You will take full accountability for the long-term performance and reliability of AI use cases deployed across international government agencies. You will oversee the end-to-end health of the platform, ensuring seamless integration between the AI core and all full-stack components from APIs to UI to maintain a responsive and production-ready environment. You will build automated systems to monitor model performance and data drift across geographically dispersed environments, ensuring the right levels of reliability. You will manage the technical lifecycle within diverse regulatory frameworks. You will lead the response for production issues in mission-critical environments, ensuring rapid resolution and building guardrails to prevent recurrence. You will translate deep technical performance metrics into clear insights for senior international government officials. You will partner with Engineering and ML teams to ensure lessons learned in the field directly influence the technical architecture and decisions of future use cases.
Applied ML Researcher (Force Fields and Simulation)
In this role, you will train, fine-tune, and distill machine learning force fields and research and develop novel ML force field architectures suited to production simulation workloads. You will integrate these models into public and in-house high-performance simulators and develop training and inference architectures for large-scale training, data generation, and simulation. You will distribute these workloads via Ray to scale across compute infrastructure and build modular systems so components can be reused across many kinds of chemistry. Additionally, you will build an active learning system that closes the loop between simulation, data generation, and training, develop interfaces that make the system easy for domain scientists to use and extend, and collaborate closely with computational chemists on density functional theory (DFT) data generation and validation.
