AI Data Engineer Jobs

Discover the latest remote and onsite AI Data Engineer roles at top AI companies that are actively hiring. Updated hourly.

Check out 208 new AI Data Engineer opportunities posted on AI Chopping Block

Data Engineer

Mach9 – Full-time – Posted Jun 10, 2026 7:16

Develop and maintain scalable, reproducible workflows for ingesting and processing large volumes of point cloud, imagery, and geospatial data. Convert datasets from various sensor providers into Mach9's standardized internal formats. Build CI/CD pipelines and automated checks that ensure the correctness and consistency of data pipelines, including regression detection for dataset processing. Optimize processing performance, query speed, and storage efficiency across large geospatial datasets. Work closely with the customer success team to resolve issues efficiently and unblock customer projects. Build and maintain an agentic harness for automated dataset triage and code patching that proposes or applies fixes automatically and escalates when human judgment is needed. Collaborate with ML and product teams to make data readily usable for training, inference, and visualization. Work closely with customers and data-provider partners to facilitate data integration, including occasional travel. Work with sparsely documented or undocumented data formats and solve the puzzles they present.

Salary: Undisclosed
Location: San Francisco, United States – Onsite – Maybe global

Analytics Engineer

Loop – Full-time – Posted Jun 9, 2026 1:17

Ship critical infrastructure that manages real-world logistics and financial data for the largest enterprise in the world. Own the "why": build deep context through customer calls and an understanding of the company's value to customers, and push back on requirements when you see a better, faster solution. Demonstrate full-stack proficiency by working across system boundaries, including frontend UX, LLM agents, database schemas, and event infrastructure. Use AI tools to automate boilerplate so you can focus on quality, architecture, and product taste. Keep raising the velocity bar by optimizing development loops, refactoring legacy patterns, automating workflows, and fixing broken processes.

Salary: $125,000 USD per year
Location: Chicago or San Francisco – Hybrid – Maybe global

Junior Software Engineer

Stability AI – Full-time – Posted May 23, 2026 7:52

Oversee the end-to-end lifecycle of data acquisition and management for foundation models across 3D, video, image, and audio. Identify and acquire diverse datasets from public and commercial partners while managing complex technical and legal requirements. Collaborate with research teams to ensure data sources align with specific model training and fine-tuning needs. Manage the technical lifecycle of large-scale data, including ingestion, curation, and AWS S3 storage optimization, while ensuring system reliability and code quality. Develop internal tools and standards that make datasets searchable, accessible, and efficiently indexed for research. Partner with the legal team to mitigate risks, ensure global regulatory compliance, and manage sensitive data protection rules. Represent the company in legal matters, including providing testimony on data usage and licensing. Lead data vendor management: negotiate Master Services Agreements and Statements of Work, and oversee partnerships for data annotation, evaluation, and collection projects. Drive cross-functional alignment between technical leads and researchers so that the data strategy supports the company's product roadmap.

Salary: $238,000 – $260,000 USD per year
Location: United States or Canada – Onsite – Maybe global

Data Platform Engineer

webAI – Full-time – Posted May 8, 2026 3:13

Design and build robust connectors across SQL/NoSQL databases, APIs (REST/GraphQL), and SaaS platforms such as CRM and storage systems. Interpret and model heterogeneous source schemas. Transform raw source data into formats optimized for AI inference. Collaborate closely with ML, applied AI, and forward deployed teams to define feature expectations. Work with infrastructure teams to design and ship hosted data pipelines. Optimize for latency, consistency, and edge constraints. Design resilient ingestion patterns for unreliable or rate-limited systems. Build logging, monitoring, and debuggability into all integrations.

Salary: Undisclosed
Location: Austin, United States – Remote – Maybe global

Signal Engineer

Maincode – Full-time – Posted Apr 23, 2026 17:01

The Signal Engineer will build pipelines that ingest, clean, deduplicate, filter, and score training data at terabyte to petabyte scale. They will develop quality classifiers and heuristics to separate useful data from the rest, design dataset mixtures, and conduct experiments to determine what improves the model. The engineer will also create tools to explore, sample, and audit the corpus, and work closely with researchers and training engineers to ensure data choices connect to model behavior.

Salary: Undisclosed
Location: Melbourne, Australia – Onsite – Maybe global

Research Engineer, Data Infrastructure

Mistral AI – Full-time – Posted Apr 22, 2026 12:50

The role involves building and operating the next generation of data infrastructure at Mistral AI as a core contributor to the design and scaling of massive, high-performance compute fleets and storage systems. Responsibilities include architecting and maintaining multi-cluster orchestration layers that optimize workload placement across diverse hardware and regions; designing future-proof storage systems that anticipate growth to exabyte scale; contributing to the internal training platform that supports model training and fine-tuning across Kubernetes and SLURM environments; implementing and managing metadata and lineage systems that make data and model pipelines visible and traceable; and managing cloud-native deployments with modern workflows to ensure scalability and operational excellence. The role also carries full lifecycle ownership, from migrating away from legacy orchestrators to implementing production-grade pipelines, and includes on-call rotations for critical training jobs.

Salary: Undisclosed
Location: Palo Alto, United States – Onsite – Maybe global

Agentic Finance Engineer

Mercor – Full-time – Posted Apr 19, 2026 3:39

The Agentic Finance Engineer designs, builds, and maintains a reliable financial data foundation using modern tools, covering revenue, AP/AR, procurement, close, strategic finance, and FP&A. They will partner closely with the data infrastructure team to build the financial data model and define canonical datasets, dimensional schemas, and transformation logic for Finance stakeholders. The role includes partnering with Finance leads to translate business requirements into technical architecture, and building and maintaining dashboards and self-serve reporting tools that provide real-time visibility into key metrics. The engineer will own the Agentic Finance roadmap: prioritizing use cases, driving features from ideation to deployment, identifying high-value automation opportunities across Finance and corporate operations, and shipping solutions that eliminate manual work. They will build intelligent, reliable automation using agents, AI-powered tools, multi-step ETL jobs, and internal tooling that Finance teams use, such as lightweight apps, workflow automations, and AI-assisted processes. The engineer must enforce data integrity standards and testing practices to ensure auditability and reliability, make sure AI-assisted processes meet governance and controls standards with clear auditability, and champion a culture of data quality and documentation so that Finance teams can trust and rely on the systems built.

Salary: $175,000 – $250,000 USD per year
Location: San Francisco, United States – Onsite – Maybe global

Senior Data Intelligence Engineer

Deepgram – Full-time – Posted Apr 16, 2026 0:15

The Senior Data Intelligence Engineer is responsible for building and maintaining high-fidelity dbt and SQL models that serve as the foundational data for complex, usage-based revenue models. They develop tools and permissions frameworks enabling 'Analyst Agents' to query data sources such as Athena, correlate Salesforce churn signals, and identify API latency issues. The engineer acts as the technical liaison with the Engineering/Infrastructure team to ensure data contracts are reliable and ready for autonomous agents. They partner with the Head of Data to ingest and transform thousands of hours of unstructured internal call audio into queryable insights for go-to-market teams using Deepgram’s own models. The role includes maintaining a culture focused on automating manual and repetitive SQL tasks through code and agent systems rather than legacy dashboards.

Salary: $165,000 – $230,000 USD per year
Location: United States – Remote – Maybe global

Tech Lead Manager, Data Infrastructure

Cartesia – Full-time – Posted Apr 9, 2026 5:59

The Tech Lead Manager, Data Infrastructure at Cartesia defines the overall multimodal data strategy across pre-training and post-training, including human, synthetic, and web-scale data sources. They lead, manage, and mentor a team of data engineers and specialists. They design and oversee the construction of robust, scalable data pipelines for text, audio, and video, and they establish and enforce rigorous data quality standards across the organization. They must deeply understand how data affects model capability, proactively identify and source novel datasets, and manage relationships and budgets with external data vendors and partners.

Salary: $250,000 – $375,000 USD per year
Location: San Francisco, United States – Onsite – Maybe global

Engineer, Supercomputing & Distributed Systems

Krea – Full-time – Posted Apr 3, 2026 7:43

Build and operate infrastructure for research and inference including distributed training, 1000+ Kubernetes GPU clusters, and petabyte-scale data pipelines. Design multi-stage pipelines converting petabytes of raw data into clean, annotated datasets. Run classification models on billions of images. Deploy and combine large language models to caption massive multimedia data. Manage distributed training and inference on GPU Kubernetes clusters. Solve orchestration and scaling challenges for large-scale GPU job processing. Scale workloads and research between clusters in multiple datacenters. Profile and optimize dataloaders streaming thousands of images per second. Profile and debug InfiniBand networking on huge training runs. Build fault tolerance systems for large-scale pretraining. Collaborate with researchers on evolving reinforcement learning infrastructure. Find clean scenes in millions of videos using distributed shot-boundary detection. Customize and train models to filter billions of images. Build systems bridging raw cluster capacity and research output.

Salary: Undisclosed
Location: San Francisco, United States – Onsite – Maybe global

Want to see more AI Data Engineer jobs?

Access all 4,256 remote & onsite AI jobs.

Join our private AI community to unlock full job access, and connect with founders, hiring managers, and top AI professionals.

(Yes, it’s still free—your best contributions are the price of admission.)

Frequently Asked Questions

Have questions about roles, locations, or requirements for AI Data Engineer jobs?

What does an AI Data Engineer do?

AI Data Engineers build and manage data pipelines specifically for AI and machine learning models. They design architectures that process diverse data types such as text, images, and video for model consumption. Their daily work includes implementing data validation systems, ensuring data quality, and integrating large-scale datasets from multiple sources. They create real-time data workflows, work with vector search tools such as FAISS (a similarity-search library) or Milvus (a vector database), and optimize the performance of AI data infrastructure. Using tools like Python, SQL, Apache Spark, and Airflow, they collaborate with data scientists and ML engineers to transform raw data into formats that support model training and deployment.

What skills are required for AI Data Engineer jobs?

Strong programming skills in Python and SQL form the foundation for AI Data Engineer roles. Proficiency with data engineering frameworks such as Apache Spark, Airflow, and Ray is essential for building robust pipelines. Experience with cloud platforms (AWS, GCP, Azure) and vector databases lets engineers handle AI-specific data needs. Skills in data quality assurance, monitoring, and error handling keep AI systems reliable. Engineers should understand embedding techniques for processing unstructured data and have experience running ETL processes at scale. Soft skills such as cross-functional collaboration are valuable, because these roles bridge technical teams, AI scientists, and business stakeholders.

What qualifications are needed for AI Data Engineer jobs?

Most AI Data Engineer positions require a bachelor's degree in computer science, data engineering, or a related technical field, and many employers prefer a master's degree for senior roles. Hands-on experience building data pipelines for machine learning applications is crucial. Employers look for demonstrated expertise with cloud data services such as Redshift, BigQuery, or Snowflake, and familiarity with MLOps practices. Knowledge of preprocessing techniques for unstructured data (text, images, video) sets successful candidates apart. Professional certifications in cloud platforms or data technologies can strengthen an application, especially when combined with proven experience integrating large-scale datasets into AI workflows.

What is the salary range for AI Data Engineer jobs?

Among the listings on this page that disclose pay, ranges run from $125,000 to $375,000 USD per year. Compensation varies with several factors. Location has a large effect: tech hubs such as San Francisco and New York pay more than smaller markets. Experience matters too, with senior engineers earning significantly more than entry-level hires. Specialized skills in emerging AI tools, vector databases, and specific cloud platforms can raise earning potential. Company size also plays a role: large tech companies and well-funded AI startups often pay a premium. Because preparing data for AI applications is specialized work, these roles often pay more than traditional data engineering positions at a similar experience level.

How long does it take to get hired as an AI Data Engineer?

The hiring timeline for AI Data Engineers typically spans 4–8 weeks from application to offer. The process usually starts with a resume screen, followed by a technical phone interview covering Python, SQL, and data pipeline concepts. Candidates then face one to three rounds of technical interviews focused on data engineering problems, system design for AI workflows, and coding exercises. Some companies add a take-home assignment in which the candidate builds a pipeline for AI data. Final rounds often include conversations with potential teammates and hiring managers. Specialized skills in AI data preprocessing and experience with vector databases can speed up the process, especially for candidates with proven experience in similar roles.

Are AI Data Engineer jobs in demand?

AI Data Engineer positions are in strong demand as organizations build infrastructure for AI initiatives. The role bridges traditional data engineering and AI needs, with postings at institutions such as Stanford and companies such as OpenAI. It is increasingly recognized as essential to successful AI adoption, particularly as companies scale their machine learning operations. Demand stems from the distinct requirements of AI data pipelines, which differ significantly from traditional analytics infrastructure. Organizations need engineers who understand how machine learning models expect data to be preprocessed and who can build robust pipelines for diverse data types, including text, images, and video.

What is the difference between an AI Data Engineer and a Data Engineer?

Both roles build data pipelines, but AI Data Engineers focus on preparing data for machine learning and AI systems rather than for business analytics. They work extensively with unstructured data (text, images, video) and implement specialized preprocessing that traditional Data Engineers rarely handle. They commonly use vector search tools such as FAISS and embedding libraries that are not typical in standard data engineering. They must also understand model training data requirements and build infrastructure that supports model deployment. Traditional Data Engineers concentrate on structured data flows, data warehousing, and analytics support. AI Data Engineers, by contrast, build pipelines optimized for machine learning, with features such as data versioning, lineage tracking, and real-time delivery of AI-ready data.