DevOps Engineer, Infrastructure & Security
The role involves taking full accountability for the long-term performance and reliability of AI use cases deployed across international government agencies. Responsibilities include overseeing the end-to-end health of the platform to ensure seamless integration between the AI core and all full-stack components, from APIs to UI, and maintaining a responsive, production-ready environment. The job also requires building automated systems to monitor model performance and data drift across geographically dispersed environments, managing the technical lifecycle within diverse regulatory frameworks, and leading the response to production issues in mission-critical environments to ensure rapid resolution and prevent recurrence. Additionally, the role requires translating deep technical performance metrics into clear insights for senior international government officials and partnering with Engineering and ML teams to ensure lessons learned in the field influence the technical architecture and decisions of future use cases.
Senior Pathologist
Lead the team responsible for the infrastructure supporting the AI/ML stack, focusing on the scalability and efficiency of the Machine Learning Operations (MLOps) platform. Develop and execute the long-term vision and roadmap for the MLOps team to support ML development and deployment across business units, balancing short-term tactical deliveries with long-term architectural transformation. Manage and mentor a team of 6-7+ engineers, allocating resources strategically to support existing services and execute key strategic initiatives. Collaborate cross-functionally with leaders in machine learning, data science, product engineering, and infrastructure to identify pain points, remove bottlenecks, and facilitate the deployment of new solutions. Architect compute and storage pipelines that let ML Engineers manage large datasets and artifacts efficiently. Modernize the AI product inference stack to support significant growth in global deployments. Work with Site Reliability Engineering to establish comprehensive system observability metrics. Conduct technology-refresh assessments and benchmark proprietary tools against commercial and open-source alternatives to meet future needs.
Infrastructure Engineer
The Infrastructure Engineer is responsible for building robust, secure, and scalable cloud infrastructure to support AI and machine learning workflows. This includes designing, building, and deploying cloud infrastructure, partnering with technical and non-technical stakeholders from idea generation through implementation and shipping, enabling Machine Learning Engineers and Data Scientists by contributing to internal best practices, standards, and reusable code repositories, proactively identifying and recommending ways customers can leverage cloud infrastructure to solve key challenges, creating and maintaining reusable, company-wide libraries and infrastructure-as-code, and researching and integrating the best open-source technologies to enhance Faculty's infrastructure capabilities.
Staff DevOps Engineer
The Staff DevOps Engineer will design and architect secure, scalable cloud and edge infrastructure for deploying AI workloads across multi-cloud and hybrid environments. They will build and maintain production-grade Infrastructure as Code using tools like Terraform, Ansible, or Pulumi, managing over 100 resources with GitOps workflows and automated validation. The role includes designing and operating production Kubernetes clusters optimized for AI/ML workloads with GPU support, implementing container security, multi-tenancy, and resource optimization. They will implement secure CI/CD pipelines with integrated security controls and automated deployment workflows for containerized AI models. The engineer will lead MLOps infrastructure initiatives including model deployment pipelines, versioning, feature stores, experiment tracking, and monitoring for model performance and drift. Responsibilities also include designing comprehensive observability and monitoring solutions using tools like Prometheus, Grafana, ELK, or Datadog with distributed tracing, application performance monitoring, and real-time alerting. They will implement security best practices such as least-privilege access, encryption at rest and in transit, network segmentation, and automated compliance validation. The engineer will lead incident response and reliability initiatives, participate in on-call rotation, conduct post-mortems, and drive continuous improvement for system reliability. Architecting disaster recovery and business continuity strategies with automated backup, failover, and recovery processes is required. They will develop reusable infrastructure modules and templates to accelerate environment provisioning and standardize deployment patterns. Mentoring mid-level and senior engineers on cloud architecture, DevOps best practices, and platform reliability through design reviews and technical guidance is part of the role. They will also drive technical documentation and knowledge sharing including runbooks, architecture decision records, and infrastructure standards.
Site Reliability Engineer, Inference Infrastructure
As a Site Reliability Engineer on the Model Serving team, you will build self-service systems that automate managing, deploying, and operating services, including custom Kubernetes operators supporting language model deployments. You will automate environment observability and resilience, enabling all developers to troubleshoot and resolve problems, and take steps to ensure defined SLOs are met, including participating in an on-call rotation. Additionally, you will build strong relationships with internal developers and influence the Infrastructure team’s roadmap based on their feedback, as well as develop the team through knowledge sharing and an active review process.
DevOps Engineer
The DevSecOps / Platform Engineer will design, implement, and operate secure, cloud-native infrastructure powering core data and application platforms for a defense-focused company. They will develop CI/CD pipelines, automate deployments, uphold security practices, and collaborate across teams to ensure reliability, scalability, and compliance for government users.
Staff Software Engineer, Infrastructure
You will design, build, and operate production infrastructure for high-scale, low-latency systems, owning critical services end-to-end to improve reliability and performance. The role also involves partnering with research and product teams, optimizing service latencies, evolving CI/CD and self-service tooling, and leading infrastructure-as-code and GitOps practices.
Staff Infrastructure Security Engineer
The engineer will architect, deploy, and operationalize foundational security services to support Crusoe's move toward Zero Trust, serving as a technical leader for secrets management and identity architecture. Responsibilities span from driving enterprise-wide platforms like HashiCorp Vault to defining trust patterns and secure onboarding in a hybrid, multi-cloud environment.
Enterprise Security Engineer
You will be responsible for building and operationalizing the company's compliance program, implementing controls, and supporting audits in a fast-paced SaaS environment. Key tasks include managing GRC tools, automating workflows for compliance standards such as SOC 2 and ISO 27001, and supporting responses to customer security assessments.
Freelance AI Red Team Engineer
As a Freelance AI Red Team Engineer, you will evaluate and red team AI models, agents, and machine learning systems for safety risks and vulnerabilities. You will also develop automation tools, create rigorous test scenarios, and contribute to security research initiatives in the AI domain.
