DevOps Engineer (Argentina)
Debug and fix issues in the platform and ship pull requests with fixes. Build internal tools and copilots powered by generative AI to enhance the team. Rapidly prototype proof-of-concept solutions for customer use cases. Collaborate across Engineering, Product, and Solutions teams to unblock customers and push the boundaries of AI adoption.
Senior Platform/DevOps Engineer (Kubernetes-Linux)
Translate business requirements into requirements for AI/ML models; prepare data to train and evaluate AI/ML/DL models; build AI/ML/DL models by applying state-of-the-art algorithms, especially transformers; leverage existing algorithms from academic or industrial research when applicable; test, evaluate, and benchmark AI/ML/DL models and publish the models, data sets, and evaluations; deploy models in production by containerizing them; work with customers and internal employees to refine model quality; establish continuous learning pipelines for models using online or transfer learning; build and deploy containerized applications on cloud or on-premise environments.
Senior Infrastructure Engineer
As a Senior Infrastructure Engineer at Bland, you will contribute to the design of scalable architecture by building distributed systems on Kubernetes that handle high-volume, real-time voice processing with strict latency and reliability requirements; build and support machine learning infrastructure, including training pipelines and real-time inference serving across multiple regions; maintain robust integrations with enterprise telephony systems, SIP trunks, and VoIP infrastructure; identify and resolve architectural flaws; ensure platform reliability through monitoring, alerting, and incident response systems to maintain enterprise-grade uptime; anticipate and solve scaling challenges related to exponential call volume growth; and implement security best practices and compliance requirements for enterprise customers in regulated industries.
Lead DevOps Engineer
Lead the design, building, deployment, and optimization of enterprise-grade AI agents including voice, chat, and AI copilots. Own the full lifecycle of AI agent development including prompt engineering, workflow creation, API integration, telephony setup, and evaluation forms. Engage with clients through weekly demos, progress updates, and feedback gathering, and act as the primary technical contact for deployed solutions. Configure system integrations involving APIs, data maps, authentication, and connectivity to CRM, databases, and knowledge systems. Set up telephony routing (SIP/CCaaS/PSTN), manage metadata, configure fallbacks, and troubleshoot call quality issues. Monitor agent performance and iteratively refine prompts to meet automation and containment goals. Work strategically to translate customer requirements into technical solutions, addressing challenges related to security, connectivity, and knowledge ingestion. Collaborate with product and engineering teams to support deep technical fixes and platform development while independently leading client delivery and support.
Senior Pathologist
Lead the team responsible for the infrastructure supporting the AI/ML stack, focusing on the scalability and efficiency of the Machine Learning Operations (MLOps) platform. Develop and execute the long-term vision and roadmap for the MLOps team to support ML development and deployment across business units, balancing short-term tactical deliveries with long-term architectural transformation. Manage and mentor a team of 6-7+ engineers, allocating resources strategically to support existing services and execute key strategic initiatives. Collaborate cross-functionally with leaders in machine learning, data science, product engineering, and infrastructure to identify pain points, remove bottlenecks, and facilitate new solution deployment. Architect compute and storage pipelines that let ML Engineers manage large datasets and artifacts efficiently. Modernize the AI product inference stack to support significant growth in global deployments. Work with Site Reliability Engineering to establish comprehensive system observability metrics. Conduct assessments for technology refresh and benchmark proprietary tools against commercial and open-source alternatives to meet future needs.
Engineering Manager / Tech Lead
As an AI Infrastructure Engineer at Together, you will participate in the on-call rotation (PagerDuty) to respond to production incidents, build and run infrastructure using Ansible, Terraform, and Kubernetes to enable scaling to a massive number of concurrent users, build monitoring systems to ensure high-quality service for customers, design and implement operational processes such as deployments and upgrades, debug production issues across all services and levels of the stack, identify improvements to the product architecture from reliability, performance, and availability perspectives, and plan the growth of Together AI's infrastructure.
Together Cloud Infrastructure Engineer
As an AI Infrastructure Engineer at Together, you are responsible for keeping all user-facing services and production systems running smoothly. This involves participating in an on-call rotation to respond to production incidents, building and running infrastructure using Ansible, Terraform, and Kubernetes to enable scaling for many concurrent users, building monitoring systems to ensure high-quality service, designing and implementing operational processes such as deployments and upgrades, debugging production issues across all services and levels of the stack, identifying improvements to the product architecture regarding reliability, performance, and availability, and planning the growth of Together AI's infrastructure.
Site Reliability Engineer, Inference Infrastructure
As a Site Reliability Engineer on the Model Serving team, you will build self-service systems that automate managing, deploying, and operating services, including custom Kubernetes operators supporting language model deployments. You will automate environment observability and resilience, enabling all developers to troubleshoot and resolve problems, and take steps to ensure defined SLOs are met, including participating in an on-call rotation. Additionally, you will build strong relationships with internal developers and influence the Infrastructure team’s roadmap based on their feedback, as well as develop the team through knowledge sharing and an active review process.
DevOps Engineer
The DevSecOps / Platform Engineer will design, implement, and operate secure, cloud-native infrastructure powering core data and application platforms for a defense-focused company. They will develop CI/CD pipelines, automate deployments, uphold security practices, and collaborate across teams to ensure reliability, scalability, and compliance for government users.
Staff Software Engineer, Infrastructure
You will design, build, and operate production infrastructure for high-scale, low-latency systems, owning critical services end-to-end to improve reliability and performance. The role also involves partnering with research and product teams, optimizing service latencies, evolving CI/CD and self-service tooling, and leading infrastructure-as-code and GitOps practices.
