AI Evaluation Engineer
Design and implement evaluation pipelines to measure the performance and reliability of AI models. Develop automated testing frameworks to assess model outputs at scale. Analyze model performance using both traditional statistical metrics and AI-specific evaluation methods. Evaluate AI systems built on modern architectures such as LLM-based applications and Retrieval-Augmented Generation (RAG). Identify potential issues related to accuracy, hallucinations, bias, safety, and model drift. Conduct adversarial testing to uncover vulnerabilities and ensure safe model behavior. Collaborate with engineering and AI teams to improve prompt design, model outputs, and system performance. Monitor model performance in production, and help define best practices for AI evaluation and observability.
Sr. Engineering Manager, Machine Learning
Lead a talented team of engineers focused on improving AKASA’s machine learning capabilities and delivering cutting-edge products. Supervise and directly contribute across all parts of the LLM stack, including model fine-tuning, inference, evaluation, and deployment. Develop infrastructure and tooling to improve the model development lifecycle. Oversee a high-performing team via hands-on contributions and coaching. Translate business requirements into technical designs that work within constraints such as latency, cost, performance, and uptime. Set the vision and direction for the team and attract top talent to join AKASA. Attend in-office co-working days every Wednesday as part of the local R&D team.
AceUp - Lead ML Engineer (Generative AI & LLM Focus)
Architect conversational agents that are stateful, context-aware, and capable of maintaining long-running, coherent dialogues to handle complex reasoning tasks. Build retrieval-augmented generation (RAG) pipelines that ground large language model (LLM) responses in proprietary data to ensure high accuracy and minimize hallucinations. Lead the development of natural language processing (NLP) pipelines to extract structured insights from varied unstructured data sources, initially text and eventually audio. Implement advanced personalization layers that adapt model behavior and tone dynamically based on user history and context. Own the deployment lifecycle of LLMs, including prompt architecture, evaluation frameworks, latency optimization, and cost management on Vertex AI. Provide technical mentorship to ML engineers by reviewing code, setting architectural standards, and guiding technical decision-making; the role carries no people management responsibilities.
Intern of Technical Staff - Sovereign AI
As a Sovereign AI Intern, you will design, train, and improve cutting-edge models that serve the public interest; help develop new techniques that make training and serving models safer, better, and faster; train extremely large-scale models on massive datasets; learn from experienced senior machine learning technical staff; and work closely with product teams to develop solutions.
Lead Machine Learning Engineer
The Lead Machine Learning Engineer will own the development and improvement of the system predicting the next action salespeople should take to advance their relationships. Responsibilities include selecting the best model architecture and approach, involving a mixture of LLM steps and traditional ML models, picking evaluation metrics, designing systems to analyze models in production to identify areas for improvement, and identifying when to use the human data team for training or validation datasets. The engineer will read relevant research to find the best approach for their use case and, in partnership with the CTO, define how machine learning works with product engineering, model operations, and human data teams and how the team should develop moving forward.
Senior/Staff Machine Learning Engineer - Perception Offline Driving Intelligence
As an engineer on the Offline Driving Intelligence (ODIN) team at Zoox, you will develop advanced multimodal large language models to enhance environmental understanding for robotaxis; design model architectures and training techniques using sensor inputs and large-scale data; drive end-to-end machine learning solutions from research to production using Zoox's data pipelines and infrastructure; collaborate with perception, planning, safety, and systems teams to integrate models into the vehicle's decision-making pipeline; and validate and optimize solutions using real-world driving scenarios, contributing directly to the safety and reliability of Zoox's autonomous system.
Senior Machine Learning Engineer
The Senior Machine Learning Engineer will research, evaluate, and implement state-of-the-art NLP methodologies and large language model approaches to drive product innovation and develop new functionalities. They will design, develop, and deploy LLM agents and multi-agent systems to automate complex legal workflows and enhance user experiences. The role involves collaborating on projects that leverage emerging technologies such as Retrieval-Augmented Generation (RAG) and Knowledge Graphs to enhance the core product and explore new use cases. The engineer will work closely with cross-functional teams to integrate advanced ML models and NLP solutions into the platform, ensuring alignment with business objectives and tangible value. Additionally, they will stay current with the latest trends and breakthroughs in NLP, machine learning, and multi-agent systems, contributing ideas to shape the strategic direction of AI initiatives.
Senior Machine Learning Engineer
Design and ship advanced ML systems, especially LLM-driven agents and self-improving workflows. Build robust data and training pipelines, enable fast experimentation, and ensure models and agents continuously improve in production. Build LLM-based agents, tool-using workflows, and autonomous self-improvement loops. Design, train, and evaluate ML models across NLP/LLM, vision, and retrieval domains. Develop data pipelines, training code, experiment tooling, and automated deployment systems. Use PyTorch for model development and W&B (or similar) for tracking experiments and lineage. Implement monitoring for performance, drift, safety, and agent behavior. Optimize inference for latency, throughput, and cost. Work closely with engineering and product teams to turn prototypes into reliable production features. Establish ML engineering best practices and mentor teammates.
Sr. Machine Learning Researcher
As a Senior Machine Learning Researcher at AKASA, you will lead the design, training, and evaluation of large language models to address healthcare-specific challenges, focusing on advancing clinical Natural Language Understanding. You will collaborate with cross-functional teams including PhD researchers, ML engineers, and healthcare experts to integrate Human-in-the-Loop data for model improvements and explore optimization methods. Your role includes working end-to-end on model design, data creation, training, evaluation, and iteration to ensure research advances both models and real-world healthcare tasks. You will stay updated on machine learning advancements to maintain AKASA's leadership in healthcare AI, partner with healthcare experts to align models with real-world needs, contribute to high-impact publications, and support the integration of your research into AKASA's product offerings used across healthcare systems.
Tech Lead, LLM & Generative AI (Full Remote - Ukraine)
The Tech Lead is responsible for architecting the system and mentoring a team of three engineers while spending significant time hands-on in the codebase using Python and PyTorch. They will own the core chat loop, optimizing context windows, memory/RAG retrieval, and inference latency to ensure a seamless real-time experience. They must drive the strategy for supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF/DPO), deciding when to prompt, fine-tune, or architect new retrieval-augmented generation (RAG) pipelines. They manage the "Data Engine," overseeing the sourcing, labeling, and cleaning of datasets to improve model steerability and multicultural performance. Additionally, they design and train custom classifiers for high-precision moderation to detect and filter non-consensual or illegal content, moving beyond binary safe/unsafe flags to enable nuanced, context-aware moderation systems within an uncensored, NSFW environment.
