AI Research Resident
Lead research that advances Maincode's work on capable, useful, and trustworthy AI systems. Design and execute experiments, develop new research directions, and collaborate closely with researchers and engineers. Produce research outputs suitable for top-tier conferences, journals, technical reports, open-source releases, or deployment in Matilda and future Maincode systems.
Researcher: Agent Post-Training, API & Power-Users
The role involves improving the capabilities, reliability, and product fit of OpenAI’s agentic models for power users and API developers. Responsibilities include designing and running experiments to enhance model behavior in API and power-user workflows such as function calling, tool use, coding, planning, and long-horizon execution. The role requires building evals, graders, and environments from real developer and power-user workflows, turning observed failures into training data, hypotheses, and improvements. The researcher partners with API developers and power users to identify behavior gaps and translate product signals into post-training interventions. They improve model behavior when models are composed into systems, ensuring reliable tool use, respect for developer intent, appropriate error handling, clarification when needed, and task coherence. The role also includes owning end-to-end model behavior projects from failure analysis through training, eval design, integration into major model runs, and launch readiness. The researcher develops feedback loops that use power-user traces and production-like environments to identify model failures and gaps, and helps decide which capabilities, fixes, and integrations are ready for major model runs. The role also requires debugging hard model failures by analyzing traces, evals, training data, and product context. Finally, the role involves working on early-training and alignment interventions, improving large-scale training and launch machinery, and taking on cross-functional projects that touch model training, product infrastructure, and production agent harnesses, including multi-agent systems and training against production-like environments.
Researcher, Training - London
Design, prototype and scale up new architectures to improve model intelligence; execute and analyze experiments autonomously and collaboratively; study, debug, and optimize both model performance and computational performance; contribute to training and inference infrastructure.
Applied AI Researcher, Multi-Agent Systems
The Multi-Agent Systems team focuses on designing architectures in which multiple agents coordinate to solve problems that require structured interaction across multiple reasoning processes. Researchers build systems that structure communication, route information, and coordinate decision-making across agents operating with different views of the problem. Researchers investigate the interaction patterns that govern how agents collaborate, studying how agents exchange information, critique and refine each other’s reasoning, and coordinate execution across complex workflows. Their work identifies the mechanics behind effective communication, delegation, and coordination, establishing the design language for how systems of agents can operate as cohesive, high-performing teams, with capabilities that arise from interaction rather than individual performance.
Principal Applied AI Researcher - Domain-Specific Models (Brazil)
The Principal Applied AI Researcher is responsible for setting the company-level technical direction for domain-specific model strategy, defining how models are built, evaluated, scaled, and sustained across continued pre-training, fine-tuning, post-training, and release quality standards. They architect the agentic model development paradigm by designing research infrastructure such as experiment orchestration, data pipeline automation, continuous evaluation, and competitive benchmarking to enhance research productivity. They lead deep research on model adaptation methodology, data curation, post-training methods, and training dynamics using agentic systems for parallel experiments and failure analysis. They also shape model strategy across all company domains by prioritizing new model domains and using agent-driven competitive intelligence and market analysis. The role includes defining evaluation strategies involving benchmark design, expert assessment, model failure analysis, robustness standards, and building continuous evaluation systems that inform real-time investment decisions. They lead cross-cutting research initiatives to advance data perception, retrieval, post-training, and runtime orchestration, ensuring these advancements compound across the platform. The researcher influences platform-level decisions such as model lifecycle management, portfolio strategy, release criteria, and integration architecture to support human and agentic system co-evolution. Additionally, they mentor senior researchers to enhance experimental rigor and technical judgment, participate in hiring, and maintain hands-on research impact through technical work, publications, patents, and visible output.
Applied AI Researcher (Brazil)
Architect and orchestrate massively parallel AI research workflows by designing experiments that utilize fleets of agentic AI systems to explore hypothesis spaces, hyperparameter landscapes, and architectural variations at large scale and speed. Design, train, and iterate on models across the full GenAI stack including LLMs, VLMs, embedding models, rerankers, and reward models using agentic pipelines that autonomously manage data preprocessing, training runs, evaluation sweeps, and result synthesis. Conduct rigorous, first-principles research into model architectures, training dynamics, reinforcement learning, and knowledge representation, using AI agents to accelerate literature reviews, ablation studies, and mathematical analysis. Work across disciplines and modalities such as NLP, computer vision, multimodal understanding, agentic reasoning, and domain science by delegating exploration, prototyping, and benchmarking to parallel agent systems to synthesize insights across fields simultaneously. Build and contribute to shared tooling, libraries, and platforms for orchestrating autonomous experiment pipelines, data processing workflows, and evaluation harnesses at scale. Collaborate with engineering, product, and domain experts to rapidly integrate research breakthroughs into production platforms using agentic CI/CD and automated integration testing to compress the research-to-deployment cycle. Document findings, publish at top-tier venues, and develop internal knowledge systems that agentic tools can index and reason over to amplify collective intelligence. Identify and address workflow bottlenecks for oneself and the team by designing or adopting efficient, scalable solutions, treating personal augmentation as a core research output.
Applied AI Researcher (Dublin, CA)
Architect and orchestrate massively parallel AI research workflows by designing experiments leveraging fleets of agentic AI systems to explore hypotheses, hyperparameters, and architectural variations at scale. Design, train, and iterate on models across the GenAI stack, including LLMs, VLMs, embedding models, rerankers, and reward models, using autonomous agentic pipelines for data preprocessing, training, evaluation, and result synthesis. Conduct rigorous research into model architectures, training dynamics, reinforcement learning, and knowledge representation, accelerated by AI agents for literature review, ablation studies, and mathematical analysis. Span disciplines and modalities such as NLP, computer vision, multimodal understanding, agentic reasoning, and domain science through delegation to parallel agent systems. Develop and contribute to shared tooling, libraries, and platforms to enable researchers to orchestrate autonomous experiments, data workflows, and evaluation at scale. Collaborate with engineering, product, and domain experts to integrate research breakthroughs into production rapidly using agentic CI/CD and automated testing. Document findings, publish at top-tier venues, and build internal knowledge systems for indexing and reasoning by agentic tools to amplify collective intelligence. Identify and resolve workflow bottlenecks by designing or adopting scalable solutions to augment human potential.
Researcher, Alignment Science
As a Research Engineer / Research Scientist on the Alignment team, you will design and implement alignment experiments focused on intent following, honesty, calibration, and robustness. You will train and evaluate models using reinforcement learning and other empirical machine learning methods. Your role includes developing evaluations for failure modes such as hallucination, instruction-following failures, reward hacking, covert actions, and scheming. You will study methods that encourage models to verify their behavior and report shortcomings honestly, including confession-style training objectives. You will build monitoring and inference-time interventions that ensure compliant behavior or surface model issues to users or downstream systems. Additionally, you will investigate how alignment methods scale with model capability, compute, data, context length, action length, and adversarial pressure. You will integrate successful techniques into model training and deployment workflows, produce externally publishable research when results advance the broader science of alignment, and collaborate with researchers and engineers across post-training, reinforcement learning, evaluations, safety, and product-facing teams.
Machine Learning Research, RF Foundation Models Specialist
Formulate new machine learning problems in RF sensing and spectrum understanding. Design experiments and evaluation approaches reflecting real operating conditions such as domain shift, changing interference, and varying sensors and platforms. Build models for structured, noisy, and partially observed signal environments. Improve robustness across propagation, interference, and low-visibility waveform conditions. Optimize models for throughput, latency, and deployment constraints. Move promising research into a release path for real systems through proofs-of-concept, realistic validation, and conversion into maintainable, deployable code. Use field performance to inform the development of the next generation of models and tooling. Work across the lifecycle of research and deployment including data and evaluation design, experimentation, model development, release readiness, and iteration based on real-world outcomes. Collaborate closely with embedded, hardware, and mission teammates, influencing how machine learning capability is built as the company scales.
Researcher, Agentic Post-Training
Own end-to-end research and engineering projects to improve the final post-training of OpenAI’s agentic models. Decide which integrations are ready for inclusion in major model runs in collaboration with partner teams. Develop horizontal model improvements in areas such as factuality, instruction following, tool/function calling, multi-agent behavior, and reasoning-effort calibration. Build and improve training, evaluation, grading, and data infrastructure for large-scale reinforcement learning/post-training runs. Create evaluations and diagnostics to assess model readiness for deployment. Enhance feedback loops from real product usage into post-training, including learning from implicit user feedback. Collaborate with Codex, API, ChatGPT, product, training, and other post-training teams to make frontier models more useful, reliable, and agentic.
