About the role

Member of Technical Staff - Reinforcement Learning (Infrastructure), AGI Autonomy at Amazon.com Services LLC

Required Skills

Python, Java, C++, reinforcement learning, LLMs, distributed systems, MLSys, GPUs

About the Role

This role involves designing, building, and maintaining infrastructure for training and evaluating state-of-the-art AI agent models, focusing on large-scale reinforcement learning for LLMs. You will work closely with research teams to ensure efficient and robust ML systems, troubleshooting performance bottlenecks and conducting MLSys research.

Key Responsibilities

  • Develop training infrastructure for efficient large-scale reinforcement learning on LLMs
  • Work across the entire technology stack including low-level ML systems, job orchestration, and data management
  • Analyze, troubleshoot, and profile complex ML systems to identify and address performance bottlenecks
  • Work closely with researchers to conduct MLSys research and create new techniques and tooling

Required Skills & Qualifications

Must Have:

  • PhD or Master's degree and 3+ years of applied research experience
  • Experience with programming languages such as Python, Java, C++
  • Experience with deep learning methods and machine learning
  • Experience with training and deploying ML systems for large-scale optimization, or with troubleshooting technical systems

Nice to Have:

  • PhD or Master's degree with experience in various ML techniques and performance parameters
  • Experience with large-scale ML systems, profiling, debugging, and understanding system performance and scalability
  • Experience with distributed systems, Megatron, vLLM, Ray, and working with GPUs
  • Experience with patents or publications at top-tier peer-reviewed conferences or journals

Benefits & Perks

  • Base pay ranging from $255,000 to $345,000/year depending on location
  • Equity, sign-on payments, and other forms of compensation may be provided
  • Full range of medical, financial, and/or other benefits