About the role

Software Engineer - AI/ML, AWS Neuron at Annapurna Labs (U.S.) Inc.

Required Skills

Python, PyTorch, JAX, distributed training, LLMs, XLA, AWS, machine learning, Neuron

About the Role

This role is for a Software Engineer on the Machine Learning Applications team for AWS Neuron, which is responsible for developing, enabling, and performance-tuning a range of ML model families, including large language models. The engineer will work on distributed training solutions for Trainium and help integrate that support into PyTorch and JAX.

Key Responsibilities

  • Development, enablement, and performance tuning of ML model families, including LLMs, Stable Diffusion, and Vision Transformers
  • Building distributed training support into PyTorch and JAX using XLA and the Neuron compiler and runtime stacks
  • Tuning models to achieve the highest performance and maximize efficiency on AWS Trainium
  • Working with chip architects, compiler engineers and runtime engineers on distributed training solutions
  • Extending distributed training libraries (FSDP, DeepSpeed) for Neuron-based systems

Required Skills & Qualifications

Must Have:

  • 3+ years of non-internship professional software development experience
  • 2+ years of non-internship design or architecture experience
  • Experience programming with at least one software programming language
  • Experience training large ML models using Python

Nice to Have:

  • 3+ years of full software development life cycle experience
  • Bachelor's degree in computer science or equivalent

Benefits & Perks

  • Inclusive team culture with employee-led affinity groups
  • Work-life balance with flexible working hours
  • Mentorship and career growth opportunities
  • Comprehensive compensation package with medical and financial benefits
  • Geographic-based compensation ranging from $129,300 to $223,600