SDE- ML Engineer, Frontier AI Robotics at Amazon.com Services LLC
Required Skills
Python, PyTorch, C++, machine learning, deep learning, LLM, distributed systems, GPU optimization, linear algebra
About the Role
This role involves building and optimizing distributed training infrastructure for large-scale machine learning models, particularly deep learning and transformer architectures. The engineer will collaborate with scientists and engineers to deliver scalable, high-performance systems for state-of-the-art AI research and applications in robotics.
Key Responsibilities
- Design, build, and optimize machine learning infrastructure for large-scale training and inference
- Apply PyTorch, Python, and C++ skills to engineer modular, scalable ML systems
- Evaluate and implement parallelism techniques such as data, tensor, model, and pipeline parallelism
- Monitor and optimize GPU memory and throughput for training large models efficiently
- Collaborate cross-functionally with research and data infrastructure teams to integrate new models and features
Required Skills & Qualifications
Must Have:
- 3+ years of non-internship professional software development experience
- 2+ years of non-internship design or architecture experience for new and existing systems
- Experience programming with at least one software programming language
- Deep understanding of LLM algorithms and deep learning frameworks like PyTorch
- Strong understanding of linear algebra, calculus, probability, and statistics
Nice to Have:
- 3+ years of full software development life cycle experience including coding standards, code reviews, source control, build processes, testing, and operations
- Bachelor's degree in computer science or equivalent
Benefits & Perks
- Full range of medical, financial, and/or other benefits
- Equity, sign-on payments, and other forms of compensation may be provided
- Inclusive culture with workplace accommodations for disabilities