Senior Software Engineer at Microsoft
Required Skills
Python, C++, AI/HPC, cloud infrastructure, GPU systems, high-speed networks, container technologies, system troubleshooting, supercomputing
About the Role
Senior Supercomputing Software & Systems Engineer responsible for diagnosing and troubleshooting large-scale supercomputing systems across the infrastructure stack. Develops advanced tools and implements features to ensure system reliability and performance for AI/HPC workloads on Microsoft Azure.
Key Responsibilities
- Diagnose and troubleshoot the largest-scale supercomputing systems across the infrastructure stack
- Develop and apply advanced tools for system reliability and performance
- Identify operational gaps and implement features for cloud-native supercomputers
- Collaborate with stakeholders to determine user requirements
- Act as Designated Responsible Individual (DRI) to monitor systems and guide other engineers
Required Skills & Qualifications
Must Have:
- Bachelor's Degree in Computer Science or related technical field AND 4+ years technical engineering experience with coding (C, C++, C#, Java, JavaScript, Python) OR equivalent experience
- 3+ years of experience operating AI/HPC systems, developing or running AI/HPC applications on clusters, or operating cloud infrastructure
- 2+ years of specialized experience with AI/HPC system management, high-speed networks, HPC storage, or managing cloud infrastructure
- Ability to pass Microsoft Cloud Background Check
Nice to Have:
- Bachelor's Degree in Computer Science AND 8+ years technical engineering experience OR Master's Degree AND 6+ years experience
- 1+ year of experience running or troubleshooting machine learning workloads on GPU-based HPC systems
- 1+ year of experience with cloud computing, virtualization, and container technologies
Benefits & Perks
- Industry-leading healthcare