Principal Software Engineer at Microsoft
Required Skills
Python, C++, Azure, HPC, AI systems, telemetry, data pipelines, cloud infrastructure, GPU systems
About the Role
Principal Software Engineer designing and developing high-volume, low-latency telemetry pipelines for Azure's flagship supercomputers used by top AI customers. The role involves managing large-scale HPC & GPU systems, connecting to existing telemetry pipelines, and delivering insights on customer-facing issues across the infrastructure stack.
Key Responsibilities
- Architect, design and develop high-volume, low-latency, end-to-end event pipelines
- Conduct analysis of existing event pipelines to evaluate fidelity, granularity and latency
- Contribute to improving key metrics such as Job Mean Time to Interrupt and Mean Time to Resolve
- Partner with cross-organizational teams to evaluate available telemetry and drive architecture solutions
- Drive engineering and operational excellence based on issues and learnings from strategic customers
Required Skills & Qualifications
Must Have:
- Bachelor's Degree in Computer Science or related technical field AND 6+ years technical engineering experience
- 5+ years of hands-on experience designing and developing high-volume, low-latency pipelines
- 3+ years of experience with AI/HPC system management OR High-Speed Networks OR HPC Storage OR managing Cloud Infrastructure
- Ability to meet Microsoft security screening requirements including Microsoft Cloud Background Check
Nice to Have:
- Bachelor's Degree in Computer Science AND 10+ years technical engineering experience OR Master's Degree AND 8+ years experience
- 5+ years of experience in operating AI/HPC systems, developing and running AI/HPC applications on clusters, or operating Cloud Infrastructure
- 3+ years of experience in multiple data center technologies: power, cooling, IT hardware, telemetry
Benefits & Perks
- Industry-leading healthcare