Site Reliability Engineer II at Microsoft
Required Skills
Python, PowerShell, distributed systems, automation, monitoring, AI/ML, cloud services, debugging
About the Role
Site Reliability Engineer II role focused on ensuring high availability and reliability of Microsoft 365 Exchange Online services. Responsibilities include implementing proactive engineering solutions, monitoring systems, automating incident response, and integrating AI/ML for predictive analytics. The position requires strong debugging skills, experience with distributed systems, and collaboration with product engineering teams.
Key Responsibilities
- Implement proactive engineering solutions to identify and resolve incidents with minimal disruption
- Develop automation code and scripts for monitoring, alerting, and deployment processes at scale
- Analyze telemetry data and develop predictive models to improve product reliability and performance
- Respond to incidents during on-call rotations by troubleshooting complex issues and deploying fixes
- Mentor and coach less experienced engineers and collaborate with product engineering teams
Required Skills & Qualifications
Must Have:
- Bachelor's or Master's degree in Computer Science, Data Science, AI, or related field
- Mid-level software development experience with a focus on automation
- Understanding of modern software architectures including distributed systems, microservices, and failure modes
- Strong troubleshooting skills and ability to debug complex systems and applications
Nice to Have:
- Experience with scripting languages like Bash, Python, or PowerShell
- Experience with compiled languages like C or C#
- Practical experience running large-scale online systems
Benefits & Perks
- Industry-leading healthcare