Senior Software Engineer at Microsoft
Required Skills
Python, C++, distributed systems, networking, AI/ML, Linux, performance tuning, observability, infrastructure
About the Role
Senior Software Engineer role focused on building tooling and infrastructure for next-generation AI supercomputing and high-performance networking. Responsibilities include developing network automation, observability frameworks, and performance optimization systems for distributed AI workloads. The role involves working with high-speed fabrics and accelerated compute platforms to ensure reliability and performance at exascale levels.
Key Responsibilities
- Design and build software tools for high-performance networking in AI/HPC systems
- Develop automation and observability tooling for petabyte-scale data movement
- Analyze performance metrics to identify bottlenecks and improve throughput
- Debug complex networking and system-level issues across large clusters
- Own design and documentation of new software systems and lead architectural reviews
Required Skills & Qualifications
Must Have:
- Bachelor's Degree in Computer Science or related field AND 4+ years technical engineering experience with coding in languages like C, C++, C#, Java, JavaScript, or Python OR equivalent experience
- 3+ years of experience developing tools for distributed computing environments (e.g., HPC, AI/ML clusters, cloud-scale platforms)
- 1+ years of familiarity with network performance tuning, telemetry, and observability tools in high-throughput, low-latency environments
- 1+ years of exposure to network virtualization, software-defined networking (SDN), or fabric orchestration solutions
Nice to Have:
- Hands-on experience with networking technologies in AI-specific hardware (e.g., InfiniBand, RoCE, NVLink)
- Familiarity with AI accelerators such as GPUs (NVIDIA, AMD) or TPUs and their interaction with networking infrastructure
- Background in building scalable and fault-tolerant systems in large, distributed environments
- Proficiency in Linux operating systems, including kernel-level networking, performance tuning, and debugging
Benefits & Perks
- Industry-leading healthcare