Senior Software Engineer at Microsoft

Required Skills

Python, C++, distributed systems, networking, AI/ML, Linux, performance tuning, observability, infrastructure

About the Role

Senior Software Engineer role focused on building tooling and infrastructure for next-generation AI supercomputing and high-performance networking. Responsibilities include developing network automation, observability frameworks, and performance optimization systems for distributed AI workloads. The role involves working with high-speed fabrics and accelerated compute platforms to ensure reliability and performance at exascale levels.

Key Responsibilities

  • Design and build software tools for high-performance networking in AI/HPC systems
  • Develop automation and observability tooling for petabyte-scale data movement
  • Analyze performance metrics to identify bottlenecks and improve throughput
  • Debug complex networking and system-level issues across large clusters
  • Own design and documentation of new software systems and lead architectural reviews

Required Skills & Qualifications

Must Have:

  • Bachelor's Degree in Computer Science or related field AND 4+ years technical engineering experience with coding in languages like C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • 3+ years of experience developing tools for distributed computing environments (e.g., HPC, AI/ML clusters, cloud-scale platforms)
  • 1+ years of familiarity with network performance tuning, telemetry, and observability tools in high-throughput, low-latency environments
  • 1+ years of exposure to network virtualization, software-defined networking (SDN), or fabric orchestration solutions

Nice to Have:

  • Hands-on experience with networking technologies in AI-specific hardware (e.g., InfiniBand, RoCE, NVLink)
  • Familiarity with AI accelerators such as GPUs (NVIDIA, AMD) or TPUs and their interaction with networking infrastructure
  • Background in building scalable and fault-tolerant systems in large, distributed environments
  • Proficiency in Linux operating systems, including kernel-level networking, performance tuning, and debugging

Benefits & Perks

  • Industry-leading healthcare