Cloud Network Engineer at Microsoft

Required Skills

Networking, InfiniBand, Ethernet, Linux, BGP, MPLS, VXLAN/EVPN, Telemetry, AI/HPC

About the Role

This Cloud Network Engineer role focuses on designing and operating high-performance networking systems for AI/HPC clusters. Responsibilities include deploying low-latency network topologies, monitoring network health, and collaborating across hardware and software teams. The role requires experience with data center networks and distributed computing platforms.

Key Responsibilities

  • Support deployment of high-throughput, low-latency network topologies (e.g., Clos, fat-tree) using InfiniBand and Ethernet
  • Monitor network health, respond to incidents, perform root-cause analysis, and improve availability and observability
  • Collaborate with hardware engineering, data center operations, and software-defined networking teams
  • Maintain documentation for network designs, cabling standards, and deployment procedures
  • Stay informed about advancements in optical networking, high-speed interconnects, and AI/HPC fabric technologies

Required Skills & Qualifications

Must Have:

  • Bachelor's Degree in Electrical Engineering, Optical Engineering, Computer Science, Engineering, Information Technology, or related field OR equivalent experience
  • Experience designing, deploying, and supporting data center and backbone networks for distributed computing platforms
  • Ability to pass Microsoft Cloud Background Check upon hire/transfer and every two years thereafter

Nice to Have:

  • Master's Degree in Electrical Engineering, Optical Engineering, Computer Science, Information Technology, or related field OR Bachelor's Degree with 2+ years technical experience in network design, development, and automation
  • Proficient understanding of routing protocols, including BGP and MPLS, and of tunneling techniques, including VXLAN/EVPN
  • Experience with telemetry and observability tools for monitoring physical network health, link performance, and congestion at scale
  • Background in building scalable, fault-tolerant physical networks for distributed computing environments (e.g., AI/ML clusters, HPC systems)
  • Proficiency in Linux-based systems, including kernel-level networking, interface tuning, and low-level debugging of physical network issues

Benefits & Perks

  • Industry-leading healthcare