
About the role

Principal AI Network Architect at Microsoft

Required Skills

AI network architecture, GPU systems, RDMA protocols, optical interconnects, signal integrity, high-radix switches, AI training workloads, hyperscale deployments

About the Role

Microsoft seeks a Principal AI Network Architect to design ultra-high-bandwidth, low-latency backend networks for next-generation GPU and AI accelerator platforms. The role involves driving system-level integration for scalable AI training workloads and collaborating with cross-functional teams and industry partners to shape hyperscale AI infrastructure.

Key Responsibilities

  • Spearhead architectural definition and innovation for next-generation GPU and AI accelerator platforms
  • Drive system-level integration across compute, storage, and interconnect domains
  • Partner with silicon, firmware, and datacenter engineering teams to co-design infrastructure
  • Cultivate deep technical relationships with silicon vendors, optics suppliers, and switch fabric providers
  • Evaluate and articulate tradeoffs across electrical, mechanical, thermal, and signal integrity domains

Required Skills & Qualifications

Must Have:

  • Bachelor's Degree in Electrical Engineering, Computer Engineering, Mechanical Engineering, or related field AND 8+ years technical engineering experience OR Master's Degree AND 7+ years OR equivalent experience
  • 5+ years of experience in designing AI backend networks and integrating them into large-scale GPU systems
  • Ability to meet Microsoft, customer, and/or government security screening requirements, including the Microsoft Cloud Background Check

Nice to Have:

  • Proven expertise in system architecture across compute, networking, and accelerator domains
  • Deep understanding of RDMA protocols (RoCE, InfiniBand), congestion control (DCQCN), and Layer 2/3 routing
  • Experience with optical interconnects (e.g., PSM, WDM), link budget analysis, and transceiver integration
  • Familiarity with signal integrity modeling, link training, and physical layer optimization
  • Experience architecting backend networks for AI training and inference workloads, including Hamiltonian cycle traffic and collective operations

Benefits & Perks

  • Industry-leading healthcare