Cambridge Residency Programme: Next-Generation AI Datacentre Networking

Microsoft, Newtown, Cambridge

  • Full time
  • Temporary
  • Onsite working

Posted: 8 Jun

Closing date: not specified

Job ref: fedc52b8b9e14055894a4ccc6e3cb6a7

Location ref: Newtown, Cambridge

Full Job Description

Track A - Modelling & Simulation

Best suited to candidates whose primary strength is analytical reasoning, performance modelling, or simulation.

  • Design and analyse novel network architectures (e.g., hybrid optical-electrical, reconfigurable topologies) tailored for AI communication patterns.
  • Develop analytical models and simulators to quantify the performance, cost, and energy trade-offs of proposed designs.
  • Study architectural trade-offs involving topology, transport, collective communication, and emerging optical/networking hardware.
  • Collaborate with systems researchers to compare model predictions with testbed measurements.
  • Evolve existing evaluation tools and frameworks to address new research questions and scenarios relevant to product teams.

Track B - Systems Implementation & Experimental Validation

Best suited to candidates whose primary strength is building and evaluating real systems on experimental platforms.

  • Implement and evaluate network protocols, transport mechanisms, and collective communication schemes on experimental hardware testbeds featuring modern GPUs, optical circuit switches, and RDMA interconnects.
  • Build and run communication-intensive workloads (e.g., collective algorithm benchmarks, distributed training/inference jobs) to stress-test new network designs.
  • Co-design and validate new protocols and algorithms with modelling collaborators.
  • Drive experimental validation on the group's testbed and contribute to its continued evolution.
  • Expand existing tools and prototypes to address scenarios relevant to both research and product teams.

In both tracks, you will publish findings at top-tier academic venues and contribute to Microsoft's long-term AI infrastructure strategy.

  • PhD in Computer Science, Computer Engineering, Electrical Engineering, Applied Mathematics, Operations Research, or a related field.
  • Evidence of independent research, such as first-author publications, strong thesis work, or impactful prototypes.
  • Ability to communicate research clearly through papers, talks, and cross-functional collaboration.
  • Strength in at least one of the following areas:
      • Modelling & simulation (Track A): Demonstrated experience in analytical modelling, simulation, or performance evaluation of networks or distributed systems (e.g., queueing models, flow-level simulation, stochastic models, LP-based analysis, or alpha-beta models).
      • Systems implementation (Track B): Strong systems programming skills in C++/CUDA/Python, with hands-on experience building or evaluating networked systems, distributed systems, or AI training/inference infrastructure.
  • Experience with datacentre network architectures, transport protocols, or collective communication.
  • Familiarity with circuit-switched or optical networking concepts (e.g., optical circuit switches, co-packaged optics).
  • Understanding of AI/ML workload communication patterns (e.g., all-reduce, MoE routing, pipeline parallelism).
  • Experience building simulators, evaluation frameworks, or experimental prototypes.
  • Proficiency in Python and familiarity with scientific computing libraries (NumPy, SciPy, pandas).
  • Experience in one or more of the following systems areas:
      • High-performance networking: RDMA (RoCEv2, InfiniBand), transport protocol implementation, or congestion control.
      • GPU and distributed ML communication: CUDA programming, NCCL, or experience with ML training/inference systems (e.g., PyTorch, Megatron, vLLM).
      • Experimental infrastructure: Building or managing hardware testbeds, measurement and profiling.

This position will be open for a minimum of 5 days, with applications accepted on an ongoing basis until the position is filled.

Microsoft Research Cambridge is hiring two researchers for its Cambridge Residency Programme (two-year postdoctoral positions) to advance the design and evaluation of next-generation datacentre networks for AI workloads. We are seeking a collaborative pair of researchers with complementary profiles: one focused on analytical modelling and simulation, the other on systems implementation and experimental validation.

Direct job link

https://www.jobs24.co.uk/job/cambridge-residency-programme-next-generation-ai-datacentre-126951610