Pathways: Asynchronous Distributed Dataflow for ML
We present the design of a new large-scale orchestration layer for accelerators that is specifically designed to enable exploration of new systems and research ideas, while retaining state-of-the-art performance for current models.
Our system, Pathways, uses a sharded dataflow graph of asynchronous operators that consume and produce futures, and efficiently gang-schedules heterogeneous parallel computations on thousands of accelerators while coordinating data transfers over their dedicated interconnects.
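The futures-based dataflow style described above can be sketched in miniature as follows. This is an illustrative toy in single-process Python, not the Pathways implementation: the helper `async_op` and the example graph are hypothetical, and real Pathways operators run sharded across accelerators rather than in threads.

```python
# Toy sketch (assumption, not the Pathways API): operators that consume
# and produce futures, so a client can enqueue a whole dependency graph
# of computations without blocking on intermediate results.
from concurrent.futures import ThreadPoolExecutor, Future

executor = ThreadPoolExecutor(max_workers=4)

def async_op(fn, *input_futures):
    """Schedule fn to run once all its input futures resolve;
    immediately return a future for its output."""
    out = Future()
    def run():
        try:
            # Block this worker (not the client) until upstream outputs exist.
            args = [f.result() for f in input_futures]
            out.set_result(fn(*args))
        except Exception as e:
            out.set_exception(e)
    executor.submit(run)
    return out

# Build a small graph entirely asynchronously: c = (a + b) * 2.
a = async_op(lambda: 3)
b = async_op(lambda: 4)
s = async_op(lambda x, y: x + y, a, b)
c = async_op(lambda x: x * 2, s)
print(c.result())  # the client blocks only when it finally needs the value
```

The point of the sketch is that issuing work is decoupled from waiting for results, which is what lets a centralized scheduler keep many devices busy.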
We demonstrate that Pathways can achieve performance parity (≈100% accelerator utilization) with state-of-the-art systems when running state-of-the-art SPMD computations over 2048 TPUs, while also delivering throughput comparable to the SPMD case for Transformer models that are pipelined across 16 stages, or sharded across two islands of accelerators connected over a data center network.
Authors
Paul Barham, Aakanksha Chowdhery, Jeff Dean, Sanjay Ghemawat, Steven Hand, Dan Hurt, Michael Isard, Hyeontaek Lim, Ruoming Pang, Sudip Roy, Brennan Saeta, Parker Schuh, Ryan Sepassi, Laurent El Shafey, Chandramohan A. Thekkath, Yonghui Wu