Fifteenth International Workshop on
Parallel Programming Models and Systems Software for
High-End Computing (P2S2), 2022

To be held in conjunction with
ICPP 2022: The 51st International Conference on Parallel Processing
August 29th to September 1st, 2022, in Bordeaux, France

Abstract - Demystify Communication Behavior in Training Deep Learning Recommendation Model

Deep learning recommendation models (DLRM) are widely adopted by companies including Amazon, Netflix, Google, and Meta to improve the user experience across a range of products; DLRM is also part of the MLPerf training and inference benchmarks. However, the advanced and complex parallelism strategies built into DLRM and the PyTorch framework make it challenging to understand how the underlying communication behaves during distributed training. In this talk, I will present the essential communication behavior observed when training DLRM on a practical production workload and shed light on the challenges of optimizing communication performance for DLRM workloads. Moreover, the talk will introduce open-source benchmarks and tools that enable researchers and engineers to reproduce and optimize the communication of real-world DLRM workloads.
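For readers unfamiliar with the communication pattern the abstract alludes to: in the commonly described hybrid-parallel DLRM setup, the large embedding tables are model-parallel (sharded across ranks), so each training step exchanges embedding lookups with an all-to-all, while the dense MLP layers are data-parallel and synchronize gradients with an all-reduce. The sketch below is illustrative only (it is not from the talk and does not use torch.distributed); it simulates the two collectives with plain Python lists to show the data movement involved.

```python
# Illustrative sketch of the hybrid-parallel DLRM communication pattern:
# all-to-all for sharded embedding lookups, all-reduce for dense gradients.
# Ranks are simulated as list indices; no real communication library is used.

def all_to_all(per_rank_sends):
    """Each rank i sends per_rank_sends[i][j] to rank j.

    Returns a list where entry j holds everything rank j received,
    ordered by source rank.
    """
    n = len(per_rank_sends)
    return [[per_rank_sends[src][dst] for src in range(n)] for dst in range(n)]

def all_reduce(per_rank_grads):
    """Sum-reduce dense-layer gradients; every rank ends with the same result."""
    total = [sum(g) for g in zip(*per_rank_grads)]
    return [total[:] for _ in per_rank_grads]

if __name__ == "__main__":
    # Two ranks, each holding a shard of the embedding tables.
    # sends[i][j] = embedding rows rank i looked up on behalf of rank j's batch.
    sends = [[["e0"], ["e1"]],
             [["e2"], ["e3"]]]
    recvd = all_to_all(sends)   # rank 0 receives [["e0"], ["e2"]]
    grads = all_reduce([[1.0, 2.0], [3.0, 4.0]])  # every rank gets [4.0, 6.0]
    print(recvd[0], grads[0])
```

Even this toy version shows why DLRM communication is hard to optimize: the all-to-all volume depends on per-batch lookup patterns and can be highly imbalanced across ranks, whereas the all-reduce volume is fixed by the dense-model size.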

Biography - Ching-Hsiang Chu

Dr. Ching-Hsiang Chu is a research scientist at Meta (formerly Facebook). He received his Ph.D. degree in Computer Science and Engineering from The Ohio State University, Columbus, Ohio, USA, in 2020. His research interests include high-performance computing, parallel programming models, and distributed AI.