Fifteenth International Workshop on
Parallel Programming Models and Systems Software for
High-End Computing (P2S2), 2022

To be held in conjunction with
ICPP 2022: The 51st International Conference on Parallel Processing
August 29th to September 1st, 2022, in Bordeaux, France

Abstract - Inference Accelerator Deployment at Meta

In this talk, we provide a deep dive into the deployment of inference accelerators at Meta. Our workloads have distinctive requirements: large model sizes, high compute and memory bandwidth demands, and sufficient network bandwidth. We therefore co-designed a platform around these needs and standardized it as an Open Compute Platform, with a view to optimizing performance per watt on our workloads. We have optimized this platform and accelerator system and use it to serve production traffic.

Biography - Cao Gao

Cao Gao is a Software Engineer at Meta, working mainly on machine learning accelerator deployment and performance optimization for data center AI workloads. Prior to that, he was a Software Engineer at Google, working mainly on its Edge TPU ML accelerator series, which was deployed in products such as the Google Pixel Tensor SoC. He received an MS and a PhD in Computer Science and Engineering from the University of Michigan.