Recent work in machine learning demonstrates the potential of ultra-low-bitwidth deep neural networks (DNNs), i.e., DNNs with binarized weights and/or activations. The dominant computations of such binary neural networks (BNNs) are bitwise logic operations, making them well suited for efficient deep learning both in datacenters and at the edge. The major drawback of binarization is a loss in model accuracy. This talk presents our ongoing investigation into BNNs, using a co-design approach that contributes to both algorithms and hardware accelerators. We will first introduce precision gating (PG), a dynamic, fine-grained, and trainable quantization scheme. Unlike static approaches, PG exploits input-dependent dynamic sparsity at run time, resulting in a significant reduction in compute cost with minimal impact on accuracy. Next, we present FracBNN, which leverages PG to substantially improve the accuracy of BNNs. Our experiments show that, for the first time, a BNN model can achieve MobileNetV2-level accuracy on the ImageNet dataset. FracBNN demonstrates real-time image classification on an embedded FPGA; it surpasses the best-known BNN design on FPGAs with an increase of 28.9% in top-1 accuracy and a 2.5x reduction in model size. Finally, we briefly discuss PokeBNN, a new family of BNN models that establishes the state-of-the-art accuracy-efficiency Pareto frontier.
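To illustrate why BNN computation reduces to bitwise logic, here is a minimal sketch of the standard XNOR-popcount identity for the dot product of two vectors with entries in {+1, -1}. It is not taken from the talk or from any of the designs named in the abstract; the helper names `pack_bits` and `binary_dot` are hypothetical and chosen only for this illustration.

```python
def pack_bits(values):
    """Pack a sequence of +1/-1 values into an int; bit i is set when values[i] == +1."""
    word = 0
    for i, v in enumerate(values):
        if v == 1:
            word |= 1 << i
    return word


def binary_dot(a_bits, b_bits, n):
    """Dot product of two length-n {+1, -1} vectors stored as packed bits.

    Positions where the bits agree contribute +1 and positions where they
    differ contribute -1, so the result is 2 * (number of matches) - n.
    """
    mask = (1 << n) - 1                # keep only the n valid bit positions
    xnor = ~(a_bits ^ b_bits) & mask   # 1 wherever the two vectors agree
    matches = bin(xnor).count("1")     # popcount
    return 2 * matches - n


a = [1, -1, 1, 1]
b = [1, 1, -1, 1]
result = binary_dot(pack_bits(a), pack_bits(b), len(a))
# Same value as the ordinary multiply-accumulate form.
assert result == sum(x * y for x, y in zip(a, b))
```

In hardware, the XNOR and popcount steps map to simple logic gates and adders rather than multipliers, which is the source of the efficiency the abstract refers to.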
Biography:
Zhiru Zhang is an Associate Professor in the School of Electrical and Computer Engineering at Cornell University and a member of the Computer Systems Laboratory. He received his B.S. in Computer Science from Peking University, and his M.S. and Ph.D. in Computer Science from UCLA. Upon graduation, he cofounded AutoESL Design Technologies, Inc., based on his dissertation research at UCLA on high-level synthesis. Zhang's current research investigates new algorithms, methodologies, and design automation tools for building heterogeneous systems. Recent publications have focused on high-level synthesis, hardware specialization for machine learning, and programming models for software-defined FPGAs. His work has been recognized by many awards, including a Facebook Research Award, a Google Faculty Research Award, and the DAC Under-40 Innovators Award. His paper on SDC-based scheduling was inducted into the ACM/SIGDA TCFPGA Hall of Fame for the class of 2022.