Date Awarded

2023

Document Type

Dissertation

Degree Name

Doctor of Philosophy (Ph.D.)

Department

Computer Science

Advisor

Bin Ren

Committee Member

Gang Zhou

Committee Member

Evgenia Smirni

Committee Member

Pieter Peers

Committee Member

Gagan Agrawal

Abstract

Deep learning, particularly deep neural networks (DNNs), has led to significant advancements in various fields, such as autonomous driving, natural language processing, extended reality (XR), and view synthesis. Mobile and edge devices, with their efficient, specialized processors and suitability for real-time scenarios, have become the primary carriers for these emerging applications. Advances in AutoML tools (e.g., Neural Architecture Search) and training techniques have produced increasingly complex and deep DNN architectures with ever larger computational requirements. However, achieving real-time DNN execution (inference) on mobile devices is challenging due to the limited computing and storage resources of embedded chips. Moreover, there is a considerable gap between the theoretical peak and the actual performance of DNN workloads on mobile devices, caused by a limited understanding of how parallel algorithms map onto the underlying hardware. This dissertation aims to enable real-time execution of DNNs on mobile devices through a range of compiler-based optimizations.

Traditional convolutional neural networks (CNNs) consist of computation-intensive convolution layers, which account for a significant portion of the overall computational workload. To address this, we present PatDNN, a compression-compilation co-design framework that compresses large-scale, computation-intensive DNNs to fit within the constrained storage and computation resources of mobile devices. PatDNN combines a hardware-friendly, pattern-based pruning method for compressing DNN model parameters with a set of compiler optimizations tailored to pattern-based pruning that further improve execution efficiency.

As accuracy requirements for different machine learning tasks have risen, researchers have designed ever deeper models that move substantial amounts of data between layers. We present DNNFusion, an advanced operator fusion framework that fuses multiple successive operators within a DNN into a single operator, significantly decreasing the number of memory accesses. Moreover, we propose a novel mathematical-property-based graph rewriting framework to simplify the computation even further.

To take advantage of the dedicated accelerators emerging in mobile SoCs, we propose GCD² specifically for mobile Digital Signal Processors (DSPs). In contrast to mainstream processors, DSPs feature a wider SIMD width and a richer set of vector instructions. GCD² incorporates novel compiler optimizations that capitalize on these unique capabilities to improve hardware utilization.

Lastly, we propose SOD² for optimizing dynamic DNNs, in which the tensor shapes, and even the set of operators used, depend on the input and/or the execution. We derive a classification of the common operators that form DNNs and, based on this classification, propose a Rank and Dimension Propagation (RDP) algorithm. SOD² statically determines operator shapes and thereby enables a series of further compiler optimizations.
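
To make the pattern-based pruning idea concrete, here is a minimal NumPy sketch that forces each 3×3 convolution kernel to follow one of a few fixed sparsity patterns. The four patterns shown are illustrative placeholders, not PatDNN's actual pattern set.

```python
# Illustrative sketch of pattern-based kernel pruning. The 4-entry
# patterns below are hypothetical examples, not PatDNN's exact patterns.
import numpy as np

# Each pattern keeps 4 of the 9 positions in a 3x3 kernel (1 = keep).
PATTERNS = np.array([
    [[1, 1, 0], [1, 1, 0], [0, 0, 0]],
    [[0, 1, 1], [0, 1, 1], [0, 0, 0]],
    [[0, 0, 0], [1, 1, 0], [1, 1, 0]],
    [[0, 0, 0], [0, 1, 1], [0, 1, 1]],
], dtype=np.float32)

def pattern_prune(weights: np.ndarray) -> np.ndarray:
    """Prune each 3x3 kernel to the pattern that preserves the most
    weight magnitude. weights has shape (out_c, in_c, 3, 3)."""
    pruned = np.zeros_like(weights)
    for o in range(weights.shape[0]):
        for i in range(weights.shape[1]):
            kernel = weights[o, i]
            # Score every candidate pattern by retained L1 magnitude.
            scores = [np.abs(kernel * p).sum() for p in PATTERNS]
            pruned[o, i] = kernel * PATTERNS[int(np.argmax(scores))]
    return pruned

w = np.random.randn(16, 8, 3, 3).astype(np.float32)
w_pruned = pattern_prune(w)
print("sparsity:", 1.0 - np.count_nonzero(w_pruned) / w_pruned.size)
```

Because every kernel then conforms to one of a handful of known patterns, a compiler can emit a dense, branch-free code path per pattern instead of handling arbitrary sparsity, which is what makes this style of pruning hardware friendly.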
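
The memory-access savings from operator fusion can be seen in a toy example: a scale, a shift, and a ReLU applied in sequence. The sketch below is a conceptual illustration of what a fused kernel computes, not DNNFusion's actual intermediate representation or fusion algorithm.

```python
# Toy illustration of operator fusion for a chain of elementwise ops.
import numpy as np

x = np.random.randn(1 << 16).astype(np.float32)
scale, shift = np.float32(0.5), np.float32(1.0)

# Unfused: every operator materializes a full-size intermediate tensor,
# so the chain reads and writes memory once per operator.
t1 = x * scale
t2 = t1 + shift
y_unfused = np.maximum(t2, np.float32(0.0))

# Fused: conceptually a single loop -- each element is loaded once,
# passed through all three operators, and stored once.
def fused_scale_shift_relu(x, scale, shift):
    out = np.empty_like(x)
    for i in range(x.size):
        out[i] = max(x[i] * scale + shift, 0.0)
    return out

assert np.allclose(y_unfused, fused_scale_shift_relu(x, scale, shift))
```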
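
Similarly, a mathematical-property-based rewrite can remove work outright. The hypothetical pass below applies distributivity, turning two multiplies and one add that share a common factor into one add and one multiply; the tiny Node IR and the rule are invented for illustration and are not DNNFusion's rule set.

```python
# Hypothetical graph rewrite using distributivity:
#   Add(Mul(a, c), Mul(b, c))  ->  Mul(Add(a, b), c)
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    op: str              # "add", "mul", or "leaf"
    args: tuple = ()
    name: str = ""

def rewrite_distributive(n: Node) -> Node:
    if (n.op == "add" and len(n.args) == 2
            and all(a.op == "mul" for a in n.args)):
        (a, c1), (b, c2) = n.args[0].args, n.args[1].args
        if c1 == c2:  # shared factor: two muls + one add -> one add + one mul
            return Node("mul", (Node("add", (a, b)), c1))
    return n

a, b, c = Node("leaf", name="a"), Node("leaf", name="b"), Node("leaf", name="c")
expr = Node("add", (Node("mul", (a, c)), Node("mul", (b, c))))
print(rewrite_distributive(expr))  # Mul(Add(a, b), c)
```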
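
For dynamic DNNs, the key observation is that many operators' output shapes can be fixed before execution once operators are classified by how those shapes are derived. The following sketch assumes a hypothetical three-way classification (shape-preserving, shape-computable, value-dependent); it mirrors the spirit of RDP rather than SOD²'s actual categories or algorithm.

```python
# Simplified static shape propagation in the spirit of RDP. The operator
# classification here is a hypothetical illustration, not SOD^2's.

def propagate(op: str, in_shapes: list):
    """Return the output shape if it is statically determinable,
    or None when it depends on runtime values."""
    if op in ("relu", "tanh"):            # shape-preserving operators
        return in_shapes[0]
    if op == "matmul":                    # shape computable from inputs
        (m, k1), (k2, n) = in_shapes
        assert k1 == k2, "inner dimensions must match"
        return (m, n)
    if op == "nonzero":                   # value-dependent: unknown statically
        return None
    raise NotImplementedError(op)

# Walk a tiny graph, fixing each tensor's shape before execution.
shapes = {"x": (32, 128), "w": (128, 64)}
shapes["h"] = propagate("matmul", [shapes["x"], shapes["w"]])
shapes["y"] = propagate("relu", [shapes["h"]])
print(shapes)  # {'x': (32, 128), 'w': (128, 64), 'h': (32, 64), 'y': (32, 64)}
```

Once shapes are pinned down statically in this way, downstream passes such as memory planning and fusion can treat the dynamic model much like a static one.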

DOI

https://dx.doi.org/10.21220/s2-mfpe-jh96

Rights

© The Author

Available for download on Sunday, August 25, 2024
