Date Awarded

2023

Document Type

Dissertation

Degree Name

Doctor of Philosophy (Ph.D.)

Department

Computer Science

Advisor

Bin Ren

Committee Member

Gang Zhou

Committee Member

Evgenia Smirni

Committee Member

Pieter Peers

Committee Member

Gagan Agrawal

Abstract

Deep learning, particularly deep neural networks (DNNs), has led to significant advancements in various fields, such as autonomous driving, natural language processing, extended reality (XR), and view synthesis. Mobile and edge devices, with their efficient, specialized processors and suitability for real-time scenarios, have become the primary carriers for these emerging applications. Advances in AutoML tools (e.g., Neural Architecture Search) and training techniques have produced increasingly complex and deep DNN architectures with ever larger computational requirements. However, achieving real-time DNN execution (inference) on mobile devices is challenging due to the limited computing and storage resources of embedded chips. Moreover, there is a considerable gap between the theoretical peak and the actual performance of DNN workloads on mobile devices, caused by a limited understanding of how parallel algorithms map onto the underlying hardware. This dissertation aims to enable real-time execution of DNNs on mobile devices through a range of compiler-based optimizations.

Traditional convolutional neural networks (CNNs) consist of computation-intensive convolution layers, which account for a significant portion of the overall computational workload. To address this, we present PatDNN, a compression-compilation co-design framework that compresses large-scale, computation-intensive DNNs to fit within the constrained storage and computation resources of mobile devices. PatDNN combines a hardware-friendly, pattern-based pruning method for compressing DNN model parameters with a set of compiler optimizations tailored to pattern-based pruning that further improve execution efficiency.

As accuracy requirements for different machine learning tasks have risen, researchers have designed ever deeper models that move substantial amounts of data between layers. We present DNNFusion, an advanced operator fusion framework that fuses multiple successive operators within a DNN into a single operator, significantly decreasing the number of memory accesses. Moreover, we propose a novel mathematical-property-based graph rewriting framework to simplify the computation even further.

To take advantage of the dedicated accelerators emerging in mobile SoCs, we propose GCD² specifically for mobile Digital Signal Processors (DSPs). In contrast to mainstream processors, DSPs feature a wider SIMD width and a richer set of vector instructions. GCD² incorporates novel compiler optimizations that capitalize on these unique capabilities to improve hardware utilization.

Lastly, we propose SOD² for optimizing dynamic DNNs, in which the tensor shapes, and even the set of operators used, depend on the input and/or the execution. We derive a classification of the common operators that form DNNs and, based on this classification, propose a Rank and Dimension Propagation (RDP) algorithm. SOD² statically determines operator shapes and thereby enables a series of further compiler optimizations.
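
To make the pattern-based pruning idea concrete, here is a minimal NumPy sketch that forces each 3×3 convolution kernel to follow one of a few fixed sparsity patterns. The four patterns shown are illustrative placeholders, not PatDNN's actual pattern set.

```python
# Illustrative sketch of pattern-based kernel pruning. The 4-entry
# patterns below are hypothetical examples, not PatDNN's exact patterns.
import numpy as np

# Each pattern keeps 4 of the 9 positions in a 3x3 kernel (1 = keep).
PATTERNS = np.array([
    [[1, 1, 0], [1, 1, 0], [0, 0, 0]],
    [[0, 1, 1], [0, 1, 1], [0, 0, 0]],
    [[0, 0, 0], [1, 1, 0], [1, 1, 0]],
    [[0, 0, 0], [0, 1, 1], [0, 1, 1]],
], dtype=np.float32)

def pattern_prune(weights: np.ndarray) -> np.ndarray:
    """Prune each 3x3 kernel to the pattern that preserves the most
    weight magnitude. weights has shape (out_c, in_c, 3, 3)."""
    pruned = np.zeros_like(weights)
    for o in range(weights.shape[0]):
        for i in range(weights.shape[1]):
            kernel = weights[o, i]
            # Score every candidate pattern by retained L1 magnitude.
            scores = [np.abs(kernel * p).sum() for p in PATTERNS]
            pruned[o, i] = kernel * PATTERNS[int(np.argmax(scores))]
    return pruned

w = np.random.randn(16, 8, 3, 3).astype(np.float32)
w_pruned = pattern_prune(w)
print("sparsity:", 1.0 - np.count_nonzero(w_pruned) / w_pruned.size)
```

Because every kernel then conforms to one of a handful of known patterns, a compiler can emit a dense, branch-free code path per pattern instead of handling arbitrary sparsity, which is what makes this style of pruning hardware friendly.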
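
The memory-access savings from operator fusion can be seen in a toy example: a scale, a shift, and a ReLU applied in sequence. The sketch below is a conceptual illustration of what a fused kernel computes, not DNNFusion's actual intermediate representation or fusion algorithm.

```python
# Toy illustration of operator fusion for a chain of elementwise ops.
import numpy as np

x = np.random.randn(1 << 16).astype(np.float32)
scale, shift = np.float32(0.5), np.float32(1.0)

# Unfused: every operator materializes a full-size intermediate tensor,
# so the chain reads and writes memory once per operator.
t1 = x * scale
t2 = t1 + shift
y_unfused = np.maximum(t2, np.float32(0.0))

# Fused: conceptually a single loop -- each element is loaded once,
# passed through all three operators, and stored once.
def fused_scale_shift_relu(x, scale, shift):
    out = np.empty_like(x)
    for i in range(x.size):
        out[i] = max(x[i] * scale + shift, 0.0)
    return out

assert np.allclose(y_unfused, fused_scale_shift_relu(x, scale, shift))
```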
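
Similarly, a mathematical-property-based rewrite can remove work outright. The hypothetical pass below applies distributivity, turning two multiplies and one add that share a common factor into one add and one multiply; the tiny Node IR and the rule are invented for illustration and are not DNNFusion's rule set.

```python
# Hypothetical graph rewrite using distributivity:
#   Add(Mul(a, c), Mul(b, c))  ->  Mul(Add(a, b), c)
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    op: str              # "add", "mul", or "leaf"
    args: tuple = ()
    name: str = ""

def rewrite_distributive(n: Node) -> Node:
    if (n.op == "add" and len(n.args) == 2
            and all(a.op == "mul" for a in n.args)):
        (a, c1), (b, c2) = n.args[0].args, n.args[1].args
        if c1 == c2:  # shared factor: two muls + one add -> one add + one mul
            return Node("mul", (Node("add", (a, b)), c1))
    return n

a, b, c = Node("leaf", name="a"), Node("leaf", name="b"), Node("leaf", name="c")
expr = Node("add", (Node("mul", (a, c)), Node("mul", (b, c))))
print(rewrite_distributive(expr))  # Mul(Add(a, b), c)
```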
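
For dynamic DNNs, the key observation is that many operators' output shapes can be fixed before execution once operators are classified by how those shapes are derived. The following sketch assumes a hypothetical three-way classification (shape-preserving, shape-computable, value-dependent); it mirrors the spirit of RDP rather than SOD²'s actual categories or algorithm.

```python
# Simplified static shape propagation in the spirit of RDP. The operator
# classification here is a hypothetical illustration, not SOD^2's.

def propagate(op: str, in_shapes: list):
    """Return the output shape if it is statically determinable,
    or None when it depends on runtime values."""
    if op in ("relu", "tanh"):            # shape-preserving operators
        return in_shapes[0]
    if op == "matmul":                    # shape computable from inputs
        (m, k1), (k2, n) = in_shapes
        assert k1 == k2, "inner dimensions must match"
        return (m, n)
    if op == "nonzero":                   # value-dependent: unknown statically
        return None
    raise NotImplementedError(op)

# Walk a tiny graph, fixing each tensor's shape before execution.
shapes = {"x": (32, 128), "w": (128, 64)}
shapes["h"] = propagate("matmul", [shapes["x"], shapes["w"]])
shapes["y"] = propagate("relu", [shapes["h"]])
print(shapes)  # {'x': (32, 128), 'w': (128, 64), 'h': (32, 64), 'y': (32, 64)}
```

Once shapes are pinned down statically in this way, downstream passes such as memory planning and fusion can treat the dynamic model much like a static one.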

DOI

https://dx.doi.org/10.21220/s2-mfpe-jh96

Rights

© The Author

Available for download on Sunday, August 25, 2024
