Date Awarded


Document Type


Degree Name

Doctor of Philosophy (Ph.D.)


Computer Science


Qun Li

Committee Member

Weizhen Mao

Committee Member

Bin Ren

Committee Member

Evgenia Smirni

Committee Member

Gexin Yu


Recent advances in Artificial Intelligence (AI) are characterized by ever-increasing datasets and rapid growth in model complexity. Many modern machine learning models, especially deep neural networks (DNNs), cannot be trained or run efficiently on a single machine. Hence, distributed optimization and inference have been widely adopted to tackle large-scale machine learning problems. Meanwhile, quantum computers, which can process certain computational tasks exponentially faster than classical machines, offer an alternative solution for resource-intensive deep learning. However, two obstacles hinder us from building large-scale DNNs on distributed systems and quantum computers. First, when distributed systems scale to many nodes, the training process is slowed down by high communication costs, including frequent training data transmission and gradient exchange. Second, high computation costs prevent such applications from being widely used in academia and industry. These costs include training and inference for DNNs deployed on resource-constrained devices, as well as optimization for quantum neural networks (QNNs). To circumvent these obstacles, this dissertation focuses on streamlining the training and inference of classical DNNs and QNNs. To reduce the communication cost of distributed training, we explore the theoretical foundations of two mainstream distributed schemas: classical distributed learning and federated learning (FL). Based on these explorations, we propose two novel optimization algorithms that effectively reduce the communication cost without degrading model performance. For classical distributed learning, we propose communication-efficient stochastic gradient descent (CE-SGD) to downsize the stochastic gradient used for synchronization. For federated learning, we propose a preconditioned federated optimization algorithm (PreFed) that utilizes the objective function’s geometric information to accelerate the federated training process.
To reduce the computation cost of DNN inference on portable devices, we use knowledge distillation to build an efficient and robust Cloud-based deep learning framework. It enables the Cloud server to generate high-quality, lightweight models, allowing small devices to execute the learning tasks locally. In addition, we propose a computation-and-communication-efficient federated neural architecture search (E-FedNAS) algorithm to automatically find a model structure that fits the unseen local data. E-FedNAS progressively fine-tunes the model structure and creates the ideal model in one path, which is suitable for small devices. Finally, we investigate the recently emergent barren plateau (BP) issue in variational quantum algorithms (VQAs) for Noisy Intermediate-Scale Quantum (NISQ) computers. The BP issue is that the gradient of the objective function vanishes exponentially with the size of the system. This prevents the QNN from updating and makes parameterized quantum circuit (PQC) training inefficient. To this end, we propose a look-around, warm-start gradient-based optimizer (LAWS) to mitigate the BP issue, accelerate QNN training, and improve the generalization ability of QNNs.
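
The barren plateau condition described above is commonly formalized as follows. This is a sketch of the standard statement from the literature, not a result quoted from this dissertation, and the symbols are assumptions introduced here for illustration: $C(\boldsymbol{\theta})$ is the objective function of the PQC, $\theta_k$ is one of its trainable parameters, $n$ is the number of qubits, and $b > 1$ is a constant. The first line says that, under random initialization, the partial derivative has zero mean and a variance that decays exponentially in $n$. The second line applies Chebyshev's inequality to show that a gradient component of magnitude at least $\epsilon$ therefore becomes exponentially unlikely.

```latex
\begin{align}
  \mathbb{E}_{\boldsymbol{\theta}}\!\left[\partial_k C(\boldsymbol{\theta})\right] &= 0,
  &
  \operatorname{Var}_{\boldsymbol{\theta}}\!\left[\partial_k C(\boldsymbol{\theta})\right]
    &\le F(n), \quad F(n) \in O\!\left(b^{-n}\right),\; b > 1, \\
  \Pr\!\left(\left|\partial_k C(\boldsymbol{\theta})\right| \ge \epsilon\right)
    &\le \frac{\operatorname{Var}_{\boldsymbol{\theta}}\!\left[\partial_k C(\boldsymbol{\theta})\right]}{\epsilon^{2}}
  &
    &\le \frac{F(n)}{\epsilon^{2}}.
\end{align}
```

Under these assumptions, resolving a nonzero gradient component from measurement statistics requires a number of circuit evaluations that grows exponentially with $n$, which is why the abstract treats BP as a training-efficiency problem.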



© The Author