Date Awarded


Document Type


Degree Name

Doctor of Philosophy (Ph.D.)


Computer Science


Bin Ren

Committee Member

Bin Ren

Committee Member

Adwait Jog

Committee Member

Andreas Stathopoulos

Committee Member

Pieter Peers


Graph processing is at the heart of many modern applications where graphs are used as the basic data structure to represent the entities of interest and the relationships between them. Improving the performance of graph-based applications, especially using parallelism techniques, has drawn significant interest both in academia and industry. On the one hand, modern CPU architectures are able to provide massive computational power by using sophisticated memory hierarchy and multi-level parallelism, including thread-level parallelism, data-level parallelism, etc. On the other hand, graph processing workloads are notoriously challenging for achieving high performance due to their irregular computation pattern and unpredictable control flow. Therefore, how to accelerate the performance of graph-based applications using parallelism is still an open question. This dissertation focuses on providing high performance for graph-based applications. To take full advantage of multi-level parallelism resources provided by CPUs, this dissertation studies the characteristics of graph-based applications and matches their parallel solutions with the underlying hardware via algorithm and system co-design. This dissertation divides graph-based applications into three categories: typical graph algorithms, sequential graph-based applications, and applications with graph-based solutions. The first category comprises typical graph algorithms with available parallel solutions. This dissertation proposes GraphPhi as a new approach to graph processing on emerging Intel Xeon Phi-like architectures. The second category includes specialized graph applications without nontrivial parallel solutions. This dissertation studies a state-of-the-art 2-hop labeling approach named Pruned Landmark Labeling (PLL). This dissertation proposes Batched Vertex-Centric PLL (BVC-PLL), which breaks PLL's inherent dependencies and parallelizes it in a scalable way. The third category includes applications that rely on graph-based solutions. This dissertation studies the sequential search algorithm for the graph-based indexing methods used for the Approximate Nearest Neighbor Search (ANNS) problem. This dissertation proposes Speed-ANN, a parallel similarity search algorithm that reveals hidden intra-query parallelism to accelerate the search speed while fulfilling the high accuracy requirement. Moreover, this dissertation further explores the optimization opportunities for computational graph-based deep neural network inference running on tiny devices, specifically microcontrollers (MCUs). Altogether, this dissertation studies graph-based applications and improves their performance by providing solutions of multi-level parallelism via algorithm and system co-design to match them with the underlying multi-core CPU architectures.




© The Author