Doctor of Philosophy (Ph.D.)
Compiler optimization is a long-standing research field that enhances program performance with a set of rigorous code analyses and transformations. Traditional compiler optimization focuses on general programs or program structures without considering too much high-level application operations or data structure knowledge. In this thesis, we claim that an integrated view of the application and compiler is helpful to further improve program performance. Particularly, we study integrated optimization opportunities for three kinds of applications: irregular tree-based query processing systems such as B+ tree, security enhancement such as buffer overflow protection, and tensor/matrix-based linear algebra computation. The performance of B+ tree query processing is important for many applications, such as file systems and databases. Latch-free B+ tree query processing is efficient since the queries are processed in batches without locks. To avoid long latency, the batch size can not be very large. However, modern processors provide opportunities to process larger batches parallel with acceptable latency. From studying real-world data, we find that there are many redundant and unnecessary queries especially when the real-world data is highly skewed. We develop a query sequence transformation framework Qtrans to reduce the redundancies in queries by applying classic dataflow analysis to queries. To further confirm the effectiveness, we integrate Qtrans into an existing BSP-based B+ tree query processing system, PALM tree. The evaluations show that the throughput can be improved up to 16X. Heap overflows are still the most common vulnerabilities in C/C++ programs. Common approaches incur high overhead since it checks every memory access. By analyzing dozens of bugs, we find that all heap overflows are related to arrays. We only need to check array-related memory accesses. We propose Prober to efficiently detect and prevent heap overflows. It contains Prober-Static to identify the array-related allocations and Prober-Dynamic to protect objects at runtime. In this thesis, our contributions lie on the Prober-Static side. The key challenge is to correctly identify the array-related allocations. We propose a hybrid method. Some objects can be identified as array-related (or not) by static analysis. For the remaining ones, we instrument the basic allocation type size statically and then determine the real allocation size at runtime. The evaluations show Prober-Static is effective. Tensor algebra is widely used in many applications, such as machine learning and data analytics. Tensors representing real-world data are usually large and sparse. There are many sparse tensor storage formats, and the kernels are different with varied formats. These different kernels make performance optimization for sparse tensor algebra challenging. We propose a tensor algebra domain-specific language and a compiler to automatically generate kernels for sparse tensor algebra computations, called SPACe. This compiler supports a wide range of sparse tensor formats. To further improve the performance, we integrate the data reordering into SPACe to improve data locality. The evaluations show that the code generated by SPACe outperforms state-of-the-art sparse tensor algebra compilers.
© The Author
Tian, Ruiqin, "Performance Optimization With An Integrated View Of Compiler And Application Knowledge" (2021). Dissertations, Theses, and Masters Projects. William & Mary. Paper 1627047810.