Date Awarded
2024
Document Type
Dissertation
Degree Name
Doctor of Philosophy (Ph.D.)
Department
Computer Science
Advisor
Xu Liu
Committee Member
Bin Ren
Committee Member
Weizhen Mao
Committee Member
Andreas Stathopoulos
Abstract
Software inefficiencies are inevitable in computer systems. At the code level, software packages have become increasingly complex: they comprise a large amount of source code, sophisticated control and data flow, and growing levels of abstraction. This complexity often introduces inefficiencies across software stacks, leading to performance degradation. At the resource level, the evolution of hardware outpaces the performance optimization of software, leading to resource wastage and energy dissipation on emerging architectures. To better understand program behavior, software developers rely on performance profiling tools. Existing profiling techniques, whether fine-grained or coarse-grained, focus on identifying hotspots, the code regions that consume substantial resources during program execution. Although hotspot analysis is effective, it rarely reveals whether a program is using a resource productively. Developers must therefore spend extra effort to decide whether a hotspot needs to be optimized. For this reason, to better guide program optimization, we need tools that investigate resource wastage rather than resource usage. In this dissertation, we perform program inefficiency detection from different perspectives. First, we study inefficiencies in compiler optimizations. We propose CIDetector, a fine-grained profiler, to detect compiler-introduced and compiler-missed inefficiencies. Through our analysis, we select 12 representative programs from different domains to form a dataset, CIBench. This is the first study of compiler-related inefficiencies in fully optimized binary code, and it offers valuable insights for scientific programmers, compiler writers, and tool developers. Moreover, we study interaction inefficiencies between Python code and native libraries in Python applications and extract two patterns that are common to them.
Based on these patterns, we categorize the interaction inefficiencies by their root causes. We propose PieProf, a lightweight profiler, to pinpoint interaction inefficiencies in Python applications. The principle of PieProf is to measure inefficiencies in native execution and associate them with high-level Python code to provide a holistic view. Guided by PieProf, we optimize 17 real-world applications, yielding speedups of up to 6.3× at the application level. Meanwhile, we observe that the same program inefficiency patterns occur in students' code. As instructors, we recognize that the importance of teaching students about code performance cannot be overstated. By exploring pedagogical methods and developing educational tools, we hope to understand and address the challenges that students face when programming. We report our experience integrating VS Code into an introductory Python programming course; combined with comprehensive guidance, it significantly balances teaching resources and shortens students' learning curves. Additionally, we propose ProTracker, an end-to-end solution that estimates the progress of programming assignments using machine learning techniques. ProTracker employs static analysis to extract features from assignment samples from previous semesters and applies a two-level cross-validation method to tune and select the proper machine-learning model. It runs as a VS Code extension and performs real-time programming progress estimation for students.
DOI
https://dx.doi.org/10.21220/s2-29vh-dt42
Rights
© The Author
Recommended Citation
Tan, Jialiang, "Program Analysis For Software Engineers And Students" (2024). Dissertations, Theses, and Masters Projects. William & Mary. Paper 1709301515.
https://dx.doi.org/10.21220/s2-29vh-dt42