ORCID ID
https://orcid.org/0000-0002-5387-7565
Date Awarded
2024
Document Type
Dissertation
Degree Name
Doctor of Philosophy (Ph.D.)
Department
Computer Science
Advisor
Andreas Stathopoulos
Committee Member
Robert M. Lewis
Committee Member
Pieter Peers
Committee Member
Bin Ren
Committee Member
Richard Lehoucq
Abstract
Low-rank approximations play an important role in data science analysis and applications. When the model is linear and the data is represented as a matrix, the optimal rank-r approximation is given by the r dominant factors of the singular value decomposition (SVD). When the model is multilinear and the data is represented as a multi-way array called a tensor, the optimal rank-r approximation may not even exist. Nonetheless, the canonical polyadic (CP) decomposition provides a useful low-rank tensor approximation for interpretability and analysis, in a way similar to the matrix SVD but across multiple axes simultaneously. In this dissertation, we focus on two important problems involving low-rank matrix and tensor models for data science applications.

The first thrust of this dissertation involves maintaining a summary, or sketch, matrix from a data stream. We study the trade-offs between an established framework for low-rank approximation of streaming data, called incremental SVD, where a low-rank approximation is maintained at every innovation of the data, and randomized sketching, where the data stream is summarized and a low-rank approximation is reconstructed later. We analyze the role of iterative methods within incremental SVD and provide numerical evidence to support our analysis. We evaluate the relative accuracy and computational costs of incremental SVD and randomized sketching on datasets that are challenging even for state-of-the-art batch solvers. We also develop a sampling-based convergence criterion that terminates the computation early with negligible loss of fidelity.

The second thrust involves computing a low-rank CP decomposition model when the input entries are counts, i.e., non-negative integers assumed to follow a Poisson distribution. Our first contribution is a systematic study of state-of-the-art algorithms. Our second contribution is a hybrid method that switches between a deterministic and a randomized algorithm to leverage the desirable computational properties of both, with the overall goal of achieving an optimal accuracy-performance trade-off. Our third contribution is a heuristic that detects convergence toward undesirable solutions and applies a restarting technique to improve the likelihood that the method converges to the optimal solution.
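The abstract's opening claim, that the optimal rank-r approximation of a matrix is given by the r dominant SVD factors, is the Eckart-Young-Mirsky theorem. A minimal NumPy sketch (illustrative only, not code from the dissertation) shows the truncation and verifies that the Frobenius-norm error equals the norm of the discarded singular values:

```python
# Minimal illustration: the best rank-r approximation of a matrix, in the
# Frobenius or spectral norm, is the truncation of its SVD to the r dominant
# factors (Eckart-Young-Mirsky theorem).
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 50))   # synthetic data matrix
r = 5

U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_r = U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]   # rank-r truncation

# The Frobenius-norm error equals the 2-norm of the discarded singular values.
err = np.linalg.norm(A - A_r, "fro")
assert np.isclose(err, np.linalg.norm(s[r:]))
```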
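As a rough illustration of the incremental SVD framework compared in the first thrust, where a low-rank factorization is updated at every innovation of the data, here is a hedged Brand-style update in NumPy. It is a generic sketch, not the dissertation's implementation, and it tracks only the left singular vectors and singular values:

```python
# A Brand-style incremental SVD sketch: fold each new block of columns into
# the current rank-r factors, then re-truncate to rank r.
import numpy as np

def isvd_update(U, s, C, r):
    """One incremental-SVD step for a new block of columns C."""
    L = U.T @ C                       # coordinates of C in the current basis
    H = C - U @ L                     # residual orthogonal to the current basis
    J, K = np.linalg.qr(H)            # orthonormal basis for the residual
    # Small augmented core; its SVD rotates the enlarged basis [U, J].
    core = np.block([[np.diag(s), L],
                     [np.zeros((K.shape[0], s.size)), K]])
    Uc, sc, _ = np.linalg.svd(core, full_matrices=False)
    U_new = np.hstack([U, J]) @ Uc
    return U_new[:, :r], sc[:r]

# Stream a synthetic matrix in column blocks and track its dominant subspace.
rng = np.random.default_rng(1)
m, r = 200, 10
U, s, _ = np.linalg.svd(rng.standard_normal((m, 25)), full_matrices=False)
U, s = U[:, :r], s[:r]                # initialize from the first block
for _ in range(20):
    C = rng.standard_normal((m, 25))  # next innovation of data
    U, s = isvd_update(U, s, C, r)
```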
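The CP decomposition underlying the second thrust can be illustrated with a bare-bones alternating least squares (ALS) loop on a 3-way array. This is a generic least-squares sketch for intuition only; it is not one of the Poisson-specialized algorithms studied in the dissertation:

```python
# A minimal rank-r CP approximation of a 3-way array via alternating
# least squares (Gaussian loss, fixed number of sweeps).
import numpy as np

def unfold(T, mode):
    """Mode-n matricization of a 3-way array."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def khatri_rao(A, B):
    """Column-wise Khatri-Rao (matching) product."""
    r = A.shape[1]
    return (A[:, None, :] * B[None, :, :]).reshape(-1, r)

rng = np.random.default_rng(2)
T = rng.random((20, 30, 40))                     # synthetic 3-way data
r = 5
factors = [rng.random((dim, r)) for dim in T.shape]

for _ in range(50):                              # fixed number of ALS sweeps
    for mode in range(3):
        others = [factors[m] for m in range(3) if m != mode]
        kr = khatri_rao(others[0], others[1])    # Khatri-Rao of the other factors
        # Least-squares update of this mode's factor matrix.
        factors[mode] = np.linalg.lstsq(kr, unfold(T, mode).T, rcond=None)[0].T
```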
DOI
https://dx.doi.org/10.21220/s2-32sv-zk37
Rights
© The Author
Recommended Citation
Myers, Jeremy Moulton, "Low-Rank Matrix and Tensor Models for Data Science Applications" (2024). Dissertations, Theses, and Masters Projects. William & Mary. Paper 1717521652.
https://dx.doi.org/10.21220/s2-32sv-zk37