"Low-Rank Matrix And Tensor Models For Data Science Applications" by Jeremy Moulton Myers

ORCID ID

https://orcid.org/0000-0002-5387-7565

Date Awarded

2024

Document Type

Dissertation

Degree Name

Doctor of Philosophy (Ph.D.)

Department

Computer Science

Advisor

Andreas Stathopoulos

Committee Member

Robert M Lewis

Committee Member

Pieter Peers

Committee Member

Bin Ren

Committee Member

Richard Lehoucq

Abstract

Low-rank approximations play an important role in data science analysis and applications. When the model is linear and the data is represented as a matrix, the optimal rank-r approximation is given by the r dominant factors of the singular value decomposition (SVD). When the model is multilinear and the data is represented as a multi-way array called a tensor, the optimal rank-r approximation may not even exist. Nonetheless, the canonical polyadic (CP) decomposition provides a useful low-rank tensor approximation for interpretability and analysis, in a way similar to the matrix SVD but across multiple axes simultaneously. In this dissertation, we focus on two important problems involving low-rank matrix and tensor models for data science applications. The first thrust of this dissertation involves maintaining a summary or sketch matrix from a data stream. We study the trade-offs between incremental SVD, an established framework for low-rank approximation of streaming data in which a low-rank approximation is maintained for every innovation of data, and randomized sketching, in which the data stream is summarized and a low-rank approximation is reconstructed later. We analyze the role of iterative methods with incremental SVD and provide numerical evidence to support our analysis. We evaluate the relative accuracy and computational costs of incremental SVD and randomized sketching on datasets that are challenging for even state-of-the-art batch solvers. We develop a sampling-based convergence criterion to terminate computation early with negligible loss of fidelity. The second thrust involves computing a low-rank CP decomposition model when the input entries are counts, i.e., non-negative integers, that are assumed to follow a Poisson distribution. Our first contribution is a systematic study of state-of-the-art algorithms. Our second contribution is a hybrid method that switches between a deterministic and a randomized algorithm to leverage desirable computational properties from both methods, with the overall goal of achieving an optimal accuracy-performance trade-off. Our third contribution is a heuristic that detects convergence toward undesirable solutions and implements a restarting technique to improve the likelihood that the method converges to the optimal solution.
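
As an illustration of the matrix case described in the abstract, the following minimal sketch computes a rank-r approximation from the r dominant SVD factors and compares it with a generic randomized sketch of the same rank. It is not the dissertation's code: the function names, the test matrix, the oversampling amount, and the use of a Gaussian range finder are all assumptions chosen for illustration, and the batch setting shown here does not reproduce the streaming algorithms the dissertation studies.

```python
import numpy as np


def truncated_svd(A, r):
    """Rank-r approximation of A built from its r dominant SVD factors.

    Also returns all singular values so the approximation error can be checked.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    A_r = (U[:, :r] * s[:r]) @ Vt[:r, :]
    return A_r, s


def randomized_low_rank(A, r, oversample=10, seed=0):
    """Rank-r approximation from a Gaussian sketch (illustrative assumption).

    Sketch the range of A with a random test matrix, orthonormalize the
    sketch, project A onto that basis, and truncate the small projected
    matrix to rank r.
    """
    rng = np.random.default_rng(seed)
    Omega = rng.standard_normal((A.shape[1], r + oversample))
    Q, _ = np.linalg.qr(A @ Omega)      # orthonormal basis for the sketch
    B = Q.T @ A                         # small (r + oversample) x n matrix
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    return ((Q @ Ub[:, :r]) * s[:r]) @ Vt[:r, :]


# Hypothetical test matrix: exact rank 5 plus a small amount of noise.
rng = np.random.default_rng(0)
A = rng.standard_normal((60, 5)) @ rng.standard_normal((5, 40))
A += 0.01 * rng.standard_normal((60, 40))

r = 5
A_svd, sing_vals = truncated_svd(A, r)
A_rand = randomized_low_rank(A, r)

err_svd = np.linalg.norm(A - A_svd)     # Frobenius norm by default
err_rand = np.linalg.norm(A - A_rand)
tail = np.sqrt(np.sum(sing_vals[r:] ** 2))

print(f"truncated SVD error:     {err_svd:.6f}")
print(f"discarded singular mass: {tail:.6f}")
print(f"randomized sketch error: {err_rand:.6f}")
```

The truncated SVD error matches the norm of the discarded singular values, which is the sense in which that approximation is optimal in the Frobenius norm. Any other rank-r approximation, including the randomized one, can do no better, and the gap between the two is the accuracy given up in exchange for working from a summary of the data.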

DOI

https://dx.doi.org/10.21220/s2-32sv-zk37

Rights

© The Author
