Doctor of Philosophy (Ph.D.)
Sparse high-dimensional time series are common in industry, for example in supply chain demand and retail sales. Accurate and reliable forecasting of high-dimensional time series is essential for supply chain planning and business management. In practical applications, forecasting sparse high-dimensional time series faces three challenges: (1) simple models cannot capture complex patterns, (2) insufficient data prevents us from pursuing more advanced models, and (3) time series in the same dataset may have widely different properties. These challenges keep both the currently prevalent models and theoretically successful advanced models (e.g., neural networks) from working well in practice. We focus our research on a pharmaceutical (pharma) demand forecasting problem. To overcome these challenges, we develop a cross-series learning framework that trains a machine learning model on multiple related time series and uses cross-series information to improve forecasting accuracy. Cross-series learning is further optimized by dividing the full set of time series into subgroups under three grouping schemes, which balances the tradeoff between sample size and sample quality. Moreover, downstream inventory is introduced as an additional feature to support demand forecasting. Combining the cross-series learning framework with advanced machine learning models, we significantly improve the accuracy of pharma demand predictions. To verify the generalizability of cross-series learning, we develop a generic forecasting framework containing the operations required for cross-series learning and apply it to retail sales forecasting. We further confirm the benefits of cross-series learning for advanced models, especially recurrent neural networks (RNNs). In addition to the grouping schemes based on product characteristics, we also explore two grouping schemes based on time series clustering, which do not require domain knowledge and can be applied to other fields.
On a retail sales dataset, our cross-series machine learning models still outperform the baseline models. This dissertation develops a collection of cross-series learning techniques optimized for sparse high-dimensional time series that can be applied to pharma manufacturers, retailers, and possibly other industries. Extensive experiments on real datasets provide empirical evidence and insights for related theoretical studies. In practice, our work offers guidance for applying cross-series learning.
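As a rough illustration of the cross-series idea described above, the following Python sketch pools training pairs from several related series and fits one model per group. It is a minimal sketch under stated assumptions, not the dissertation's implementation: a one-lag linear model stands in for the advanced machine learning models, and all names (`make_pairs`, `fit_line`, `fit_cross_series`, `forecast_next`, the series ids, and the group labels) as well as the toy numbers are hypothetical.

```python
from collections import defaultdict


def make_pairs(series):
    """Turn one series into (previous value, next value) training pairs."""
    return [(series[t - 1], series[t]) for t in range(1, len(series))]


def fit_line(pairs):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    if var_x == 0:
        # All inputs identical: fall back to predicting the mean target.
        return mean_y, 0.0
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = cov_xy / var_x
    return mean_y - slope * mean_x, slope


def fit_cross_series(series_by_id, group_of=None):
    """Fit one model per group on pairs pooled from all series in that group.

    With group_of=None, every series shares a single global model.
    """
    pooled = defaultdict(list)
    for sid, values in series_by_id.items():
        key = group_of[sid] if group_of else "global"
        pooled[key].extend(make_pairs(values))
    return {key: fit_line(pairs) for key, pairs in pooled.items()}


def forecast_next(models, series_by_id, group_of=None):
    """One-step-ahead forecast for each series from its group's model."""
    forecasts = {}
    for sid, values in series_by_id.items():
        a, b = models[group_of[sid] if group_of else "global"]
        forecasts[sid] = a + b * values[-1]
    return forecasts


if __name__ == "__main__":
    # Hypothetical toy demand histories; each is too short to model alone.
    demand = {
        "product_a": [0, 2, 0, 4],
        "product_b": [1, 0, 3, 0],
        "product_c": [10, 12, 14, 16],
    }
    # Hypothetical grouping by a product characteristic.
    groups = {"product_a": "low", "product_b": "low", "product_c": "high"}

    global_models = fit_cross_series(demand)
    grouped_models = fit_cross_series(demand, groups)
    print(forecast_next(global_models, demand))
    print(forecast_next(grouped_models, demand, groups))
```

Passing `group_of=None` trains a single global model on every series, while passing a grouping trades a smaller training sample for more homogeneous series within each group, which is the sample size versus sample quality tradeoff mentioned in the abstract.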
© The Author
Zhu, Xiaodan, "On Cross-Series Machine Learning Models" (2020). Dissertations, Theses, and Masters Projects. William & Mary. Paper 1616444550.