Date Thesis Awarded


Access Type

Honors Thesis -- Access Restricted On-Campus Only

Degree Name

Bachelor of Science (BS)


Data Science


Dana Willner

Committee Members

Alexander Nwala

Margaret Saha

Kristy Murray


Health disparities are pronounced across Belize: rural populations have insufficient access to health care services and, as a result, experience higher rates of illness. This study specifically addresses the lack of diagnostic testing available to this population for respiratory diseases and COVID-19. Differentiating COVID-19 from other illnesses can be vital in the treatment process, allowing medical professionals to choose the most effective treatment for the specific pathogen. In areas with little testing infrastructure or long waits for results, doctors are forced to rely on broad therapeutic measures that do not achieve the same level of success in recovery. This study attempts to provide a predictive alternative that could guide treatment decisions. The data for this project are a subset of survey data collected by an Acute Febrile Illness Surveillance (AFI) study that began working with Belize health clinics in 2020. Using a combination of patients' demographic factors, lifestyle decisions, and symptom reports, a collection of classification models was applied to predict each patient's eventual diagnosis. The data exhibited a large class imbalance, with COVID-19 infections as the minority class; models were therefore developed and evaluated both with and without SMOTE over-sampling. Models were evaluated using overall accuracy as well as precision and recall, with a particular focus on recall for the minority class. Analysis of feature importance for the logistic regression and Random Forest models provided insight into which of the available determinants have the greatest impact on a patient's diagnosis. Ultimately, this study determined that over-sampling provides greater recall for the rare class at the cost of overall accuracy, and that age and a selection of respiratory symptoms are the most influential features for predicting respiratory illness.
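The over-sampling and minority-class recall described in the abstract can be illustrated with a minimal sketch. This is not the thesis code: the function names `smote_oversample` and `recall`, the parameter `k`, and the toy rows below are all assumptions made for illustration, and the study itself most likely relied on an established library implementation of SMOTE. The sketch only shows the core idea: each synthetic minority row is an interpolation between an existing minority row and one of its nearest minority neighbours, and recall is the share of actual minority cases that a model recovers.

```python
import math
import random


def smote_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority rows by SMOTE-style interpolation.

    Assumes `minority` holds at least two numeric feature rows.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the chosen row (excluding itself)
        others = [row for row in minority if row is not base]
        neighbours = sorted(others, key=lambda row: math.dist(base, row))[:k]
        neighbour = rng.choice(neighbours)
        # The new point lies on the segment between base and neighbour
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def recall(y_true, y_pred, positive=1):
    """Recall = true positives / all actual positives for the given class."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    actual_pos = sum(1 for t in y_true if t == positive)
    return true_pos / actual_pos if actual_pos else 0.0


# Toy rows invented for illustration only: [age, symptom score]
majority = [[25, 0.1], [31, 0.2], [40, 0.15], [52, 0.3], [47, 0.25], [36, 0.05]]
minority = [[60, 0.8], [66, 0.9], [71, 0.85]]

# Add enough synthetic rows to match the majority class size
new_rows = smote_oversample(minority, n_new=len(majority) - len(minority))
balanced_minority = minority + new_rows

# Hypothetical labels and predictions, with 1 marking the minority class
minority_recall = recall([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```

In practice, over-sampling of this kind would be applied to the training split only, so that the recall reported for the minority class is measured on real, unseen cases rather than on synthetic ones.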

Available for download on Thursday, May 08, 2025
