Date Thesis Awarded

5-2023

Access Type

Honors Thesis -- Open Access

Degree Name

Bachelor of Science (BS)

Department

Data Science

Advisor

Dan Runfola

Committee Members

Carrie Dolan

Anthony Stefanidis

Abstract

Administrative boundaries, such as states, counties, or districts, are fiat boundaries; they exist purely as defined by human interpretation. Because of this, and despite their critical importance to government functions, the accuracy of data products claiming to represent such boundaries is difficult to measure. Here, I explore this topic using three boundary datasets: the open-source geoBoundaries dataset, UN OCHA’s humanitarian Common Operational Datasets (COD), and Esri’s commercial administrative divisions 0 and 1 datasets in the Living Atlas. The accuracy of each was quantified as the percent overlap between that dataset and an authoritative source: boundaries from the UN’s Second Administrative Level Boundaries (SALB) programme. This authoritative source is considered the most accurate boundary dataset available, but it is limited in both temporal and spatial coverage and has a restrictive license. Overlap was calculated for every division on a feature-by-feature basis. These values were then averaged to country-, level-, and dataset-wide values. The analysis revealed that, at the country scale, Esri had a mean match of 89%, geoBoundaries had a mean match of 82%, and UN OCHA COD had a mean match of 77%. Similarity between the three datasets and UN SALB was also measured at each administrative level: Esri had the highest similarity at the ADM0 level (99.6%) and the ADM1 level (86.9%), and geoBoundaries had the highest similarity at the ADM2 level (87%). This research introduces a baseline into the literature and helps to establish potential future directions for overcoming other challenges related to this topic.
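
As a rough illustration of the feature-by-feature overlap and averaging described in the abstract, the Python sketch below may help. It is not the thesis's implementation. It assumes that "percent overlap" means the share of the authoritative feature's area covered by the candidate feature, and it uses axis-aligned boxes as hypothetical stand-ins for real boundary polygons. The function names, country codes, and coordinates are all invented for the example.

```python
from collections import defaultdict
from statistics import mean

# Assumption: each feature is a box (xmin, ymin, xmax, ymax). Real boundaries
# are arbitrary polygons and would need a geometry library instead.


def box_area(box):
    """Area of a box; zero if the box is empty or inverted."""
    xmin, ymin, xmax, ymax = box
    return max(0.0, xmax - xmin) * max(0.0, ymax - ymin)


def percent_overlap(candidate, reference):
    """Percent of the reference box covered by the candidate box.

    Assumed definition: intersection area / reference area * 100.
    """
    intersection = (
        max(candidate[0], reference[0]),
        max(candidate[1], reference[1]),
        min(candidate[2], reference[2]),
        min(candidate[3], reference[3]),
    )
    ref_area = box_area(reference)
    return 100.0 * box_area(intersection) / ref_area if ref_area else 0.0


def mean_by(records, key):
    """Average feature-level overlap within each group (e.g. country or level)."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key]].append(rec["overlap"])
    return {group: mean(values) for group, values in groups.items()}


# Hypothetical features: (country, level, candidate box, reference box).
features = [
    ("AAA", "ADM1", (0, 0, 10, 10), (0, 0, 10, 10)),
    ("AAA", "ADM1", (0, 0, 5, 10), (0, 0, 10, 10)),
    ("BBB", "ADM1", (2, 0, 10, 10), (0, 0, 10, 10)),
]

records = [
    {"country": c, "level": lvl, "overlap": percent_overlap(cand, ref)}
    for c, lvl, cand, ref in features
]

by_country = mean_by(records, "country")
by_level = mean_by(records, "level")
print(by_country)
print(by_level)
```

Whether the dataset-wide figures in the thesis average over features or over country means is not stated in the abstract, so the grouping here is only one plausible reading.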

Included in

Data Science Commons
