Date Thesis Awarded
5-2021
Access Type
Honors Thesis -- Open Access
Degree Name
Bachelor of Arts (BA)
Department
Interdisciplinary Studies
Advisor
Daniel Runfola
Committee Members
Anthony Stefanidis
Maurits van der Veen
Abstract
Building on insights from two years of manually extracting events information from online news media, we developed an interactive information extraction environment (IIEE). SCOPE, the Scientific Collection of Open-source Policy Evidence, is a Python Django-based tool organized into specialized modules for extracting structured events data from unstructured text. These modules are grouped into a flexible framework that enables users to tailor the tool to their needs. Following principles of user-oriented learning for information extraction (IE), SCOPE offers an alternative approach to developing AI-assisted IE systems. In this thesis, we detail the ongoing development of the SCOPE tool, present the methods and results of tests of SCOPE's efficacy relative to past methods, and provide a novel framework for future tests of AI-assisted IE tasks. Information gathered over a four-week period of use was analyzed to evaluate the initial utility of the tool and establish baseline accuracy metrics. Using the SCOPE tool, 15 users extracted 529 summaries and 362 structured events from 207 news articles, achieving an accuracy of 31.8% with time held constant at 4 minutes per source. To demonstrate how fully or partially automated AI processes can be integrated into SCOPE, a baseline AI was implemented; it achieved 4.8% accuracy at 3.25 seconds per source. These results illustrate SCOPE's ability to present the relative strengths and weaknesses of manual users and AI, and they establish precedent and methods for integrating the two.
Recommended Citation
Crittenden, Matthew, "SCOPE: Building and Testing an Integrated Manual-Automated Event Extraction Tool for Online Text-Based Media Sources" (2021). Undergraduate Honors Theses. William & Mary. Paper 1651.
https://scholarworks.wm.edu/honorstheses/1651
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.