Date Thesis Awarded

5-2022

Access Type

Honors Thesis -- Access Restricted On-Campus Only

Degree Name

Bachelor of Science (BS)

Department

Mathematics

Advisor

Greg Hunt

Committee Members

Rex Kincaid

Heather Sasinowska

Jerry Watkins III

Abstract

Social media data has recently emerged as a source of public opinion on elections, public policy, and the economy. Natural language processing (NLP) techniques have been developed to use this data effectively. Topic modeling, one branch of NLP, works to uncover latent topics within a large collection of tweets. Many topic modeling methods, such as latent Dirichlet allocation (LDA) and k-medoids clustering, are unsupervised. We propose adding a supervised Random Forest layer before performing topic modeling in order to incorporate external knowledge. We find that this layer helps increase the interpretability of topics and uncover unique topics. Sentiment analysis, another branch of NLP, measures the polarity of a tweet in order to gain insight into the author’s opinions. We apply several sentiment analysis methods to our dataset and examine the results; we identify weaknesses in these methods and propose steps for improvement.

Recommended Citation

Smith, Grace, "Investigating Text Mining Techniques Within the Context of Politicized Social Media Data" (2022). Undergraduate Honors Theses. William & Mary. Paper 1822.
https://scholarworks.wm.edu/honorstheses/1822
