Social Networks: The Tracking of COVID-19 Misinformation Using NLP and Graph-based Approaches

Introduction: Misinformation during the COVID-19 pandemic resulted in many unnecessary infections, hospitalizations, and fatalities [10]. The same trend has been seen in past health epidemics [11, 12]. Our work studies the spread of COVID-19 misinformation on social media platforms such as Twitter and Reddit using Natural Language Processing (NLP) and graph-based approaches. We use NetworkX, Gephi, Streamlit, Plotly, and Large Language Models to build a comprehensive analytics and visualization system capable of extracting core metrics for identifying malicious users in the social network. Our results are consistent with previous work [1]. We extend that work by contributing a robust QA Dashboard system that is interactive and intuitive, so users can learn about vaccine misinformation. This work can be valuable to organizations that aim to address misinformation, as it provides an actionable framework to implement.

Technical Details

Data Analytics: NetworkX, Gephi, NLTK, re, and PowerLaw were used to build our graphs and analyze their properties.
Data Augmentation: Transformers were used to augment the initial dataset with columns for sentiment score (negative, neutral, positive) and information validity. We also used PRAW to collect Reddit posts and comments, extending the initial dataset for a more comprehensive analysis.
Data Visualization: Matplotlib, Seaborn, Plotly, Wordcloud, and Streamlit were used to visualize the graph and NLP-related properties of our datasets.
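
To illustrate the kind of graph metrics the Data Analytics step refers to, here is a minimal, dependency-free sketch. It is not the project's actual code. It assumes an edge (u, v) means "user u retweeted or replied to user v", which is a hypothetical convention chosen for this example. The function names are also illustrative. The centrality follows the same normalization as NetworkX's in_degree_centrality (in-degree divided by n - 1), and the exponent estimate uses the standard discrete approximation to the power-law maximum-likelihood estimator with a fixed, hand-picked xmin, whereas a dedicated fitting package would typically also select xmin.

```python
import math
from collections import Counter


def in_degree_centrality(edges):
    """Normalized in-degree per node for a directed graph.

    `edges` is an iterable of (source, target) pairs. Assumption for this
    sketch: (u, v) means user u retweeted or replied to user v, so a high
    score marks an account whose content is widely amplified.
    """
    unique_edges = set(edges)  # ignore repeated interactions between a pair
    nodes = set()
    in_degree = Counter()
    for source, target in unique_edges:
        nodes.update((source, target))
        in_degree[target] += 1
    n = len(nodes)
    if n < 2:
        return {node: 0.0 for node in nodes}
    return {node: in_degree[node] / (n - 1) for node in nodes}


def powerlaw_alpha(degrees, xmin=1):
    """Approximate MLE of the power-law exponent for integer degrees.

    Uses alpha ~= 1 + n / sum(ln(d / (xmin - 0.5))) over degrees >= xmin.
    `xmin` is fixed by hand here (an assumption of this sketch).
    """
    tail = [d for d in degrees if d >= xmin]
    if not tail:
        raise ValueError("no degrees at or above xmin")
    log_sum = sum(math.log(d / (xmin - 0.5)) for d in tail)
    return 1.0 + len(tail) / log_sum


# Toy interaction network (made-up users, not project data).
edges = [("b", "a"), ("c", "a"), ("d", "a"), ("d", "b"), ("a", "e")]
centrality = in_degree_centrality(edges)
most_amplified = max(centrality, key=centrality.get)
```

On the toy network, user "a" receives three of the five interactions and ends up as the most amplified account. In a real analysis, a high centrality score would only flag an account for closer inspection alongside the sentiment and validity labels, not classify it as malicious on its own.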

Source Code & Paper

All of our code, datasets, gexf files, and more can be found here.
To obtain the written paper, please reach out to me!