WellnessSquad: An Augmented Approach to Suicide Ideation Classification

WellnessSquad is an updated version of WellnessOracle, a project developed in GS/EECS 5327 (Introduction to Machine Learning & Pattern Recognition) by Shogo Toyonaga, Abel Habte, Dongwon Lee, and Jonathan Ramos.

Model performance has been substantially improved, and new features and web scraping support have been added. The updates are summarized below:

Added new handcrafted features for training and testing
Implemented feature scaling & pre-processing support
Added data analytics & visualization support
Added new models with superior performance (e.g., Fine-tuned BERT)
Added web scraping and inference support for Reddit, Quora, and YouTube
Added a chatbot for users who need to seek help for their problems
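
As a rough illustration of the handcrafted features and feature scaling mentioned above, the sketch below computes three simple per-post features and standardizes them column by column. The feature choices (word count, first-person pronouns, emphatic punctuation) and the helper names `handcrafted_features` and `standardize` are hypothetical and are not taken from the project. The scaling is written in plain Python as a z-score using the population standard deviation, which is the same convention scikit-learn's `StandardScaler` follows.

```python
import re
import statistics

# Hypothetical feature set; the project's actual handcrafted features may differ.
FIRST_PERSON = {"i", "me", "my", "myself"}


def handcrafted_features(text: str) -> list[float]:
    """Return a small illustrative feature vector for one post."""
    words = re.findall(r"[a-z']+", text.lower())
    return [
        float(len(words)),                              # word count
        float(sum(w in FIRST_PERSON for w in words)),   # first-person pronouns
        float(text.count("!") + text.count("?")),       # emphatic punctuation
    ]


def standardize(rows: list[list[float]]) -> list[list[float]]:
    """Z-score each column; constant columns are mapped to 0.0."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.pstdev(col) for col in columns]
    return [
        [(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, stds)]
        for row in rows
    ]


# Placeholder posts, invented purely for illustration.
posts = [
    "I can't sleep and my head hurts!",
    "Great game last night",
    "Why me? Why now?",
]
raw = [handcrafted_features(p) for p in posts]
scaled = standardize(raw)
```

After scaling, every column has zero mean and unit variance across the sample, so features measured on different ranges contribute on a comparable scale.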

Technical Details

Machine Learning: sklearn, optuna, transformers, torch, and numpy were used to develop and fine-tune the models for text-based classification tasks.
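
For readers unfamiliar with text classification in sklearn, a minimal baseline might look like the sketch below. It assumes a TF-IDF representation and a logistic regression classifier, which are stand-ins chosen for brevity and not necessarily the models used in this project; the four training texts are invented placeholders, and the label strings simply mirror the suicide / non-suicide classes described in this document.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny placeholder dataset, invented for illustration only.
texts = [
    "I feel hopeless and see no way forward",
    "I want everything to end",
    "Looking forward to the weekend hike",
    "The new recipe turned out great",
]
labels = ["suicide", "suicide", "non-suicide", "non-suicide"]

# Vectorize the raw text, then fit a linear classifier on the TF-IDF features.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(texts, labels)

# Inference returns one label per input text.
preds = model.predict(["Nothing seems to matter anymore"])
```

Wrapping the vectorizer and classifier in one pipeline keeps the vocabulary fitted on the training data only, which avoids leaking test-set statistics into the features.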
Data Visualization: dash, emojis, and wordcloud were used to visualize model performance on the training and testing sets and to identify patterns between keywords.
Web Scraping: Playwright, BeautifulSoup4, Requests, and youtube_transcript_api were used to pull content from Reddit, Quora, and YouTube to further evaluate the model's performance on real-time data.
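
Once a page has been fetched, the text usually has to be pulled out of the HTML before it can be classified. The sketch below shows one way to do that with BeautifulSoup4 on an inline HTML string. The markup, the `post` class name, and the `extract_posts` helper are hypothetical; real Reddit, Quora, and YouTube pages have different structures, and the project's own scraping code may work differently.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a fetched page.
html = """
<html><body>
  <div class="post"><p>First   example post.</p></div>
  <div class="post"><p>Second example
  post.</p></div>
  <script>console.log("ignored");</script>
</body></html>
"""


def extract_posts(page_html: str, css_class: str = "post") -> list[str]:
    """Return the whitespace-normalized text of each matching <div>."""
    soup = BeautifulSoup(page_html, "html.parser")
    posts = []
    for node in soup.find_all("div", class_=css_class):
        text = node.get_text(" ", strip=True)
        posts.append(" ".join(text.split()))  # collapse runs of whitespace
    return posts


posts = extract_posts(html)
```

Each returned string can then be passed to a classifier in the same way as any other input text.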
Graphical User Interface (GUI): Streamlit was used to give end users an intuitive interface that runs inference with our pre-trained models. Data from various websites can be scraped and automatically classified as suicide or non-suicide.

Source Code & Paper

The training and testing data used to generate our models can be accessed here.
Our code can be found here.