A COMPARATIVE ANALYSIS OF TRADITIONAL AND SEMANTIC MACHINE LEARNING MODELS FOR MENTAL HEALTH STATUS CLASSIFICATION

Authors

  • Sushanth S Shetty, Nausheeda BS

DOI:

https://doi.org/10.25215/8194288797.54

Abstract

Mental illness is a major problem in modern society. The extensive use of social and digital media produces language data that may reveal early signs of mental health disorders. In this paper, we present the design, implementation, and evaluation of a machine learning model that predicts mental health status from text. We compare two Natural Language Processing (NLP) feature extraction methods. The first is the classical vector space model, in which TF-IDF features are passed to a Logistic Regression classifier. The second is a modern semantic model, in which pre-trained Sentence-BERT embeddings that capture rich contextual meaning are paired with a Logistic Regression classifier. Both approaches were trained and tested on the “Combined Data” dataset, a labeled corpus covering categories such as Anxiety, Depression, and Stress. Our results show that the semantic approach clearly outperforms the TF-IDF baseline, reaching 94.81% accuracy, which highlights the value of contextual understanding for this task. The final model is deployed as an interactive Streamlit web application. This work provides an empirical, well-documented framework for mental health classification and an open discussion of the trade-offs between traditional and modern NLP feature engineering.
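The classical baseline described above (TF-IDF features fed to a Logistic Regression classifier) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The tokenizer (lowercase plus whitespace split), the smoothed IDF formula (which mirrors scikit-learn's `TfidfVectorizer` defaults), the gradient-descent settings (`lr=0.5`, `epochs=200`, `l2=1e-3`), and the toy sentences and labels are all hypothetical and were chosen only for illustration. None of them come from the paper or the “Combined Data” corpus. The Sentence-BERT variant would swap `tfidf()` for a dense embedding from a pre-trained sentence encoder and keep the same classifier. That step is omitted here because it requires a downloaded model.

```python
import math
from collections import Counter

def tokenize(text):
    # Hypothetical preprocessing: lowercase + whitespace split only.
    return text.lower().split()

def fit_idf(docs):
    # Smoothed IDF, mirroring scikit-learn's TfidfVectorizer default:
    # idf(t) = ln((1 + n) / (1 + df(t))) + 1
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    vocab = sorted(df)
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in vocab}
    return vocab, idf

def tfidf(doc, vocab, idf):
    # Raw term counts weighted by IDF, then L2-normalised.
    # Words outside the training vocabulary are ignored.
    counts = Counter(tokenize(doc))
    vec = [counts[t] * idf[t] for t in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def scores_for(x, W, b):
    # One linear score per class: w_j . x + b_j
    return [sum(w * xi for w, xi in zip(row, x)) + bj for row, bj in zip(W, b)]

def train_logreg(X, y, classes, lr=0.5, epochs=200, l2=1e-3):
    # Multinomial (softmax) logistic regression fitted by full-batch gradient
    # descent on the mean cross-entropy loss plus a small L2 penalty.
    k, d, n = len(classes), len(X[0]), len(X)
    index = {c: i for i, c in enumerate(classes)}
    W = [[0.0] * d for _ in range(k)]
    b = [0.0] * k
    for _ in range(epochs):
        gW = [[0.0] * d for _ in range(k)]
        gb = [0.0] * k
        for x, label in zip(X, y):
            p = softmax(scores_for(x, W, b))
            for j in range(k):
                err = p[j] - (1.0 if j == index[label] else 0.0)
                gb[j] += err
                for f in range(d):
                    gW[j][f] += err * x[f]
        for j in range(k):
            b[j] -= lr * gb[j] / n
            for f in range(d):
                W[j][f] -= lr * (gW[j][f] / n + l2 * W[j][f])
    return W, b

def predict(text, vocab, idf, W, b, classes):
    # Return the class with the highest linear score.
    s = scores_for(tfidf(text, vocab, idf), W, b)
    return classes[max(range(len(classes)), key=s.__getitem__)]

# Invented toy data for illustration only (not the "Combined Data" corpus).
train_texts = [
    "i feel nervous and worried all the time",
    "constant worry and panic before meetings",
    "i feel hopeless and empty every day",
    "nothing matters i feel sad and empty",
    "too much pressure at work deadlines overwhelm me",
    "work pressure and deadlines keep me tense",
]
train_labels = ["Anxiety", "Anxiety", "Depression", "Depression", "Stress", "Stress"]
classes = sorted(set(train_labels))

vocab, idf = fit_idf(train_texts)
X = [tfidf(t, vocab, idf) for t in train_texts]
W, b = train_logreg(X, train_labels, classes)

for text in ["constant panic", "hopeless sad mood", "pressure at work"]:
    print(text, "->", predict(text, vocab, idf, W, b, classes))
```

A real pipeline would normally rely on a library implementation, such as scikit-learn's `TfidfVectorizer` and `LogisticRegression`, and would be evaluated on a held-out split. The hand-written version above only makes each step (vocabulary, IDF weighting, normalisation, softmax training) explicit.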

Published

2026-03-13