COMPARATIVE MACHINE LEARNING APPROACHES FOR DRUG RESPONSE PREDICTION IN CANCER CELL LINES USING GENE EXPRESSION DATA

Authors

  • Dhanya Shetty, Ashritha K, Hemalatha N

DOI:

https://doi.org/10.25215/8194288797.14

Abstract

Accurate prediction of cancer drug response is essential for advancing precision oncology. In this study, we compared three machine learning models, Random Forest, Support Vector Machine (SVM), and XGBoost, for classifying drug sensitivity and resistance in 389 cancer cell lines using gene expression profiles from the GDSC database. Data preprocessing consisted of log transformation followed by selection of the 500 most variable genes. Binary drug response labels were defined using the median IC₅₀ as the threshold. Models were trained and evaluated using an 80/20 train/test split and five-fold cross-validation, with accuracy, F1-score, and ROC-AUC as performance metrics. Among the tested approaches, SVM achieved the highest overall accuracy, approximately 64%, indicating that transcriptomic features alone have moderate predictive utility for drug response. To facilitate broader application, an interactive Streamlit web tool was developed for user-friendly exploration and prediction. These findings underscore the usefulness of machine learning for predicting drug sensitivity and suggest that incorporating additional omics data could further improve predictive performance.
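
The preprocessing and labeling steps named in the abstract can be illustrated with a minimal sketch on toy data. This is not the authors' code. Several details are assumptions made only for illustration: the log2(x + 1) form of the log transformation, the use of population variance for ranking genes, the convention that IC₅₀ values below the median are labeled sensitive (1), and all function names. The study selects 500 genes; the toy example selects 2.

```python
import math
import statistics


def log_transform(matrix):
    """Apply log2(x + 1) to every value (rows = cell lines, columns = genes).

    The exact transform is an assumption; the abstract only says "log transformation".
    """
    return [[math.log2(value + 1.0) for value in row] for row in matrix]


def top_variable_genes(matrix, k):
    """Return the column indices of the k genes with the highest variance."""
    n_genes = len(matrix[0])
    variances = [
        statistics.pvariance([row[g] for row in matrix]) for g in range(n_genes)
    ]
    ranked = sorted(range(n_genes), key=lambda g: variances[g], reverse=True)
    return sorted(ranked[:k])  # keep the original column order


def binarize_by_median(ic50_values):
    """Label each cell line against the median IC50.

    Assumed convention: 1 = sensitive (below median), 0 = resistant.
    """
    threshold = statistics.median(ic50_values)
    return [1 if value < threshold else 0 for value in ic50_values]


# Toy data: 4 cell lines x 3 genes, with one IC50 value per cell line.
expression = [
    [0.0, 7.0, 3.0],
    [1.0, 7.0, 15.0],
    [3.0, 7.0, 0.0],
    [7.0, 7.0, 1.0],
]
ic50 = [0.5, 2.0, 4.0, 8.0]

logged = log_transform(expression)
selected = top_variable_genes(logged, k=2)  # the study uses k=500
features = [[row[g] for g in selected] for row in logged]
labels = binarize_by_median(ic50)
```

In this toy case the constant second gene is dropped because its variance is zero, and the two cell lines with the lowest IC₅₀ values receive the sensitive label. The resulting `features` and `labels` would then feed the train/test split and cross-validation described above.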

Published

2026-03-13