CROSS-DOMAIN SENTIMENT ANALYSIS USING TRANSFORMER MODELS: A COMPARISON STUDY OF BERT AND ROBERTA

Authors

  • Kezia Sheen, Munna, Hemalatha N

DOI:

https://doi.org/10.25215/8194288770.39

Abstract

Sentiment analysis across different domains remains a significant challenge in natural language processing due to domain-specific vocabularies, linguistic variations, and context-dependent sentiment expressions. This study investigates cross-domain sentiment analysis using two prominent transformer-based models: BERT (Bidirectional Encoder Representations from Transformers) and RoBERTa (Robustly Optimized BERT Pretraining Approach), implemented in PyTorch. The research evaluates sentiment classification across five diverse domains (electronics, books, movies, restaurants, and Twitter) using three distinct experimental approaches: within-domain training (establishing baselines), cross-domain transfer learning (evaluating zero-shot knowledge transfer across 40 domain pairs), and few-shot learning (investigating performance with limited target-domain examples at 10, 50, 100, and 500 shots). Results demonstrate that RoBERTa consistently outperforms BERT, achieving a mean within-domain F1-score of 0.902 compared to BERT's 0.892, and a mean cross-domain F1-score of 0.789 versus 0.761. Cross-domain transfer incurs average performance drops of 11.3% for RoBERTa and 13.2% for BERT. The electronics and Twitter domains emerge as the strongest knowledge sources, while the restaurant domain is the most receptive target for transfer. Few-shot learning experiments reveal that 50 shots provide an optimal balance between annotation cost and performance improvement. Vocabulary overlap analysis shows a moderate positive correlation (r = 0.492) with transfer success. These findings establish comprehensive benchmarks for transformer-based cross-domain sentiment classification and provide actionable insights for practitioners deploying multi-domain sentiment analysis systems.
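
The cross-domain protocol described above amounts to fine-tuning a pretrained encoder on labelled examples from one source domain and then measuring its zero-shot F1-score on a held-out target domain. The sketch below illustrates that setup with Hugging Face Transformers and PyTorch; the checkpoint name ("roberta-base"), hyperparameters, and helper functions are illustrative assumptions, not the authors' exact configuration.

    # Minimal sketch of source-domain fine-tuning followed by zero-shot evaluation
    # on a target domain. Checkpoint, batch sizes, and hyperparameters are assumed
    # for illustration; they are not the paper's reported settings.
    import torch
    from torch.utils.data import DataLoader, TensorDataset
    from transformers import RobertaTokenizerFast, RobertaForSequenceClassification
    from sklearn.metrics import f1_score

    device = "cuda" if torch.cuda.is_available() else "cpu"
    tokenizer = RobertaTokenizerFast.from_pretrained("roberta-base")
    model = RobertaForSequenceClassification.from_pretrained(
        "roberta-base", num_labels=2).to(device)

    def encode(texts, labels):
        # Tokenize a list of review texts and bundle them with binary labels.
        enc = tokenizer(texts, truncation=True, padding=True,
                        max_length=256, return_tensors="pt")
        return TensorDataset(enc["input_ids"], enc["attention_mask"],
                             torch.tensor(labels))

    def train_on_source(texts, labels, epochs=3, lr=2e-5):
        # Standard fine-tuning loop on the labelled source domain.
        loader = DataLoader(encode(texts, labels), batch_size=16, shuffle=True)
        optim = torch.optim.AdamW(model.parameters(), lr=lr)
        model.train()
        for _ in range(epochs):
            for input_ids, attention_mask, y in loader:
                optim.zero_grad()
                out = model(input_ids=input_ids.to(device),
                            attention_mask=attention_mask.to(device),
                            labels=y.to(device))
                out.loss.backward()
                optim.step()

    @torch.no_grad()
    def zero_shot_f1(texts, labels):
        # Evaluate the source-trained model on the target domain without any
        # target-domain fine-tuning (the cross-domain transfer condition).
        loader = DataLoader(encode(texts, labels), batch_size=32)
        model.eval()
        preds, gold = [], []
        for input_ids, attention_mask, y in loader:
            logits = model(input_ids=input_ids.to(device),
                           attention_mask=attention_mask.to(device)).logits
            preds.extend(logits.argmax(dim=-1).cpu().tolist())
            gold.extend(y.tolist())
        return f1_score(gold, preds)

Under this reading, the few-shot condition corresponds to resuming train_on_source on a small sample of 10, 50, 100, or 500 labelled target-domain examples starting from the source-domain checkpoint, then re-running the evaluation on the target test set.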

Published

2026-03-11