ENHANCING KANNADA NATURAL LANGUAGE PROCESSING: PARAPHRASE GENERATION AND EVALUATION USING SYNONYM SUBSTITUTION AND CHARACTER-LEVEL SIMILARITY METRICS

Authors

  • K. Annapoorneshwari Shetty, Charishma, Shipali

DOI

https://doi.org/10.25215/8194288797.04

Abstract

Kannada is a rich and widely spoken language, yet it still has far fewer Natural Language Processing (NLP) resources than English and other major languages. Paraphrase generation, which rewrites a sentence while preserving its meaning, is especially challenging in Kannada because of limited datasets and lexical complexity. This research explores two approaches to Kannada paraphrase generation and evaluation. The first approach trains a Word2Vec model on the IndicCorp Kannada dataset, generates paraphrases by substituting semantically similar words, and scores them using cosine similarity. Although this method yields high similarity values, many of the generated paraphrases were found to be contextually incorrect or unnatural to native Kannada speakers. To address this limitation, a second approach based on manual synonym substitution was implemented, in which linguistically appropriate Kannada synonyms were supplied to preserve meaning and readability. Both methods were evaluated using character-level similarity metrics. The results show that the synonym-based approach produces more meaningful and human-like paraphrases, highlighting the need for context-aware techniques in Kannada NLP.
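
As a rough illustration of the two evaluation ideas mentioned in the abstract, the following minimal Python sketch substitutes synonyms from a small hand-written table and scores the result at the character level. It is not the authors' implementation: the synonym table, the example sentence, and the function names are hypothetical, and the abstract does not name the specific character-level metric, so `difflib.SequenceMatcher` from the Python standard library is assumed here as a stand-in. The cosine similarity helper shows the formula typically used to compare word vectors; no trained Word2Vec model is involved.

```python
from difflib import SequenceMatcher
from math import sqrt

# Hypothetical hand-written synonym table (illustrative, not from the paper).
SYNONYMS = {
    "ಮನೆ": "ಗೃಹ",  # house
    "ನೀರು": "ಜಲ",  # water
}


def paraphrase(sentence, synonyms=SYNONYMS):
    """Replace every whitespace-separated token that has a listed synonym."""
    return " ".join(synonyms.get(token, token) for token in sentence.split())


def char_similarity(a, b):
    """Character-level similarity in [0, 1].

    Assumption: difflib's ratio stands in for the unspecified metric.
    """
    return SequenceMatcher(None, a, b).ratio()


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


original = "ಇದು ನನ್ನ ಮನೆ"      # "This is my house"
candidate = paraphrase(original)  # swaps the word for "house"
score = char_similarity(original, candidate)
print(candidate, round(score, 3))
```

A character-level score only measures surface overlap between the two strings, so a valid paraphrase that uses a very different synonym can score low. This is one reason such scores are best read alongside human judgement of meaning and naturalness.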

Published

2026-03-13