Instance Selection Improves Cross-Lingual Model Training for Fine-Grained Sentiment Analysis

Authors: Klinger, Roman (ORCID: 0000-0002-2014-6619); Cimiano, Philipp
Record date: 2024-03-12
Year: 2015
ISBN: 978-1-941643-77-8
Handle: https://fis.uni-bamberg.de/handle/uniba/94001

Abstract: Scarcity of annotated corpora for many languages is a bottleneck for training fine-grained sentiment analysis models that can tag aspects and subjective phrases. We propose to exploit statistical machine translation to alleviate the need for training data by projecting annotated data in a source language to a target language, such that a supervised fine-grained sentiment analysis system can be trained. To avoid a negative influence of poor-quality translations, we propose a filtering approach based on machine translation quality estimation measures to select only high-quality sentence pairs for projection. We evaluate on the language pair German/English on a corpus of product reviews annotated for both languages and compare to in-target-language training. Projection without any filtering leads to 23 % F1 in the task of detecting aspect phrases, compared to 41 % F1 for in-target-language training. Our approach obtains up to 47 % F1. Further, we show that the detection of subjective phrases is competitive with in-target-language training without filtering.

Language: eng
Keywords: Instance Selection
Subject class: 004
Type: conference object
DOI: 10.18653/v1/K15-1016
URL: http://www.aclanthology.org/K15-1016