Hierarchical Adversarial Correction to Mitigate Identity Term Bias in Toxicity Detection
Schäfer, Johannes; Heid, Ulrich; Klinger, Roman (2024): Hierarchical Adversarial Correction to Mitigate Identity Term Bias in Toxicity Detection, in: Orphee De Clercq, Valentin Barriere, Jeremy Barnes, et al. (eds.), Proceedings of the 14th Workshop on Computational Approaches to Subjectivity, Sentiment, & Social Media Analysis, Bangkok, Thailand: Association for Computational Linguistics, pp. 35–51, doi: 10.18653/v1/2024.wassa-1.4.
Faculty/Chair:
Author:
Schäfer, Johannes; Heid, Ulrich; Klinger, Roman
Title of the compilation:
Proceedings of the 14th Workshop on Computational Approaches to Subjectivity, Sentiment, & Social Media Analysis
Conference:
14th Workshop on Computational Approaches to Subjectivity, Sentiment, & Social Media Analysis ; Bangkok, Thailand
Publisher Information:
Bangkok, Thailand: Association for Computational Linguistics
Year of publication:
2024
Pages:
35–51
Language:
English
Abstract:
Corpora that are the fundament for toxicity detection contain such expressions typically directed against a target individual or group, e.g., people of a specific gender or ethnicity. Prior work has shown that the target identity mention can constitute a confounding variable. As an example, a model might learn that Christians are always mentioned in the context of hate speech. This misguided focus can lead to a limited generalization to newly emerging targets that are not found in the training data. In this paper, we hypothesize and subsequently show that this issue can be mitigated by considering targets on different levels of specificity. We distinguish levels of (1) the existence of a target, (2) a class (e.g., that the target is a religious group), or (3) a specific target group (e.g., Christians or Muslims). We define a target label hierarchy based on these three levels and then exploit this hierarchy in an adversarial correction for the lowest level (i.e. (3)) while maintaining some basic target features. This approach does not lower the toxicity detection performance but increases the generalization to targets not being available at training time.
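The three target levels named in the abstract can be illustrated with a minimal sketch. It only shows how one annotated target could be expanded into labels at each level. It is not the authors' implementation. The `TARGET_CLASS` mapping, the `TargetLabels` type, and the `hierarchical_labels` function are hypothetical names introduced here for illustration, and the target inventory is an assumption based on the examples given in the abstract.

```python
from typing import NamedTuple, Optional

# Hypothetical inventory (an assumption, not taken from the paper):
# specific target group -> target class.
TARGET_CLASS = {
    "Christians": "religion",
    "Muslims": "religion",
    "women": "gender",
}


class TargetLabels(NamedTuple):
    has_target: bool              # level 1: does a target exist?
    target_class: Optional[str]   # level 2: class of the target
    target_group: Optional[str]   # level 3: specific target group


def hierarchical_labels(group: Optional[str]) -> TargetLabels:
    """Expand one annotated target group into labels on all three levels."""
    if group is None:
        return TargetLabels(False, None, None)
    # A group missing from the inventory still counts as a target,
    # but its class is unknown in this sketch.
    return TargetLabels(True, TARGET_CLASS.get(group), group)


print(hierarchical_labels("Muslims"))
print(hierarchical_labels(None))
```

In the scheme the abstract describes, the adversarial correction is applied to the level (3) label, while the information at the coarser levels is maintained. How the paper's model does this is not specified in this record.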
Keywords:
Hierarchical Adversarial Correction
Peer Reviewed:
Yes
International Distribution:
Yes
Open Access Journal:
Yes
Type:
Conference object
Activation date:
September 20, 2024
Permalink
https://fis.uni-bamberg.de/handle/uniba/98197