Hierarchical Adversarial Correction to Mitigate Identity Term Bias in Toxicity Detection
Schäfer, Johannes; Heid, Ulrich; Klinger, Roman (2025): Hierarchical Adversarial Correction to Mitigate Identity Term Bias in Toxicity Detection, in: Bamberg: Otto-Friedrich-Universität, pp. 35–51.
Year of publication:
2025
Source/Other editions:
Orphee De Clercq, Valentin Barriere, Jeremy Barnes, et al. (eds.), Proceedings of the 14th Workshop on Computational Approaches to Subjectivity, Sentiment, & Social Media Analysis, Bangkok, Thailand: Association for Computational Linguistics, 2024, pp. 35–51
Year of first publication:
2024
Language:
English
Abstract:
Corpora that form the foundation of toxicity detection contain toxic expressions that are typically directed against a target individual or group, e.g., people of a specific gender or ethnicity. Prior work has shown that the mention of the target identity can constitute a confounding variable. For example, a model might learn that Christians are always mentioned in the context of hate speech. This misguided focus can limit generalization to newly emerging targets that are not found in the training data. In this paper, we hypothesize and subsequently show that this issue can be mitigated by considering targets on different levels of specificity. We distinguish levels of (1) the existence of a target, (2) a class (e.g., that the target is a religious group), and (3) a specific target group (e.g., Christians or Muslims). We define a target label hierarchy based on these three levels and then exploit this hierarchy in an adversarial correction for the lowest level (i.e., (3)) while maintaining some basic target features. This approach does not lower toxicity detection performance but improves generalization to targets that are not available at training time.
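The three-level target hierarchy described above can be illustrated with a minimal sketch. Note that the concrete group names, class names, and the mapping below are illustrative assumptions for exposition, not labels taken from the paper:

```python
# Illustrative sketch of the three-level target label hierarchy from the
# abstract: (1) target existence, (2) target class, (3) specific target group.
# The groups and classes listed here are hypothetical examples, not the
# paper's actual label set.

# Level 3 -> Level 2: specific target groups mapped to their class.
SPECIFIC_TO_CLASS = {
    "christians": "religion",   # example used in the abstract
    "muslims": "religion",      # example used in the abstract
    "women": "gender",          # assumed illustrative entry
}

def target_hierarchy(specific_target):
    """Return the labels on all three hierarchy levels for one mention.

    `specific_target` is None when the text has no identity target at all.
    """
    if specific_target is None:
        # Level 1 already resolves to "no target"; lower levels are undefined.
        return {"has_target": False, "target_class": None, "target_group": None}
    return {
        "has_target": True,                                  # level 1
        "target_class": SPECIFIC_TO_CLASS[specific_target],  # level 2
        "target_group": specific_target,                     # level 3
    }
```

Under this scheme, the adversarial correction can penalize features predictive of the level-3 label (the specific group) while the retained level-1 and level-2 labels preserve the basic target information the toxicity classifier still needs.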
Keywords:
Hierarchical Adversarial Correction
Peer Reviewed:
Yes
International Distribution:
Yes
Open Access Journal:
Yes
Type:
Conference object
Activation date:
November 10, 2025
Permalink:
https://fis.uni-bamberg.de/handle/uniba/111086