Donate or Create? : Comparing Data Collection Strategies for Emotion-labeled Multimodal Social Media Posts

Bagdon, Christopher Doyle; Combs, Aidan; Silberer, Carina; Klinger, Roman

doi:10.18653/v1/2025.acl-long.847

Bagdon, Christopher Doyle; Combs, Aidan; Silberer, Carina; et al. (2025): Donate or Create? : Comparing Data Collection Strategies for Emotion-labeled Multimodal Social Media Posts, in: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, et al. (eds.), Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, pp. 17307–17330, doi: 10.18653/v1/2025.acl-long.847.

Faculty/Chair:

Fundamentals of Natural Language Processing

Author:

Bagdon, Christopher Doyle

Combs, Aidan

Silberer, Carina

Klinger, Roman

Title of the compilation:

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics

Volume Number/Title:

1: Long Papers

Editors:

Che, Wanxiang

Nabende, Joyce

Shutova, Ekaterina

Pilehvar, Mohammad Taher

Conference:

63rd Annual Meeting of the Association for Computational Linguistics

Publisher Information:

Association for Computational Linguistics

Year of publication:

2025

Pages:

17307–17330

ISBN:

979-8-89176-251-0

Language:

English

DOI:

10.18653/v1/2025.acl-long.847

Abstract:

Accurate modeling of subjective phenomena such as emotion expression requires data annotated with authors’ intentions. Commonly such data is collected by asking study participants to donate and label genuine content produced in the real world, or create content fitting particular labels during the study. Asking participants to create content is often simpler to implement and presents fewer risks to participant privacy than data donation. However, it is unclear if and how study-created content may differ from genuine content, and how differences may impact models. We collect study-created and genuine multimodal social media posts labeled for emotion and compare them on several dimensions, including model performance. We find that compared to genuine posts, study-created posts are longer, rely more on their text and less on their images for emotion expression, and focus more on emotion-prototypical events. The samples of participants willing to donate versus create posts are demographically different. Study-created data is valuable to train models that generalize well to genuine data, but realistic effectiveness estimates require genuine data.

Peer Reviewed:

Yes

International Distribution:

Yes

Open Access Journal:

Yes

Type:

Conference object

URI:

https://fis.uni-bamberg.de/handle/uniba/109544

Activation date:

August 7, 2025

Project(s):

User’s Choice of Images and Text to Express Emotions in Twitter and Reddit
