How Robust are Audio Embeddings for Polyphonic Sound Event Tagging?
Abeßer, Jakob; Grollmisch, Sascha; Müller, Meinard (2023): How Robust are Audio Embeddings for Polyphonic Sound Event Tagging?, in: IEEE ACM transactions on audio, speech, and language processing : TASLP, New York, NY: IEEE, vol. 31, pp. 2658–2667, doi: 10.1109/taslp.2023.3293032.
Author:
Abeßer, Jakob; Grollmisch, Sascha; Müller, Meinard
Title of the Journal:
IEEE ACM transactions on audio, speech, and language processing : TASLP
ISSN:
2329-9290
Publisher Information:
New York, NY: IEEE
Year of publication:
2023
Volume:
31
Pages:
2658–2667
Language:
English
Abstract:
Sound classification algorithms are challenged by the natural variability of everyday sounds, particularly for large sound class taxonomies. In order to be applicable in real-life environments, such algorithms must also be able to handle polyphonic scenarios, where simultaneously occurring and overlapping sound events need to be classified. With the rapid progress of deep learning, several deep audio embeddings (DAEs) have been proposed as pre-trained feature representations for sound classification. In this article, we analyze the embedding spaces of two non-trainable audio representations (NTARs) and five DAEs for sound classification in polyphonic scenarios (sound event tagging) and make several contributions. First, we compare general properties like the inter-correlation between feature dimensions and the scattering of sound classes in the embedding spaces. Second, we test the robustness of the embeddings against several audio degradations and propose two sensitivity measures based on a class-agnostic and a class-centric view on the resulting drift in the embedding space. Finally, as a central contribution, we study how a blending between pairs of sounds maps to embedding space trajectories and how the path of these trajectories can cause classification errors due to their proximity to other sound classes. Throughout our analyses, the PANN embeddings have shown the best overall performance for low-polyphony sound event tagging.
Keywords:
sound event tagging
sound polyphony
deep audio embeddings
embedding space
Peer Reviewed:
Yes
International Distribution:
Yes
Type:
Article
Activation date:
November 24, 2025
Permalink
https://fis.uni-bamberg.de/handle/uniba/111591