Deriving Temporal Prototypes from Saliency Map Clusters for the Analysis of Deep-Learning-based Facial Action Unit Classification




Faculty/Professorship: Cognitive Systems; Explainable Machine Learning; University of Bamberg
Author(s): Finzel, Bettina; Kollmann, Rene; Rieger, Ines; Pahl, Jaspar; Schmid, Ute
Publisher Information: Bamberg : Otto-Friedrich-Universität
Year of publication: 2022
Pages: 86-97
Source/Other editions: Proceedings of the LWDA Workshops: FGWM, KDML, FGWI-BIA, and FGIR, 2993 (2021), pp. 86-97
Year of first publication: 2021
Language(s): English
Licence: Creative Commons - CC BY - Attribution 4.0 International 
URN: urn:nbn:de:bvb:473-irb-553809
Abstract: 
Reliably determining the emotional state of a person is a difficult task for both humans and machines. Automatic detection and evaluation of facial expressions is particularly important if people are unable to express their emotional state themselves, for example due to cognitive impairments. Identifying the presence of Action Units in a human’s face is a psychologically validated approach to quantifying which emotion is expressed. To automate the detection of Action Units, neural networks have been trained. However, the black-box nature of deep neural networks provides no insight into the relevant features identified during the decision process. Approaches from Explainable Artificial Intelligence have to be applied to explain why the network came to a certain conclusion. In this work, "Layer-Wise Relevance Propagation" (LRP) in combination with the meta-analysis approach "Spectral Relevance Analysis" (SpRAy) is used to derive temporal prototypes from predictions in video sequences. Temporal prototypes provide an aggregated view of the network's predictions by grouping similar frames according to their relevance. Additionally, a specific visualization method for temporal prototypes is presented that highlights the areas most relevant for the prediction of an Action Unit. A quantitative evaluation of our approach shows that temporal prototypes aggregate temporal information well. The proposed method can be used to generate concise visual explanations for a sequence of interpretable saliency maps. This work is intended to provide the foundation for a new temporal analysis method as well as an explanation approach that helps researchers and experts gain a deeper understanding of how the underlying network decides which Action Units are active in a particular emotional state.
GND Keywords: Prototyp; Gesichtserkennung; Neuronales Netz; Cluster-Analyse; Affective Computing; Korrelationsanalyse
Keywords: Temporal Prototypes, Facial Action Unit Detection, Layer-wise Relevance Propagation, Spectral Clustering, Affective Computing, Correlation Analysis
DDC Classification: 004 Computer science  
RVK Classification: ST 301   
Type: Conference object
URI: https://fis.uni-bamberg.de/handle/uniba/55380
Release Date: 12 September 2022

File: fisba55380.pdf
Size: 1.39 MB
Format: PDF
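
The grouping step outlined in the abstract (clustering per-frame relevance maps and aggregating each cluster) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the relevance maps are synthetic stand-ins for LRP output, the function names `spectral_bipartition` and `temporal_prototypes` are hypothetical, the clustering is a simple two-way spectral split on a Gaussian affinity matrix rather than the full SpRAy pipeline, and the "prototype" is taken to be the mean map of each cluster.

```python
import numpy as np


def spectral_bipartition(maps, sigma=1.0):
    """Split relevance maps into two clusters using the Fiedler vector.

    Simplified stand-in for spectral clustering (assumption: two clusters,
    dense Gaussian affinity); not the SpRAy implementation.
    """
    X = maps.reshape(len(maps), -1)
    # Pairwise squared Euclidean distances between flattened maps.
    sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    degree = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}.
    L = np.eye(len(X)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order; column 1 is the Fiedler vector.
    _, vecs = np.linalg.eigh(L)
    fiedler = vecs[:, 1] * d_inv_sqrt
    return (fiedler > 0).astype(int)


def temporal_prototypes(maps, labels):
    """Collect member frame indices and the mean relevance map per cluster."""
    protos = {}
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        protos[int(c)] = {
            "frames": idx.tolist(),
            "mean_map": maps[idx].mean(axis=0),
        }
    return protos


# Synthetic "video": 6 frames of 4x4 relevance maps (hypothetical data).
# Frames 0-2 concentrate relevance top-left, frames 3-5 bottom-right.
rng = np.random.default_rng(0)
maps = rng.normal(0.0, 0.01, size=(6, 4, 4))
maps[:3, 0, 0] += 1.0
maps[3:, 3, 3] += 1.0

labels = spectral_bipartition(maps)
protos = temporal_prototypes(maps, labels)
for cluster_id, proto in protos.items():
    peak = np.unravel_index(proto["mean_map"].argmax(), proto["mean_map"].shape)
    print(cluster_id, proto["frames"], "peak at", tuple(int(i) for i in peak))
```

Which cluster receives label 0 or 1 depends on the sign of the eigenvector and carries no meaning; only the grouping of frames does. For real data, the number of clusters and the affinity construction would need to be chosen with care, for example by inspecting the eigenvalue spectrum.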