Kiefer, Sebastian (ORCID: 0000-0002-1194-917X)
2023-10-09 / 2023
https://fis.uni-bamberg.de/handle/uniba/89771
Cumulative dissertation, Otto-Friedrich-Universität Bamberg, 2023

Human interactions often take place with the aim of exchanging understanding between individuals. There is thus a human need to develop and communicate explanations on the one hand, and to receive explanations on the other, in order to further one's own understanding. Explanations are an attempt to trace events back to their causes, for example by answering the question "why". The concept of bidirectional explanations can also be applied to the interaction between humans and machines, for example in the context of machine learning (ML), a sub-area of artificial intelligence (AI). Whenever machines are used to support human decision-making or even to provide recommendations for action, there is a need to understand how the results were obtained and to influence them if necessary. Human decision-makers should be able to develop trust in machine learners, especially in high-stakes application domains such as medical diagnostics, but also in strongly regulated domains such as financial auditing. The comprehension of concrete decisions and the incorporation of expert knowledge can contribute to the development of such trust.

Modern machine learning methods have recently improved considerably in predictive accuracy and already surpass human performance in some tasks. At the same time, such powerful methods are usually so-called black-box methods, which makes it difficult or even impossible for humans to develop an understanding of the general decision logic or the specific model behavior. This lack of comprehensibility not only impairs the formation of trust in machine companions, but also impairs the ability to interact with them in the sense of correctability. In comparison to global explanations, which describe how a system works as a whole, local explanations, which relate specific inputs to specific outputs and thus justify why a particular output was produced, are often considered more reliable and purposeful, especially for non-expert AI users. Researchers in the field of explainable artificial intelligence have therefore developed approaches that provide post hoc, model-agnostic, and mostly local explanations of supervised machine learners. The goal is to explain individual ML results retrospectively, i.e., after predictions have been made for previously unobserved instances, and independently of the ML model used, i.e., without knowledge of its inner workings. Especially for surrogate explanation models that locally approximate the model to be explained, there is a risk that neglecting contextual information, such as dependencies between features, produces data points that are "out-of-distribution". Explanations based on such unrealistic data points can easily be misinterpreted. Furthermore, explanation approaches are often developed exclusively from a technical point of view; insights from psychology and the social sciences, according to which explanations that are understandable for humans should, for example, have a coherent structure, are often not taken into account. In this dissertation, a new approach is proposed and validated that enables human-centered interactivity with text classifiers by means of bidirectional model-agnostic explanations.
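For illustration, the perturbation-and-fit idea behind such local surrogate explanations, and the point at which out-of-distribution samples arise, can be sketched as follows. This is a minimal, LIME-style sketch that assumes a scikit-learn environment and a hypothetical black-box predict_proba function; it is not the topicLIME or Semantic Push method developed in the dissertation.

    # Minimal sketch of a LIME-style local surrogate explanation for a text
    # classifier. Illustrative only; `predict_proba` and the input document
    # are hypothetical placeholders, not the dissertation's own methods.
    import numpy as np
    from sklearn.linear_model import Ridge

    def explain_locally(document, predict_proba, target_class,
                        num_samples=500, kernel_width=0.75, rng=None):
        rng = rng if rng is not None else np.random.default_rng(0)
        words = document.split()
        n = len(words)

        # Binary interpretable representation: 1 = word kept, 0 = word dropped.
        # Dropping words independently is exactly where unrealistic,
        # out-of-distribution perturbations can arise.
        masks = rng.integers(0, 2, size=(num_samples, n))
        masks[0, :] = 1  # keep the original document as the first sample

        perturbed_texts = [" ".join(w for w, keep in zip(words, mask) if keep)
                           for mask in masks]
        labels = np.array([predict_proba(text)[target_class]
                           for text in perturbed_texts])

        # Proximity weighting: perturbations closer to the original count more.
        distances = 1.0 - masks.mean(axis=1)   # fraction of dropped words
        weights = np.exp(-(distances ** 2) / kernel_width ** 2)

        # Weighted linear surrogate fitted on the binary interpretable features.
        surrogate = Ridge(alpha=1.0)
        surrogate.fit(masks, labels, sample_weight=weights)

        # Words with the largest absolute coefficients "explain" the prediction.
        ranked = sorted(zip(words, surrogate.coef_), key=lambda wc: -abs(wc[1]))
        return ranked[:6]

Because words are dropped independently of one another, perturbed documents can easily leave the data distribution. A concept-based variant, as pursued in the dissertation, would instead perturb semantically coherent groups of words so that the local perturbation distribution remains more realistic.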
To realize this approach, a framework is introduced that defines comprehensible and interactive artificial intelligence in an interdisciplinary manner using cognitive concepts such as explainability, interpretability, transparency, and interactivity. It elaborates that the semantic and contextual information available within a given textual application domain should be taken into account when generating and representing explanations, so that contextual explanations result. To fill this research gap and obtain coherent explanations, a new technique is presented that generates model-agnostic, concept-based explanations whose explanatory features consist of semantically related words. Taking context into account enables more realistic and meaningful local perturbation distributions, which form the basis of many model-agnostic local explanation approaches such as LIME. A technical evaluation of the new explanation methodology, called topicLIME, and in particular of the underlying surrogate models and the resulting explanations, analyzes its local fidelity with respect to the text classifier to be explained. Beyond this technical evaluation, the obtained results are analyzed empirically in two user studies, which investigate how concept-based contextual explanations, in comparison to contextless explanations, are perceived by humans interacting with a text classifier that predicts to which content category a presented document belongs.

As a further contribution of this dissertation, a method is presented that extends the state of the art in explanatory interactive learning, especially for text classification. The method is intended to enable humans to engage in broader interactions, such as correcting predictions and explanations in a constructive and contextual manner. Previous model-agnostic methods for explanatory interactive learning, such as the CAIPI approach, only allow correcting a classifier's functioning via explanations when a correct prediction has been made for the wrong reasons. The newly developed Semantic Push approach enables humans to perform corrections via concept-based explanations and to integrate them into the learner using non-extrapolating training documents, across different error types of a classifier. The evaluation shows that the newly developed method outperforms the baseline CAIPI method in terms of learning performance and local explanation quality. In summary, the fusion of concept-based knowledge with local surrogate explanation models is a promising research direction. The results of this dissertation further show that more interdisciplinary research is needed to address the challenges in the field of human-centered machine learning.

Language: English
Keywords: Human-centered Machine Learning; Explainable Artificial Intelligence; Local Surrogate Explanation Models; Model-agnostic Explanations; Explanatory Interactive Machine Learning; Explanatory Understanding; Human-like Explanations; Topic Modeling
DDC: 004
Title: Human-centered Interactions with Text Classifiers: Fusing Concept-based Knowledge with Local Surrogate Explanation Models
Type: Doctoral thesis
URN: urn:nbn:de:bvb:473-irb-897715