Hot Topics in Explainable Machine Learning and Artificial Intelligence (XAI)


Semester: Summer 2023
Course type: Block Seminar
Lecturer: Jun.-Prof. Dr. Wressnegger
Audience: Informatik Master & Bachelor
Credits: 4 ECTS
Room: 148, Building 50.34


This seminar is concerned with explainable machine learning in computer security. Learning-based systems often are difficult to interpret, and their decisions are opaque to practitioners. This lack of transparency is a considerable problem in computer security, as black-box learning systems are hard to audit and protect from attacks.

The module introduces students to the emerging field of explainable machine learning and teaches them to work through results from recent research. To this end, students will read up on a sub-field, prepare a seminar report, and present their work to their colleagues at the end of the term.

Topics cover different aspects of the explainability of machine learning methods, with a particular focus on applications in computer security.


Tue, 18. April, 9:45–11:15: Primer on academic writing, assignment of topics
Thu, 27. April: Arrange appointments with assistant
Tue, 02. May – Fri, 05. May: 1st individual meeting (first overview, ToC)
Mon, 05. June – Fri, 09. June: 2nd individual meeting (feedback on first draft of the report)
Wed, 28. June: Submit final paper
Mon, 10. July: Submit reviews for fellow students
Fri, 14. July: End of discussion phase
Fri, 21. July: Submit camera-ready version of your paper
Fri, 28. July: Presentation at final colloquium

Matrix Chat

News about the seminar, potential updates to the schedule, and additional material are distributed via the course's Matrix room. Moreover, Matrix enables students to discuss topics and solution approaches with each other.

You will find the link to the Matrix room on ILIAS.


Every student may choose one of the following topics. For each of these, we additionally provide two to three recent top-tier publications that you should use as a starting point for your own research. For the seminar and your final report, you should not merely summarize these papers, but try to go beyond them and arrive at your own conclusions.

Moreover, all of these papers come with open-source implementations. Play around with these and include the lessons learned in your report.

  • Explanation for generative models

    Generative Adversarial Networks (GANs) and transformers have led to great advances in many tasks. However, the interpretability of generative models is less explored than that of discriminative models. This topic would investigate and taxonomize existing explanation methods for generative models, point out the differences between discriminative-model and generative-model explanations, and discuss their limitations.

    • Kong et al. NeurIPS 2021. "Understanding instance-based interpretability of variational auto-encoders."
    • Liu et al. CVPR 2020. "Towards visually explaining variational autoencoders."
    • Ali et al. ICML 2022. "XAI for transformers: Better explanations through conservative propagation."

  • Post-hoc concept-based explanations

    Concept-based explanations characterize the global behaviour of a DNN with high-level, human-understandable concepts. Several recent studies have proposed methods to discover post-hoc concept-based explanations of trained models based on different assumptions. This work would summarize existing post-hoc methods, discuss their limitations, and point out the relation of concept-based explanations to other human-interpretable representations.

    • Ghorbani et al. NeurIPS 2019. "Towards automatic concept-based explanations."
    • Yeh et al. NeurIPS 2020. "On completeness-aware concept-based explanations in deep neural networks."
    • Crabbé and van der Schaar. NeurIPS 2022. "Concept Activation Regions: A Generalized Framework for Concept-Based Explanations."

  • Knowledge-graph-based XAI

    Knowledge graphs (KGs) have been widely applied in various domains for different purposes. They can help machine learning systems become more explainable and interpretable. This topic would systematize current knowledge-graph-based explanation methods, describe their application domains, and discuss open challenges.

    • Halliwell. AAAI 2022. "Evaluating Explanations of Relational Graph Convolutional Network Link Predictions on Knowledge Graphs."
    • Chan et al. NeurIPS 2021. "SalKG: Learning from knowledge graph explanations for commonsense reasoning."

  • Measuring the Quality of Explanations

    XAI helps to understand which input features contribute to a neural network's output. A variety of explanation methods have been proposed in recent years, and often different methods point to different input features. How does one choose the best method for a given task? How can we measure the quality of explanations?

    • Adebayo et al. NeurIPS 2018. "Sanity Checks for Saliency Maps."
    • Arras, Osman, and Samek. Information Fusion 2022. "CLEVR-XAI: A benchmark dataset for the ground truth evaluation of neural network explanations."
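    The model-parameter randomization test proposed by Adebayo et al. can be sketched in a few lines: if a saliency map barely changes after the model's weights are replaced with random ones, the explanation cannot be faithful to what the model learned. The sketch below is a toy illustration only, using a linear model and gradient-times-input saliency; all names and sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def saliency(weights, x):
    # Gradient-times-input for a linear model f(x) = w . x: the gradient is w.
    return weights * x

x = rng.normal(size=20)
trained_w = rng.normal(size=20)   # stands in for learned parameters
random_w = rng.normal(size=20)    # "randomized" model parameters

s_trained = saliency(trained_w, x)
s_random = saliency(random_w, x)

# Sanity check: the correlation between the two maps should be low for a
# faithful method; a correlation near 1.0 would fail the test.
corr = np.corrcoef(s_trained, s_random)[0, 1]
print(f"correlation after randomization: {corr:.2f}")
```

    A method that ignores the model (e.g., an edge detector on the input) would produce identical maps in both cases and thereby fail this check.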

  • Adversarial robustness and Explainability

    Adversarial training is a method to make machine learning models robust against attacks. Still, a model's decisions need to be explainable for the model to be trustworthy. How does adversarial training affect explainability? Are robust models easier to explain and understand? Can explainability be used to make models robust?

    • Augustin et al. ECCV 2020. "Adversarial Robustness on In- and Out-Distribution Improves Explainability."
    • Chalasani et al. ICML 2020. "Concise Explanations of Neural Networks using Adversarial Training."

  • XAI Aspects of Fairwashing

    Explainable systems are deployed to support transparency, fairness, and trust in AI. However, recent work shows that systems can be "fairwashed": the adversarial aim is to obtain a seemingly fair system that passes auditing or validation processes, even though the system behaves unfairly and its deployment is therefore highly questionable.

    • Aïvodji et al. ICML 2019. "Fairwashing: the risk of rationalization."
    • Lakkaraju and Bastani AIES 2020. "'How do I fool you?': Manipulating user trust via misleading black box explanations."
    • Shamsabadi et al. NeurIPS 2022. "Washing The Unwashable: On The (Im)possibility of Fairwashing Detection."

  • Attacking Explanations through Input Manipulations

    It turns out that, similar to adversarial examples, which attack the classification result, explanations can be fooled as well. Discuss the malicious goals of attacking explanations by outlining general approaches. How can these attacks be defended against, and how can the robustness of XAI methods be measured?

    • Zhang et al. USENIX Security 2020. "Interpretable Deep Learning under Fire."
    • Dombrowski et al. NeurIPS 2019. "Explanations Can Be Manipulated and Geometry Is to Blame."
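    The core mechanics of such attacks can be demonstrated on a toy example: perturb the input so that the model's output stays (numerically) the same while the explanation shifts. The sketch below is a deliberately simplified illustration, assuming a linear model and gradient-times-input saliency; real attacks on deep networks require optimization, but the principle is the same.

```python
import numpy as np

rng = np.random.default_rng(1)

w = rng.normal(size=5)   # toy linear model f(x) = w . x
x = rng.normal(size=5)

def explain(x):
    return w * x         # gradient-times-input saliency

# Craft a perturbation orthogonal to w: the prediction is unchanged,
# but the elementwise saliency shifts between features.
delta = rng.normal(size=5)
delta -= (delta @ w) / (w @ w) * w   # project out the component along w

x_adv = x + 5.0 * delta

print("output before:", w @ x)
print("output after: ", w @ x_adv)   # equal up to floating-point error
print("saliency drift:", np.abs(explain(x) - explain(x_adv)).max())
```

    Defenses and robustness metrics for XAI methods essentially try to bound how much the explanation may drift under such output-preserving perturbations.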

  • Evaluating the robustness of black box explanation methods

    Black-box explanations of a machine learning model are generated without access to its internal parameters and model-specific characteristics. Are model-agnostic black-box explanation methods trustworthy? How do they compare against white-box explanation methods in terms of robustness?

    • Ribeiro et al. KDD 2016. "'Why Should I Trust You?' Explaining the Predictions of Any Classifier."
    • Slack et al. AIES 2020. "Fooling LIME and SHAP: Adversarial Attacks on Post hoc Explanation Methods."
    • Lakkaraju and Bastani. AIES 2020. "'How do I fool you?': Manipulating user trust via misleading black box explanations."
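    The core idea behind LIME can be sketched without the library: sample perturbations around an instance, query the black box, and fit a locally weighted linear surrogate whose coefficients serve as the explanation. The sketch below is a simplified illustration under assumed choices (tabular data, Gaussian perturbations, an exponential proximity kernel), not the full algorithm; `black_box` is a hypothetical stand-in model.

```python
import numpy as np

rng = np.random.default_rng(42)

def black_box(X):
    # Hypothetical opaque model: only features 0 and 2 matter.
    return 3.0 * X[:, 0] - 2.0 * X[:, 2]

def lime_sketch(x, predict, n_samples=500, width=1.0):
    d = x.size
    Z = x + rng.normal(scale=0.5, size=(n_samples, d))   # local perturbations
    y = predict(Z)                                       # query the black box
    # Proximity weights: perturbations closer to x count more.
    sw = np.sqrt(np.exp(-np.sum((Z - x) ** 2, axis=1) / width**2))
    A = np.hstack([Z, np.ones((n_samples, 1))])          # add an intercept
    # Weighted least squares: coefficients of the local linear surrogate.
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return coef[:-1]                                     # drop the intercept

x = rng.normal(size=4)
phi = lime_sketch(x, black_box)
print(np.round(phi, 2))   # features 0 and 2 should dominate
```

    The robustness questions above then become concrete: an adversary who can detect that a query is a perturbation rather than a real input (as in Slack et al.) can answer it differently and thereby steer the surrogate's coefficients.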