Explainable Machine Learning


Semester: Winter 2021
Course type: Block Seminar
Lecturer: Jun.-Prof. Dr. Wressnegger
Audience: Informatik Master & Bachelor
Credits: 4 ECTS
Room: 148, Building 50.34 and online
Language: English or German
Registration: Please register for the course in ILIAS

Remote Course

Due to the ongoing COVID-19 pandemic, this course will start off remotely, meaning the kick-off meeting will happen online. The final colloquium, however, will hopefully be an in-person meeting again (this time we might indeed have a chance).

To receive all the necessary information, please subscribe to the mailing list here.


This seminar is concerned with explainable machine learning in computer security. Learning-based systems are often difficult to interpret, and their decisions are opaque to practitioners. This lack of transparency is a considerable problem in computer security, as black-box learning systems are hard to audit and protect from attacks.

The module introduces students to the emerging field of explainable machine learning and teaches them to work through results from recent research. To this end, the students will read up on a sub-field, prepare a seminar report, and present their work to their colleagues at the end of the term.

Topics cover different aspects of the explainability of machine learning methods, with a particular focus on applications in computer security.


Tue, 19. Oct, 14:00–15:30: Primer on academic writing, assignment of topics
Thu, 28. Oct: Arrange appointment with assistant
Mon, 1. Nov – Fri, 5. Nov: Individual meetings with assistant
Wed, 1. Dec: Submit final paper
Wed, 22. Dec: Submit review for fellow students
Fri, 7. Jan: End of discussion phase
Fri, 21. Jan: Submit camera-ready version of your paper
Thu, 10. Feb: Presentation at final colloquium

Mailing List

News about the seminar, potential updates to the schedule, and additional material are distributed using a separate mailing list. Moreover, the list enables students to discuss topics of the seminar.

You can subscribe here.


Every student may choose one of the following topics. For each of these, we additionally provide a recent top-tier publication that you should use as a starting point for your own research. For the seminar and your final report, you should not merely summarize that paper, but try to go beyond and arrive at your own conclusions.

Moreover, all of these papers come with open-source implementations. Play around with these and include the lessons learned in your report.

  • Counterfactual/Contrastive Explanations

    Every "why" question implicitly contains a contrast. Humans do not ask just "why?"; we ask "why A rather than B?". The research field of counterfactual and contrastive explanation focuses on this alternative scenario B and, in particular, on how it can be generated.

    • Stepin et al., "A Survey of Contrastive and Counterfactual Explanation Generation Methods for Explainable Artificial Intelligence", IEEE Access 2021

  • Interactive Explanations

    Some authors claim that explanations for humans should be interactive. In an interactive dialog, a human points to the part they want to understand.

    • Wexler et al., "The What-If Tool: Interactive Probing of Machine Learning Models", IEEE TVCG 2020

  • Using User Studies to evaluate Explanations

    The seminar report should present approaches that use user studies to evaluate the quality of explanations. The limitations, problems, and results of such evaluations are of particular interest.

    • Hendricks et al., "Grounding Visual Explanations", ECCV 2018

  • Insights into Explainable Machine Learning from Philosophy and the Social Sciences

    It turns out there is a vast body of work in philosophy and the social sciences on explanation and on how humans provide and understand explanations. These lines of research already use detailed taxonomies of explanations, causes of effects, and (human) behavior.

    • Van Bouwel and Weber, "Remote Causes, Bad Explanations?", Journal for the Theory of Social Behaviour 2002

  • Explainable Active Learning

    Explainable active learning is a novel paradigm that introduces XAI into an active learning (AL) setting. Its benefits include supporting trust calibration and enabling rich forms of teaching feedback; potential drawbacks include anchoring effects on the model's judgment and additional cognitive workload.

    • Teso, "Toward Faithful Explanatory Active Learning with Self-explainable Neural Nets", WIAL 2019

  • Concept Based Explanations

    Most ML explanation methods revolve around the importance of individual features or pixels, which suffers from several drawbacks. Therefore, multiple works focus on high-level, human-understandable concept-based explanations, which should be studied as part of this seminar topic.

    • Ghorbani et al., "Towards Automatic Concept-based Explanations", NeurIPS 2019

  • Measuring the Quality of Explanations

    This seminar report should discuss and relate the possibilities for comparing explanation methods with each other.

    • Adebayo et al., "Sanity Checks for Saliency Maps", NeurIPS 2018

  • Attacking Explanations

    Similar to adversarial examples (which attack the classifier), explanation methods can also be fooled. This includes producing useless or wrong explanations, or producing a specific, targeted explanation.

    • Zhang et al., "Interpretable Deep Learning under Fire", USENIX Security 2020