1 Title

5 Plan for today

6 Why do we need interpretability?

7 Why do we need interpretability?

8 Why do we need interpretability?

9 Why do we need interpretability?

10 Why do we need interpretability?

11 Why do we need interpretability?

12 Why do we need interpretability?

13 Why do we need interpretability?

14 Why do we need interpretability?

15 Why do we need interpretability?

16 [Screenshot: NOS Nieuws website]

17 Title

19 Title

20 Why do we need interpretability?

21 Why do we need interpretability?

22 Can we ever truly understand a large-scale AI model’s internal reasoning?

23 Why do we need interpretability?

24 How do we explain a model?

25 How do we explain a model?

26 How do we explain a model?

27 How do we explain a model?

28 How do we explain a model?

29 How do we explain a model?

30 How do we explain a model?

31 Explanation Faithfulness

32 Explanation Faithfulness

33 Explanation Faithfulness

34 Explanation Methods

35 Explanation Methods

36 Explanation Methods

37 Explanation Methods

38 Behavioural Interpretability

39 Behavioural Interpretability

40 BLIMP

41 BLIMP

42 BLIMP

43 BLIMP

44 BLIMP

45 BLIMP

46 BLIMP

47 BLIMP

48 Behavioural Tests for Uncovering Biases

49 Behavioural Tests for Uncovering Biases

50 Limitations of Behavioural Tests

51 Limitations of Behavioural Tests

52 Feature Attribution Methods

53 Pronoun Resolution

54 Pronoun Resolution

55 Pronoun Resolution

56 Pronoun Resolution

57 Pronoun Resolution

58 Pronoun Resolution

59 Average contributions

60 Average contributions

61 Average contributions

62 Average contributions

63 Default Reasoning?

64 Feature Attribution Methods

65 Feature Attribution Methods

66 Attribution Dimensions

67 Feature Removal

68 Feature Removal

69 Feature Removal

70 Feature Removal

71 Feature Removal

72 Feature Removal

73 Feature Removal

74 Feature Removal: Conditioned on present features

75 Feature Removal: Conditioned on present features

76 Feature Influence

77 Feature Influence

78 Shapley Values

79 Shapley Values

80 Shapley Values

81 Shapley Values
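As a minimal sketch of how Shapley values turn feature removal into an attribution, the block below averages each feature's marginal contribution over every ordering in which features can be added. The value function `toy_value` and its scores are hypothetical, chosen only for illustration; in practice the value function would be the model's output with the absent features removed, and exact enumeration is only feasible for a handful of features.

```python
from itertools import permutations


def shapley_values(value_fn, n_features):
    """Exact Shapley values: mean marginal contribution over all orderings."""
    phi = [0.0] * n_features
    orderings = list(permutations(range(n_features)))
    for order in orderings:
        present = set()
        for i in order:
            before = value_fn(frozenset(present))
            present.add(i)
            after = value_fn(frozenset(present))
            phi[i] += (after - before) / len(orderings)
    return phi


def toy_value(present):
    """Hypothetical model output when only the features in `present` are kept."""
    score = 0.0
    if 0 in present:
        score += 1.0  # feature 0 contributes on its own
    if 1 in present and 2 in present:
        score += 2.0  # features 1 and 2 only matter together
    return score


phi = shapley_values(toy_value, 3)
total = toy_value(frozenset({0, 1, 2})) - toy_value(frozenset())
```

In this toy setting the interaction score of 2.0 is split evenly between features 1 and 2, and the attributions sum to `total`, the difference between the full output and the output with everything removed.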

82 Feature Influence

83 Feature Influence

84 Highlighting via Input Gradients
• Estimate the importance of a feature using the derivative of the output w.r.t. that feature
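The idea can be sketched on a toy model whose gradient is known in closed form. The logistic classifier, its weights `w`, bias `b` and input `x` are assumptions made for illustration only; a real network would obtain the same derivative through automatic differentiation.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def model(x, w, b):
    """Toy classifier: probability of the positive class."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def input_gradient(x, w, b):
    """Analytic derivative of the output w.r.t. each input feature."""
    p = model(x, w, b)
    return [p * (1.0 - p) * wi for wi in w]


# Hypothetical weights and input, chosen only for illustration.
w = [2.0, -1.0, 0.0]
b = 0.0
x = [1.0, 1.0, 1.0]

saliency = input_gradient(x, w, b)
# Rank features by the magnitude of their gradient (most important first).
ranking = sorted(range(len(x)), key=lambda i: abs(saliency[i]), reverse=True)
```

Here the feature with the largest absolute weight receives the largest highlight, and the feature with zero weight receives none.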
85 Example of highlighting: Image classification

86 Gradient-based Highlightings for NLP
For NLP, derivative of output w.r.t. a feature
87 Gradient-based Highlightings for NLP
For NLP, derivative of output w.r.t. a feature
88 Problems with Using Gradient for Highlighting
• too “local” and thus sensitive to slight perturbations
89 Problems with Using Gradient for Highlighting

90 Problems with Using Gradient for Highlighting

91 Extensions of Vanilla Gradient
• too “local” and thus sensitive to slight perturbations
92 Extensions of Vanilla Gradient
SmoothGrad: add Gaussian noise to the input and average the gradient
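A minimal sketch of this averaging, assuming a toy model f(x) = tanh(3*x[0]) + 0.5*x[1] with a hand-written gradient. The model, the noise scale `sigma`, the sample count and the seed are all illustrative assumptions, not values from the slides.

```python
import math
import random


def grad_f(x):
    """Analytic input gradient of the toy model tanh(3*x[0]) + 0.5*x[1]."""
    return [3.0 * (1.0 - math.tanh(3.0 * x[0]) ** 2), 0.5]


def smoothgrad(x, grad_fn, sigma=1.0, n_samples=200, seed=0):
    """Average the gradient over Gaussian-perturbed copies of the input."""
    rng = random.Random(seed)
    total = [0.0] * len(x)
    for _ in range(n_samples):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        g = grad_fn(noisy)
        total = [t + gi for t, gi in zip(total, g)]
    return [t / n_samples for t in total]


x = [2.0, 1.0]  # x[0] sits in the saturated region of tanh
vanilla = grad_f(x)
smooth = smoothgrad(x, grad_f)
```

At this input the vanilla gradient for `x[0]` is close to zero because tanh is saturated, while the averaged gradient also sees nearby points where the feature still changes the output. The linear feature `x[1]` keeps its gradient of 0.5 under both methods.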
93 Extensions of Vanilla Gradient
Integrated Gradients: average gradients along path from zero to input
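A minimal sketch of the path average, assuming the same kind of toy model, f(x) = tanh(3*x[0]) + 0.5*x[1], with a hand-written gradient. In the standard formulation the averaged gradient is also multiplied by the difference between the input and the baseline, which is what the last line of the function does. The model, the all-zero baseline and the number of steps are illustrative assumptions.

```python
import math


def f(x):
    """Toy model used only for illustration."""
    return math.tanh(3.0 * x[0]) + 0.5 * x[1]


def grad_f(x):
    """Analytic input gradient of f."""
    return [3.0 * (1.0 - math.tanh(3.0 * x[0]) ** 2), 0.5]


def integrated_gradients(x, grad_fn, baseline=None, steps=1000):
    """Midpoint Riemann approximation of the gradient integral along the path."""
    if baseline is None:
        baseline = [0.0] * len(x)
    avg = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        g = grad_fn(point)
        avg = [a + gi / steps for a, gi in zip(avg, g)]
    # Scale the averaged gradient by the distance travelled in each feature.
    return [(xi - b) * a for xi, b, a in zip(x, baseline, avg)]


x = [2.0, 1.0]
attributions = integrated_gradients(x, grad_f)
# Completeness: attributions should sum to f(x) - f(baseline), up to numerical error.
completeness_gap = abs(sum(attributions) - (f(x) - f([0.0, 0.0])))
```

Unlike the vanilla gradient, which is nearly zero for the saturated feature `x[0]`, the path integral credits it with roughly the full change in tanh between the baseline and the input.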
94 Summary of Gradient-based Highlighting
Positives:
95 Summary of Gradient-based Highlighting

96 Probing

97 Probing

98 Probing | Linguistic

99 Probing | POS-tags, NER, etc.

100 Representations

101 What does probed info imply?

102 Why linear?

103 Probing | POS-tags
[Figure values: K(A) = 1.60, K(s) = 0.19; K(s) = 0.83]
104 Recap

105 References
