class: title-slide

<br><br><br>

# Transparency of
# Machine Learning Models
# in Credit Scoring

## CRC Conference XVI

<br>

### Michael Bücker, Gero Szepannek, Przemyslaw Biecek,
### Alicja Gosiewska and Mateusz Staniak

<br>

#### 28 August 2019

---
layout: true

<div class="my-footer">
Transparency of Machine Learning Models in Credit Scoring | Michael Bücker | CRC Conference XVI
</div>

---
class: empty
background-image: url("https://media.giphy.com/media/pJOiRZcsxni5G/giphy.gif")
background-size: contain

---
class: inverse

# Introduction

---
# Introduction

### Michael Bücker

Professor of Data Science at Münster School of Business

<br>
<br>

<div align = "center">
<img src="img/intro.png" width=100%>
</div>

---
# Introduction

+ Main requirement for Credit Scoring models: provide a risk prediction that is **as accurate as possible**
+ In addition, regulators demand these models to be **transparent and auditable**
+ Therefore, very **simple predictive models** such as Logistic Regression or Decision Trees are still widely used (Lessmann, Baesens, Seow, and Thomas 2015; Bischl, Kühn, and Szepannek 2014)
+ The superior predictive power of modern **Machine Learning algorithms** cannot be fully leveraged
+ A lot of **potential is missed**, leading to higher reserves or more credit defaults (Szepannek 2017)

---
# Research Approach

+ For an open data set, we build a traditional and still state-of-the-art Score Card model
+ In addition, we build alternative Machine Learning black-box models
+ We use model-agnostic methods for interpretable Machine Learning to showcase the transparency of such models
+ For computations we use R and respective packages (Biecek 2018; Molnar, Bischl, and Casalicchio 2018)

---
# The incumbent: Score Cards

.pull-left70[
Steps for Score Card construction using Logistic Regression (Szepannek 2017)

1. Automatic binning
2. Manual binning
3. WOE/Dummy transformation
4. Variable shortlist selection
5. (Linear) modelling and automatic model selection
6. Manual model selection
]

.pull-right70[
]

---
# The incumbent: Score Cards

.pull-left70[
Steps for Score Card construction using Logistic Regression (Szepannek 2017)

1. Automatic binning
2. <a>Manual binning</a>
3. WOE/Dummy transformation
4. Variable shortlist selection
5. (Linear) modelling and automatic model selection
6. Manual model selection
]

.pull-right70[
<div align = "center">
<img src="img/binning.png" width=120%>
</div>
]

---
# Score Cards: Manual binning

.pull-left70[
Manual binning allows for

+ (univariate) non-linearity
+ (univariate) plausibility checks
+ integration of expert knowledge for binning of factors

<a>...but: only univariate effects (!)</a>
]

--

.pull-right70[
... and means a lot of manual work

<div align = "center">
<img src="https://media.giphy.com/media/RO5JLFmiHnN6g/giphy.gif" width=100%>
</div>
]

---
# The challenger models

.pull-left[
We tested a couple of Machine Learning algorithms ...

+ Random Forests (`randomForest`)
+ Gradient Boosting (`gbm`)
+ XGBoost (`xgboost`)
+ Support Vector Machines (`svm`)
+ Logistic Regression with spline-based transformations (`rms`)
]

.pull-right[
...
and also two AutoML frameworks to beat the Score Card

+ [h2o AutoML](http://docs.h2o.ai/h2o/latest-stable/h2o-docs/automl.html) (`h2o`)
+ [mljar.com](https://mljar.com/) (`mljar`)
]

---
# Data set for study: xML Challenge by FICO

.pull-left70[
+ Explainable Machine Learning Challenge by FICO (2019)
+ Focus: Home Equity Line of Credit (HELOC) dataset
+ Customers requested a credit line in the range of $5,000 - $150,000
+ Task: predict whether they will repay their HELOC account within 2 years
+ Number of observations: 2,615
+ Variables: 23 covariates (mostly numeric) and 1 target variable (risk performance "good" or "bad")
]

.pull-right70[
<br>
<a href = "https://community.fico.com/s/explainable-machine-learning-challenge">
<div align = "right">
<img src="img/FICO.png" width=25%>
</div>
</a>
<br><br>
<a href = "https://www.fico.com/">
<div align = "right">
<img src="img/FICO2.jpeg" width=25%>
</div>
</a>
]

---
# Explainability of Machine Learning models

.pull-left70[
There are many model-agnostic methods for interpretable ML today; see Molnar (2019) for a good overview.

+ Partial Dependence Plots (PDP)
+ Individual Conditional Expectation (ICE)
+ Accumulated Local Effects (ALE)
+ Feature Importance
+ Global Surrogate and Local Surrogate (LIME)
+ Shapley Values, SHAP
+ ...
]

.pull-right70[
<div class = "wrap2">
<iframe class = "frame2" src="https://christophm.github.io/interpretable-ml-book/"></iframe>
</div>
]

---
# Implementation in R: DALEX

.pull-left70[
<div align = "center">
<img src="img/dalex.png" width=100%>
</div>
]

.pull-right70[
+ Descriptive mAchine Learning EXplanations
+ DALEX is a set of tools that help to understand how complex models work
]

---
class: inverse

# Results: Model performance

---
# Results: Comparison of model performance

.pull-left70[
<img src="slides_files/figure-html/unnamed-chunk-1-1.png" width="1200" />
]

.pull-right70[
+ Predictive power of the traditional Score Card model is surprisingly good
+ Logistic Regression with spline-based transformations performs best, using `rms` by Harrell Jr (2019)
]

---
# Results: Comparison of model performance

.pull-left70[
<br>
<img src="slides_files/figure-html/unnamed-chunk-2-1.png" width="1000" />
]

.pull-right70[
For the comparison of explainability, we choose

+ the Score Card,
+ a Gradient Boosting model with 10,000 trees,
+ a tuned Logistic Regression with splines using 13 variables
]

---
class: inverse

# Results: Global explanations

---
# Score Card: Variable importance as range of points

.pull-left70[
<img src="slides_files/figure-html/unnamed-chunk-3-1.png" width="1200" />
]

.pull-right70[
+ Range of Score Card points as an indicator of relevance for predictions
+ Alternative: variance of Score Card points across applications
]

---
# Model agnostic: Importance through drop-out loss

.pull-left70[
<img src="slides_files/figure-html/unnamed-chunk-4-1.png" width="1200" />
]

.pull-right70[
+ The drop in model performance (here AUC) is measured after permutation of a single variable
+ The more significant the drop in performance, the more important the variable
]

---
# Score Card: Variable explanation based on points

.pull-left70[
<br>
<img src="slides_files/figure-html/unnamed-chunk-5-1.png" width="1200" />
]

.pull-right70[
+ Score Card points for the values of a covariate show the effect of a single feature
+ Directly computed from the coefficient estimates of the Logistic Regression
]

---
# Model agnostic: Partial dependence plots

.pull-left70[
<img src="slides_files/figure-html/unnamed-chunk-6-1.png" width="1200" />
]

.pull-right70[
+ Partial dependence plots created with DALEX (Biecek 2018)
+ Interpretation very similar to marginal Score Card points
]

---
class: inverse

# Results: Local explanations

---
# Instance-level explanations

.pull-left70[
+ Instance-level exploration helps to understand how a model yields a prediction for a single observation
+ Model-agnostic approaches are
  + additive Breakdowns
  + Shapley Values, SHAP
  + LIME
+ In Credit Scoring, this explanation makes each credit decision transparent
]

.pull-right70[
]

---
# Score Card: Local explanations

.pull-left70[
<img src="slides_files/figure-html/unnamed-chunk-7-1.png" width="1200" />
]

.pull-right70[
+ Instance-level exploration for Score Cards can simply use the individual Score Card points
+ This yields a breakdown of the scoring result by variable
]

---
# Model agnostic: Variable contribution break down

.pull-left70[
<img src="slides_files/figure-html/unnamed-chunk-8-1.png" width="1200" />
]

.pull-right70[
+ Such instance-level explorations can also be performed in a model-agnostic way
+ Unfortunately, for non-additive models, variable contributions depend on the ordering of the variables
]

---
# Model agnostic: SHAP

.pull-left70[
<img src="slides_files/figure-html/unnamed-chunk-9-1.png" width="1200" />
]

.pull-right70[
+ Shapley attributions are averages across all (or at least a large number of) different orderings
+ Violet boxplots show the distributions of attributions for a selected variable, while the length of the bar stands for the average attribution
]

---
class: inverse

# Conclusion

---
# modelDown: HTML summaries for predictive models

See
Biecek, Tatarynowicz, Romaszko, and Urbański (2019)

.pull-left70[
<div class = "wrap">
<iframe class = "frame" src="https://buecker.netlify.com/modeldown"></iframe>
</div>
]

.pull-right70[
<a>
<div align = "center">
<img src="img/qr.png" width=110%>
</div>
</a>
]

---
# Conclusion

+ We have built models for Credit Scoring using Score Cards and Machine Learning
+ The predictive power of the Machine Learning models was superior (in our example only slightly; other studies show a clearer overperformance)
+ Model-agnostic methods for interpretable Machine Learning are able to meet the degree of explainability of Score Cards and may even exceed it

---
# References (1/3)

Biecek, P. (2018). "DALEX: explainers for complex predictive models". In: _Journal of Machine Learning Research_ 19.84, pp. 1-5.

Biecek, P., M. Tatarynowicz, K. Romaszko, and M. Urbański (2019). _modelDown: Make Static HTML Website for Predictive Models_. R package version 1.0.1. URL: [https://CRAN.R-project.org/package=modelDown](https://CRAN.R-project.org/package=modelDown).

Bischl, B., T. Kühn, and G. Szepannek (2014). "On Class Imbalance Correction for Classification Algorithms in Credit Scoring". In: _Operations Research Proceedings_. Ed. by M. Lübbecke, A. Koster, P. Letmathe, R. Madlener, B. Peis, and G. Walther, pp. 37-43.

FICO (2019). _xML Challenge_. Online. URL: [https://community.fico.com/s/explainable-machine-learning-challenge](https://community.fico.com/s/explainable-machine-learning-challenge).

---
# References (2/3)

Harrell Jr, F. E. (2019). _rms: Regression Modeling Strategies_. R package version 5.1-3.1. URL: [https://CRAN.R-project.org/package=rms](https://CRAN.R-project.org/package=rms).

Lessmann, S., B. Baesens, H. Seow, and L. Thomas (2015). "Benchmarking state-of-the-art classification algorithms for credit scoring: An update of research". In: _European Journal of Operational Research_ 247.1, pp. 124-136.

Molnar, C. (2019). _Interpretable Machine Learning. A Guide for Making Black Box Models Explainable_. URL: [https://christophm.github.io/interpretable-ml-book/](https://christophm.github.io/interpretable-ml-book/).

Molnar, C., B. Bischl, and G. Casalicchio (2018). "iml: An R package for Interpretable Machine Learning". In: _Journal of Open Source Software_ 3.26, p. 786. URL: [http://joss.theoj.org/papers/10.21105/joss.00786](http://joss.theoj.org/papers/10.21105/joss.00786).

---
# References (3/3)

Szepannek, G. (2017a). "On the Practical Relevance of Modern Machine Learning Algorithms for Credit Scoring Applications". In: _WIAS Report Series_ 29, pp. 88-96.

Szepannek, G. (2017b). _A Framework for Scorecard Modelling using R_. CSCC 2017.

---
# Thank you!

<br>

.smaller-font[
.pull-left[
#### Prof. Dr. Michael Bücker
Professor of Data Science

Münster School of Business
FH Münster - University of Applied Sciences

Corrensstraße 25, Room C521
D-48149 Münster

Tel: +49 251 83 65615
E-Mail: michael.buecker@fh-muenster.de
http://prof.buecker.ms
]

.pull-right[
<a>
<div align = "center">
<img src="img/fh.jpg" width=100%>
</div>
</a>
]
]

<!-- --- -->
<!-- class: inverse -->
<!-- # Backup -->
<!-- --- -->
<!-- class: empty -->
<!-- background-image: url("https://media.giphy.com/media/lqVVqkqMolh9S/giphy.gif") -->
<!-- background-size: contain -->
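
---
# Backup: Permutation importance sketch

The drop-out-loss importance shown earlier was computed with DALEX in R. Purely as an illustration of the idea (not the DALEX implementation), here is a minimal sketch on a hypothetical toy model, shown in Python and using accuracy instead of AUC for brevity:

```python
import random

random.seed(1)

# Toy data: the label depends only on x1; x2 is pure noise.
data = [(random.random(), random.random()) for _ in range(1000)]
labels = [1 if x1 > 0.5 else 0 for x1, _ in data]

def model(row):
    # A fixed "fitted" classifier that only looks at x1.
    return 1 if row[0] > 0.5 else 0

def accuracy(rows):
    hits = sum(model(r) == y for r, y in zip(rows, labels))
    return hits / len(rows)

def permutation_importance(col):
    # Shuffle a single column, keep the rest, and measure the drop
    # in performance relative to the intact data.
    shuffled = [r[col] for r in data]
    random.shuffle(shuffled)
    permuted = [(s, r[1]) if col == 0 else (r[0], s)
                for r, s in zip(data, shuffled)]
    return accuracy(data) - accuracy(permuted)

print(permutation_importance(0))  # large drop: x1 is important
print(permutation_importance(1))  # no drop: x2 is irrelevant
```

With real Credit Scoring data, the loss would be 1 - AUC and the model a fitted Score Card or Machine Learning model; the mechanics of "permute one column, re-score, compare" stay the same.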