class: title-slide

<br><br><br>

# Transparency of
# Machine Learning Models
# in Credit Scoring

## CRC Conference XVI

<br>

### Michael Bücker, Gero Szepannek, Przemyslaw Biecek,
### Alicja Gosiewska and Mateusz Staniak

<br>

#### 28 August 2019

---
layout: true

<div class="my-footer">
Transparency of Machine Learning Models in Credit Scoring | Michael Bücker | CRC Conference XVI
</div>

---
class: empty
background-image: url("https://media.giphy.com/media/pJOiRZcsxni5G/giphy.gif")
background-size: contain

---
class: inverse

# Introduction

---
# Introduction

### Michael Bücker

Professor of Data Science at Münster School of Business

<br>
<br>

<div align = "center">
<img src="img/intro.png" width=100%>
</div>

---
# Introduction

+ Main requirement for Credit Scoring models: provide a risk prediction that is **as accurate as possible**
+ In addition, regulators demand these models to be **transparent and auditable**
+ Therefore, very **simple predictive models** such as Logistic Regression or Decision Trees are still widely used (Lessmann, Baesens, Seow, and Thomas 2015; Bischl, Kühn, and Szepannek 2014)
+ The superior predictive power of modern **Machine Learning algorithms** cannot be fully leveraged
+ A lot of **potential is missed**, leading to higher reserves or more credit defaults (Szepannek 2017)

---
# Research Approach

+ For an open data set, we build a traditional and still state-of-the-art Score Card model
+ In addition, we build alternative Machine Learning black-box models
+ We use model-agnostic methods for interpretable Machine Learning to showcase the transparency of such models
+ For computations we use R and respective packages (Biecek 2018; Molnar, Bischl, and Casalicchio 2018)

---
# The incumbent: Score Cards

.pull-left70[
Steps for Score Card construction using Logistic Regression (Szepannek 2017)

1. Automatic binning
2. Manual binning
3. WOE/Dummy transformation
4. Variable shortlist selection
5. (Linear) modelling and automatic model selection
6. Manual model selection
]

.pull-right70[
]

---
# The incumbent: Score Cards

.pull-left70[
Steps for Score Card construction using Logistic Regression (Szepannek 2017)

1. Automatic binning
2. <a>Manual binning</a>
3. WOE/Dummy transformation
4. Variable shortlist selection
5. (Linear) modelling and automatic model selection
6. Manual model selection
]

.pull-right70[
<div align = "center">
<img src="img/binning.png" width=120%>
</div>
]

---
# Score Cards: Manual binning

.pull-left70[
Manual binning allows for

+ (univariate) non-linearity
+ (univariate) plausibility checks
+ integration of expert knowledge for binning of factors

<a>...but: only univariate effects (!)</a>
]

--

.pull-right70[
... and means a lot of manual work

<div align = "center">
<img src="https://media.giphy.com/media/RO5JLFmiHnN6g/giphy.gif" width=100%>
</div>
]

---
# The challenger models

.pull-left[
We tested a couple of Machine Learning algorithms ...

+ Random Forests (`randomForest`)
+ Gradient Boosting (`gbm`)
+ XGBoost (`xgboost`)
+ Support Vector Machines (`svm`)
+ Logistic Regression with spline-based transformations (`rms`)
]

.pull-right[
...
and also two AutoML frameworks to beat the Score Card

+ [h2o AutoML](http://docs.h2o.ai/h2o/latest-stable/h2o-docs/automl.html) (`h2o`)
+ [mljar.com](https://mljar.com/) (`mljar`)
]

---
# Data set for study: xML Challenge by FICO

.pull-left70[
+ Explainable Machine Learning Challenge by FICO (2019)
+ Focus: Home Equity Line of Credit (HELOC) dataset
+ Customers requested a credit line in the range of $5,000 - $150,000
+ Task: predict whether they will repay their HELOC account within 2 years
+ Number of observations: 2,615
+ Variables: 23 covariates (mostly numeric) and 1 target variable (risk performance "good" or "bad")
]

.pull-right70[
<br>
<a href = "https://community.fico.com/s/explainable-machine-learning-challenge">
<div align = "right">
<img src="img/FICO.png" width=25%>
</div>
</a>
<br><br>
<a href = "https://www.fico.com/">
<div align = "right">
<img src="img/FICO2.jpeg" width=25%>
</div>
</a>
]

---
# Explainability of Machine Learning models

.pull-left70[
There are many model-agnostic methods for interpretable ML today; see Molnar (2019) for a good overview.

+ Partial Dependence Plots (PDP)
+ Individual Conditional Expectation (ICE)
+ Accumulated Local Effects (ALE)
+ Feature Importance
+ Global Surrogate and Local Surrogate (LIME)
+ Shapley Values, SHAP
+ ...
]

.pull-right70[
<div class = "wrap2">
<iframe class = "frame2" src="https://christophm.github.io/interpretable-ml-book/"></iframe>
</div>
]

---
# Implementation in R: DALEX

.pull-left70[
<div align = "center">
<img src="img/dalex.png" width=100%>
</div>
]

.pull-right70[
+ Descriptive mAchine Learning EXplanations
+ DALEX is a set of tools that help to understand how complex models work
]

---
class: inverse

# Results: Model performance

---
# Results: Comparison of model performance

.pull-left70[
<img src="slides_files/figure-html/unnamed-chunk-1-1.png" width="1200" />
]

.pull-right70[
+ Predictive power of the traditional Score Card model is surprisingly good
+ Logistic Regression with spline-based transformations performs best, using `rms` by Harrell Jr (2019)
]

---
# Results: Comparison of model performance

.pull-left70[
<br>
<img src="slides_files/figure-html/unnamed-chunk-2-1.png" width="1000" />
]

.pull-right70[
For the comparison of explainability, we choose

+ the Score Card,
+ a Gradient Boosting model with 10,000 trees,
+ a tuned Logistic Regression with splines using 13 variables
]

---
class: inverse

# Results: Global explanations

---
# Score Card: Variable importance as range of points

.pull-left70[
<img src="slides_files/figure-html/unnamed-chunk-3-1.png" width="1200" />
]

.pull-right70[
+ Range of Score Card points as an indicator of relevance for predictions
+ Alternative: variance of Score Card points across applications
]

---
# Model agnostic: Importance through drop-out loss

.pull-left70[
<img src="slides_files/figure-html/unnamed-chunk-4-1.png" width="1200" />
]

.pull-right70[
+ The drop in model performance (here AUC) is measured after permutation of a single variable
+ The more significant the drop in performance, the more important the variable
]

---
# Score Card: Variable explanation based on points

.pull-left70[
<br>
<img src="slides_files/figure-html/unnamed-chunk-5-1.png" width="1200" />
]

.pull-right70[
+ Score Card points for the values of a covariate show the effect of a single feature
+ Directly computed from the coefficient estimates of the Logistic Regression
]

---
# Model agnostic: Partial dependence plots

.pull-left70[
<img src="slides_files/figure-html/unnamed-chunk-6-1.png" width="1200" />
]

.pull-right70[
+ Partial dependence plots created with DALEX (Biecek 2018)
+ Interpretation very similar to marginal Score Card points
]

---
class: inverse

# Results: Local explanations

---
# Instance-level explanations

.pull-left70[
+ Instance-level exploration helps to understand how a model yields a prediction for a single observation
+ Model-agnostic approaches are
  + additive Breakdowns
  + Shapley Values, SHAP
  + LIME
+ In Credit Scoring, this explanation makes each credit decision transparent
]

.pull-right70[
]

---
# Score Card: Local explanations

.pull-left70[
<img src="slides_files/figure-html/unnamed-chunk-7-1.png" width="1200" />
]

.pull-right70[
+ Instance-level exploration for Score Cards can simply use the individual Score Card points
+ This yields a breakdown of the scoring result by variable
]

---
# Model agnostic: Variable contribution break down

.pull-left70[
<img src="slides_files/figure-html/unnamed-chunk-8-1.png" width="1200" />
]

.pull-right70[
+ Such instance-level explorations can also be performed in a model-agnostic way
+ Unfortunately, for non-additive models, variable contributions depend on the ordering of the variables
]

---
# Model agnostic: SHAP

.pull-left70[
<img src="slides_files/figure-html/unnamed-chunk-9-1.png" width="1200" />
]

.pull-right70[
+ Shapley attributions are averages across all (or at least a large number of) different orderings
+ Violet boxplots show the distributions of attributions for a selected variable, while the length of the bar stands for the average attribution
]

---
class: inverse

# Conclusion

---
# modelDown: HTML summaries for predictive models

See
Biecek, Tatarynowicz, Romaszko, and Urbański (2019)

.pull-left70[
<div class = "wrap">
<iframe class = "frame" src="https://buecker.netlify.com/modeldown"></iframe>
</div>
]

.pull-right70[
<a>
<div align = "center">
<img src="img/qr.png" width=110%>
</div>
</a>
]

---
# Conclusion

+ We have built models for Credit Scoring using Score Cards and Machine Learning
+ The predictive power of the Machine Learning models was superior (in our example only slightly; other studies show a clearer overperformance)
+ Model-agnostic methods for interpretable Machine Learning are able to meet the degree of explainability of Score Cards and may even exceed it

---
# References (1/3)

Biecek, P. (2018). "DALEX: explainers for complex predictive models". In: _Journal of Machine Learning Research_ 19.84, pp. 1-5.

Biecek, P., M. Tatarynowicz, K. Romaszko, and M. Urbański (2019). _modelDown: Make Static HTML Website for Predictive Models_. R package version 1.0.1. URL: [https://CRAN.R-project.org/package=modelDown](https://CRAN.R-project.org/package=modelDown).

Bischl, B., T. Kühn, and G. Szepannek (2014). "On Class Imbalance Correction for Classification Algorithms in Credit Scoring". In: _Operations Research Proceedings_. Ed. by M. Lübbecke, A. Koster, P. Letmathe, R. Madlener, B. Peis, and G. Walther, pp. 37-43.

FICO (2019). _xML Challenge_. Online. URL: [https://community.fico.com/s/explainable-machine-learning-challenge](https://community.fico.com/s/explainable-machine-learning-challenge).

---
# References (2/3)

Harrell Jr, F. E. (2019). _rms: Regression Modeling Strategies_. R package version 5.1-3.1. URL: [https://CRAN.R-project.org/package=rms](https://CRAN.R-project.org/package=rms).

Lessmann, S., B. Baesens, H. Seow, and L. Thomas (2015). "Benchmarking state-of-the-art classification algorithms for credit scoring: An update of research". In: _European Journal of Operational Research_ 247.1, pp. 124-136.

Molnar, C. (2019). _Interpretable Machine Learning. A Guide for Making Black Box Models Explainable_. URL: [https://christophm.github.io/interpretable-ml-book/](https://christophm.github.io/interpretable-ml-book/).

Molnar, C., B. Bischl, and G. Casalicchio (2018). "iml: An R package for Interpretable Machine Learning". In: _Journal of Open Source Software_ 3.26, p. 786. URL: [http://joss.theoj.org/papers/10.21105/joss.00786](http://joss.theoj.org/papers/10.21105/joss.00786).

---
# References (3/3)

Szepannek, G. (2017a). "On the Practical Relevance of Modern Machine Learning Algorithms for Credit Scoring Applications". In: _WIAS Report Series_ 29, pp. 88-96.

Szepannek, G. (2017b). _A Framework for Scorecard Modelling using R_. CSCC 2017.

---
# Thank you!

<br>

.smaller-font[
.pull-left[
#### Prof. Dr. Michael Bücker
Professor of Data Science

Münster School of Business
FH Münster - University of Applied Sciences

Corrensstraße 25, Room C521
D-48149 Münster

Tel: +49 251 83 65615
E-Mail: michael.buecker@fh-muenster.de
http://prof.buecker.ms
]

.pull-right[
<a>
<div align = "center">
<img src="img/fh.jpg" width=100%>
</div>
</a>
]
]

<!-- --- -->
<!-- class: inverse -->
<!-- # Backup -->
<!-- --- -->
<!-- class: empty -->
<!-- background-image: url("https://media.giphy.com/media/lqVVqkqMolh9S/giphy.gif") -->
<!-- background-size: contain -->
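
---
# Backup: Permutation importance sketch

The drop-out-loss importance shown earlier was computed with DALEX in R. Purely as an illustration of the idea (not the DALEX implementation), here is a minimal sketch on a hypothetical toy model, shown in Python and using accuracy instead of AUC for brevity:

```python
import random

random.seed(1)

# Toy data: the label depends only on x1; x2 is pure noise.
data = [(random.random(), random.random()) for _ in range(1000)]
labels = [1 if x1 > 0.5 else 0 for x1, _ in data]

def model(row):
    # A fixed "fitted" classifier that only looks at x1.
    return 1 if row[0] > 0.5 else 0

def accuracy(rows):
    hits = sum(model(r) == y for r, y in zip(rows, labels))
    return hits / len(rows)

def permutation_importance(col):
    # Shuffle a single column, keep the rest, and measure the drop
    # in performance relative to the intact data.
    shuffled = [r[col] for r in data]
    random.shuffle(shuffled)
    permuted = [(s, r[1]) if col == 0 else (r[0], s)
                for r, s in zip(data, shuffled)]
    return accuracy(data) - accuracy(permuted)

print(permutation_importance(0))  # large drop: x1 is important
print(permutation_importance(1))  # no drop: x2 is irrelevant
```

With real Credit Scoring data, the loss would be 1 - AUC and the model a fitted Score Card or Machine Learning model; the mechanics of "permute one column, re-score, compare" stay the same.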