The Biostatistics Seminar Series presents:
“A Generalized Variable Importance Metric and its Estimator for Machine Learning Models” by Mohammad Khan, University of Toronto
Abstract: The aim of this study is to define the importance of predictors for black-box machine learning methods, where the prediction function can be complex and cannot be represented by statistical parameters. In this paper, we define a “Generalized Variable Importance Metric (GVIM)” using the true conditional expectation function for a continuous or a binary response variable. We further show that the GVIM can be represented as a function of the Conditional Average Treatment Effect (CATE) for multinomial and continuous predictors. We then propose how the metric can be estimated using any machine learning model. Finally, using simulations, we evaluate the properties of the estimator when estimated from XGBoost, Random Forest, and a mis-specified generalized additive model. While the estimators for the GVIM are consistent, they have finite-sample biases. We investigate the source of these biases and propose some empirical solutions to minimize them. This research is expected to significantly impact the public and clinical health sciences, since it opens the door to effectively using modern machine learning methods in real-life applications in the health sciences.
For Mohammad Khan’s biosketch, please see https://shorturl.at/kTY37