Interpretable machine learning through game-theoretical analysis of influencing variables and interaction effects
Most machine learning (ML) methods assume a representation of data entities in terms of a fixed set of variables (features). Analyzing the importance of individual variables for a model induced from the data, or for the prediction made with that model in a specific situation, facilitates the interpretation of ML models and supports subtasks such as feature selection. The goal of this project is to develop novel methods for feature analysis. In particular, we are interested in numerical measures quantifying the importance of individual variables in a machine learning task, i.e., the contribution of individual variables to the task at hand and predictions made by a model, as well as the interaction and dependencies between different variables.
This project aims at transparent modeling of dependencies and interactions between variables as well as theoretically justified reductions of nonlinear dependencies to understandable and interpretable measures for the influence of individual variables, both for supervised and unsupervised learning. A theoretical framework of that kind shall be complemented by suitable visualization techniques to further facilitate interpretation, the interaction with users and domain experts, and the exploration of dependencies in high-dimensional data spaces. All methods shall be evaluated in various case studies in the medical domain.
The main innovation of the project is to establish a connection between feature analysis and (cooperative) game theory. To this end, a feature subset is considered as a coalition in game theory, and the performance of a model trained with these features as the value of the coalition. The importance of individual variables can then be quantified in terms of well-established game-theoretical measures, such as the Shapley value, just like the interaction between pairs or groups of variables. Due to the exponential size of the lattice of feature subsets, the exact computation of such quantities is in general infeasible. Therefore, a key challenge of the project is the development of efficient approximate algorithms.
- 1.Schäfer D, Hüllermeier E. Dyad Ranking Using Plackett-Luce Models based on joint feature representations. Machine Learning. 2018;107:903–941.
- 2.Pfannschmidt K, Hüllermeier E, Held S, Neiger R. Evaluating Tests in Medical Diagnosis: Combining Machine Learning with Game-Theoretical Concepts. In: ; 2016:450-461. doi:10.1007/978-3-319-40596-4_38
- 3.Shekar A, Bocklisch T, Sánchez P, Straehle C, Müller E. Including Multi-feature Interactions and Redundancy for Feature Ranking in Mixed Datasets. In: ; 2017:239-255. doi:10.1007/978-3-319-71249-9_15
- 4.Nguyen H, Müller E, Vreeken J, Efros P, Böhm K. Multivariate maximal correlation analysis. ICML. 2014;3:775-783.