Exploring the Data Landscape through the Lens of the Shapley Value

We live in a world where vast amounts of data are produced and collected every day. An important question is how we can make use of this data, and how much each part of it contributes.

Imagine, for example, a smart home system equipped with a variety of sensors that monitor different aspects of the living environment, such as room temperature, humidity, air quality, and light intensity. The data from these sensors can be used to save energy, enhance security, or improve the comfort of the residents.

Now, suppose a power outage occurs, and we suddenly need to turn off some of the sensors while still keeping the most important functions running.

It’s not easy to decide which sensors should remain active. This is where the Shapley value comes into play: it is a concept that helps us make fair decisions when dealing with multiple data sources. It quantifies the contribution of each sensor, regardless of what the data is being used for.

Where does the Shapley value come from?

It comes from “cooperative game theory,” a branch of mathematics that studies what happens when different people or groups work together and make decisions that influence each other. The Shapley value measures each player’s contribution to the group: it averages the extra value a player adds over all possible orders in which the players could join, and thereby shows how important each player is to the final outcome when they collaborate.
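To make the idea concrete, here is a minimal sketch that computes exact Shapley values for a tiny, made-up version of the smart home example. The three sensors, their “comfort score” numbers, and the assumed synergy between temperature and humidity are purely hypothetical, as are the helper names `shapley_values` and `comfort`. The code uses the textbook Shapley formula, which weights each marginal contribution v(S ∪ {i}) − v(S) by |S|!(n − |S| − 1)!/n!; it is not the project’s own method.

```python
from fractions import Fraction
from itertools import combinations
from math import factorial


def shapley_values(players, v):
    """Exact Shapley values of the game v by enumerating every coalition.

    phi_i = sum over S not containing i of
            |S|! (n - |S| - 1)! / n! * (v(S + {i}) - v(S))
    """
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = Fraction(0)
        for size in range(n):  # coalitions of 0 .. n-1 other players
            weight = Fraction(factorial(size) * factorial(n - size - 1), factorial(n))
            for coalition in combinations(others, size):
                s = frozenset(coalition)
                total += weight * (v(s | {i}) - v(s))  # weighted marginal contribution
        phi[i] = total
    return phi


# Hypothetical "comfort score" for a set of active sensors (illustrative numbers only).
BASE = {"temperature": 4, "humidity": 2, "air_quality": 2}


def comfort(active):
    score = sum(BASE[s] for s in active)
    if {"temperature", "humidity"} <= active:  # assumed synergy between these two sensors
        score += 2
    return score


phi = shapley_values(list(BASE), comfort)
print({p: float(x) for p, x in phi.items()})
# → {'temperature': 5.0, 'humidity': 3.0, 'air_quality': 2.0}
```

The synergy bonus of 2 is split equally between temperature and humidity, and the three values add up to 10, the score with all sensors active. Exact fractions are used so that the result is not blurred by floating-point rounding.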

This idea is so flexible that it can be applied to many different situations. It’s also relatively easy to implement and can help us better understand systems, detect errors, and build trust in intelligent applications.

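However, there are challenges: calculating the Shapley value exactly can take a lot of time and resources, because it requires evaluating every possible coalition (subset) of players, and the number of coalitions doubles with each additional player. For example, in the case of genetic data, with hundreds or even thousands of features, the number of calculations needed could be greater than the number of atoms in the universe!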
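The size of this blow-up can be checked with a few lines of integer arithmetic. An exact computation has to consider all 2^n coalitions of n players. The sketch below assumes the commonly quoted order-of-magnitude estimate of about 10^80 atoms in the observable universe; the helper name `num_coalitions` is hypothetical and only used for illustration.

```python
# Assumption: commonly quoted order-of-magnitude estimate for the
# number of atoms in the observable universe.
ATOMS_IN_UNIVERSE = 10**80


def num_coalitions(n_players):
    """Number of coalitions (subsets) an exact computation has to consider."""
    return 2**n_players


# Smallest number of players whose coalitions outnumber the atoms.
n = 1
while num_coalitions(n) <= ATOMS_IN_UNIVERSE:
    n += 1
print(n)  # → 266

# Number of decimal digits in the coalition count for 300 and 1000 features.
print(len(str(num_coalitions(300))))   # → 91
print(len(str(num_coalitions(1000))))  # → 302
```

Under this assumption, a game with roughly 266 players already has more coalitions than there are atoms, and 1000 features give a count with 302 digits.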

That’s why we are working on methods to approximate the Shapley value: we aim to quickly provide an estimate that’s close to the true value, without requiring so much computational power.
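As a generic illustration of what such an approximation can look like, here is a minimal sketch of permutation sampling, a standard Monte Carlo baseline: it draws random joining orders and averages each player’s marginal contribution. This is not one of the project’s own estimators. The test game (40 hypothetical features with v(S) = (sum of weights in S)^2), the weights, and all function names are assumptions chosen because this game’s exact Shapley values have a simple closed form, φ_i = w_i · W with W the total weight, which lets us compare the estimate against the truth.

```python
import random


def shapley_permutation_sampling(players, v, num_permutations=1000, seed=0):
    """Monte Carlo estimate of Shapley values via random permutations.

    Each permutation costs len(players) + 1 evaluations of v, instead of
    the 2 ** len(players) coalitions needed for the exact value.
    """
    rng = random.Random(seed)
    totals = {p: 0.0 for p in players}
    order = list(players)
    for _ in range(num_permutations):
        rng.shuffle(order)                   # a uniformly random joining order
        coalition = frozenset()
        prev_value = v(coalition)
        for p in order:                      # add players one by one
            coalition = coalition | {p}
            value = v(coalition)
            totals[p] += value - prev_value  # marginal contribution of p
            prev_value = value
    return {p: t / num_permutations for p, t in totals.items()}


# Hypothetical test game with 40 features: v(S) = (sum of weights in S) ** 2.
# Its exact Shapley values are phi_i = w_i * W, where W is the total weight.
weights = [1 + (i % 4) for i in range(40)]   # W = 100
W = sum(weights)


def squared_weight(coalition):
    return sum(weights[i] for i in coalition) ** 2


estimates = shapley_permutation_sampling(range(40), squared_weight, num_permutations=2000)
for i in range(4):
    print(i, round(estimates[i], 1), "exact:", weights[i] * W)
```

Enumerating coalitions exactly would require 2^40 evaluations here, while the sampler uses 2000 × 41. Because the marginal contributions within each permutation telescope, the estimates always sum to v(all players) − v(empty set), even though each individual value carries some sampling error that shrinks as more permutations are drawn.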

Additional resources

TBA

Cooperation

Associated with Prof. Dr. Eyke Hüllermeier, LMU München

Project Publications

  • Balestra, Chiara, Florian Huber, Andreas Mayr, and Emmanuel Müller (2022). “Unsupervised Features Ranking via Coalitional Game Theory for Categorical Data”. In: Big Data Analytics and Knowledge Discovery – 24th International Conference, DaWaK 2022, Vienna, Austria, August 22-24, 2022, Proceedings. Ed. by Robert Wrembel, Johann Gamper, Gabriele Kotsis, A Min Tjoa, and Ismail Khalil. Vol. 13428. Lecture Notes in Computer Science. Springer, pp. 97–111. doi: 10.1007/978-3-031-12670-3_9.
  • Balestra, Chiara, Bin Li, and Emmanuel Müller (2023a). “On the Consistency and Robustness of Saliency Explanations for Time Series Classification”. In: arXiv preprint arXiv:2309.01457.
  • Balestra, Chiara, Bin Li, and Emmanuel Müller (2023b). “slidSHAPs–sliding Shapley Values for correlation-based change detection in time series”. In: 2023 IEEE 10th International Conference on Data Science and Advanced Analytics (DSAA). IEEE, pp. 1–10.
  • Balestra, Chiara, Carlo Maj, Emmanuel Müller, and Andreas Mayr (2022). “Redundancy-aware unsupervised ranking based on game theory – application to gene enrichment analysis”. In: CoRR abs/2207.12184. doi: 10.48550/arXiv.2207.12184.
  • Balestra, Chiara, Carlo Maj, Emmanuel Müller, and Andreas Mayr (2023). “Redundancy-aware unsupervised ranking based on game theory: Ranking pathways in collections of gene sets”. In: PLOS ONE 18.3, e0282699.
  • Fumagalli, Fabian, Maximilian Muschalik, Patrick Kolpaczki, Eyke Hüllermeier, and Barbara Hammer (2024). “SHAP-IQ: Unified approximation of any-order Shapley interactions”. In: Advances in Neural Information Processing Systems 36.
  • Kolpaczki, Patrick, Viktor Bengs, and Eyke Hüllermeier (2021). “Identifying Top-k Players in Cooperative Games via Shapley Bandits”. In: Proceedings of the LWDA 2021 Workshops. Vol. 2993, pp. 133–144.
  • Kolpaczki, Patrick, Viktor Bengs, and Eyke Hüllermeier (2022). “Non-Stationary Dueling Bandits”. In: CoRR abs/2202.00935.
  • Kolpaczki, Patrick, Viktor Bengs, Maximilian Muschalik, and Eyke Hüllermeier (2024). “Approximating the Shapley value without marginal contributions”. In: Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 38, No. 12, pp. 13246–13255.
  • Li, Bin, Chiara Balestra, and Emmanuel Müller (2022). “Enabling the Visualization of Distributional Shift using Shapley Values”. In: NeurIPS 2022 Workshop on Distribution Shifts: Connecting Methods and Applications. url: https://openreview.net/forum?id=HxnGNo2ADT.