The faces behind project (RL)³

Raphael Engelhardt

Technische Hochschule Köln

Moritz Lange

Ruhr-University Bochum

Beyond Black Boxes:
Interpretable RL through Representations, XAI, and Reasoning

Modern reinforcement learning (RL) agents use neural networks to predict their best next action given the current state of their environment. Environments often have complex dynamics and their states are represented as high-dimensional raw data such as images. The neural networks that process these states must therefore model complex mathematical functions, and are notorious for becoming inaccessible black boxes. As a result, agent decisions are uninterpretable and potentially unpredictable, especially in scenarios that the agent did not experience during training.

In the RL³ project, we have been exploring ways to address these issues and make RL more interpretable, reliable, and thus trustworthy. Our first line of research was to see which data representations could provide a basis for decision-making that’s both expressive for an RL agent and interpretable for a human user. The second was how to use methods from explainable artificial intelligence (XAI) to make the decisions of modern RL agents understandable. The third and final one was to explore ways of designing inherently understandable agents that are capable of reasoning about their decisions, rather than just using end-to-end neural networks.

In this article we give an overview of our research projects and results in all three areas. We also provide references to our publications.

Foundations of Clarity: Building Interpretable Data Representations for RL

The first step in making good and transparent decisions is to have an accessible basis – good data – on which to decide. In the case of an RL agent, a well-chosen, interpretable representation of its current state not only makes it easier for the agent to choose an action, but also makes its choice more transparent and understandable to human users. In the real world, however, sensor and camera data is often high-dimensional and unprocessed. In two projects, we explore methods of representation learning that can improve the quality and interpretability of data representations for RL.

Auxiliary tasks for representation learning

While there are several surveys of representation learning in RL [12, 13], few works empirically compare methods [14, 15], and these only consider visual data. However, many RL environments provide non-visual data, such as sensor readings in factory production lines. We addressed this lack of empirical comparison by comparing common auxiliary tasks used for representation learning in RL (these are tasks other than maximising an agent’s reward, e.g. compressing and reconstructing the current state with an autoencoder) on a variety of different non-visual benchmark RL environments [1].

Our results show that representations trained on the task of predicting future states, i.e. representations that can simplify the prediction of the temporal evolution of agent and environment, are the most powerful. In particular, their usefulness increases with the complexity of an environment, and they can make tasks feasible that were otherwise infeasible for state-of-the-art deep RL agents.
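
To make this concrete, below is a minimal sketch of such a forward-prediction auxiliary task in PyTorch (illustrative only; not the exact setup of [1], and all module sizes and names are assumptions): an encoder is trained so that a small head can predict the latent of the next state from the current latent and the action, and the learned representation is then handed to the RL agent.

import torch
import torch.nn as nn

# Illustrative dimensions for a non-visual environment (e.g. sensor readings)
obs_dim, act_dim, latent_dim = 24, 4, 16

encoder = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, latent_dim))
forward_head = nn.Sequential(nn.Linear(latent_dim + act_dim, 64), nn.ReLU(),
                             nn.Linear(64, latent_dim))
opt = torch.optim.Adam(list(encoder.parameters()) + list(forward_head.parameters()), lr=1e-3)

def auxiliary_loss(s, a, s_next):
    """Predict the latent of the next state from the current latent and the action."""
    z, z_next = encoder(s), encoder(s_next)
    z_pred = forward_head(torch.cat([z, a], dim=-1))
    return ((z_pred - z_next.detach()) ** 2).mean()   # regression on the next latent

# Dummy transition batch standing in for logged (s, a, s') tuples
s, a, s_next = torch.randn(32, obs_dim), torch.randn(32, act_dim), torch.randn(32, obs_dim)
loss = auxiliary_loss(s, a, s_next)
opt.zero_grad(); loss.backward(); opt.step()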

Representations for visual navigation

For visual data, we have explored interpretable representations for navigation tasks. The slow feature analysis (SFA) method, developed in our group and inspired by neuroscience research, is able to extract an agent’s position and heading from a first-person camera view [16] (code available on GitHub). On the one hand, position and heading form a more informative and concise representation than raw pixels; on the other hand, they are also better suited to modelling temporal evolution, as these (aptly named) slow features change much more slowly than the pixels of a video. We have combined SFA with state-of-the-art deep RL agents and have shown that, under the right circumstances, it outperforms the usual approach of processing an agent’s camera view with convolutional neural networks [2].
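
To illustrate the principle, the following NumPy sketch implements plain linear SFA (the SFA used in [2, 16] additionally relies on nonlinear expansion and a hierarchical architecture; this toy version is a deliberate simplification): it finds unit-variance projections of a time series whose outputs change as slowly as possible over time.

import numpy as np

def linear_sfa(X, n_features=2):
    """Minimal linear slow feature analysis on a time series X of shape (time, dim)."""
    X = X - X.mean(axis=0)
    # Whiten the data so that all projections are decorrelated with unit variance
    eigval, eigvec = np.linalg.eigh(np.cov(X, rowvar=False))
    Z = X @ (eigvec / np.sqrt(eigval + 1e-8))
    # Slow directions minimise the variance of the temporal derivative
    d_eigval, d_eigvec = np.linalg.eigh(np.cov(np.diff(Z, axis=0), rowvar=False))
    return Z @ d_eigvec[:, :n_features]   # eigh sorts ascending, so slowest come first

# Toy demo: a slow sine hidden in a fast, mixed signal is recovered as the slowest feature
t = np.linspace(0, 10 * np.pi, 2000)
slow, fast = np.sin(0.1 * t), np.sin(5.0 * t)
X = np.stack([slow + 0.5 * fast, fast - 0.3 * slow], axis=1)
slow_feature = linear_sfa(X, n_features=1)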

Shedding Light: Using XAI to Explain Deep RL Decisions

Once the agent has a good, interpretable basis for making a decision, the next step is to make the decision process itself understandable. In state-of-the-art deep reinforcement learning (DRL), neural networks process the state of the environment and predict the most profitable next action. The XAI research area provides methods to make the complex functions learned by neural networks accessible to users. Decision trees, rule learning, and Shapley values are widely used approaches in XAI. We use them to explain the decisions of DRL agents.

Decision Trees as Surrogate Models

Our approach is based on the idea of translating the RL problem into a supervised learning one. By observing the DRL agent navigate the RL environment while logging its actions and the environment’s state, we obtain a set of structured data. From these data, a decision tree (DT) can be distilled, which reconstructs the DRL agent’s behavior as a surrogate model [3–5].
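
The basic distillation step can be sketched as follows (a hedged, minimal example rather than our exact setup: drl_policy is a placeholder for the trained DRL agent, and CartPole merely serves as a stand-in environment).

import gymnasium as gym
import numpy as np
from sklearn.tree import DecisionTreeClassifier

env = gym.make("CartPole-v1")

def drl_policy(obs):
    # Placeholder oracle; in practice this wraps the trained deep RL agent's policy.
    return int(obs[2] + obs[3] > 0.0)

states, actions = [], []
obs, _ = env.reset(seed=0)
for _ in range(5000):
    a = drl_policy(obs)
    states.append(obs); actions.append(a)          # log (state, action) pairs
    obs, _, terminated, truncated, _ = env.step(a)
    if terminated or truncated:
        obs, _ = env.reset()

# Distil a shallow, human-readable decision tree from the logged behaviour
surrogate = DecisionTreeClassifier(max_depth=3).fit(np.array(states), np.array(actions))
print(f"agreement with oracle: {surrogate.score(np.array(states), np.array(actions)):.2%}")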

We developed an iterative algorithm that alternately leverages exploration by the DTs and the predictive power of the DRL agent to obtain gradually refined datasets from which, in turn, increasingly better-performing DTs can be distilled. In [6] we showed, for a variety of continuous control challenges from the Gymnasium benchmark suite [17], how this method allowed us to build DTs of limited depth that successfully master the task, often even surpassing the DRL agents’ performance. Given their structure of hierarchical if-then rules, the DTs are intrinsically traceable by humans and contain orders of magnitude fewer parameters than the DRL networks.
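
In condensed form, the iterative idea can be sketched like this (a simplified illustration under assumed interfaces, not the exact algorithm of [6]): roll out the current DT to discover the states it actually visits, relabel those states with the DRL oracle’s actions, and refit the tree on the growing dataset.

import gymnasium as gym
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def iterate_dt(env, oracle_policy, n_iterations=5, episodes_per_iter=20, max_depth=4):
    X, y = [], []
    tree = None
    for _ in range(n_iterations):
        for _ in range(episodes_per_iter):
            obs, _ = env.reset()
            done = False
            while not done:
                X.append(obs); y.append(oracle_policy(obs))   # relabel with the oracle
                # Act with the current DT where available, otherwise with the oracle
                a = oracle_policy(obs) if tree is None else int(tree.predict([obs])[0])
                obs, _, terminated, truncated, _ = env.step(a)
                done = terminated or truncated
        tree = DecisionTreeClassifier(max_depth=max_depth).fit(np.array(X), np.array(y))
    return tree

# Example usage with the same placeholder oracle as above
tree = iterate_dt(gym.make("CartPole-v1"), lambda obs: int(obs[2] + obs[3] > 0.0))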

By successfully applying the method to a real-world robotic task, we demonstrated that the technique withstands imperfectly controlled conditions and the presence of noise, both of which are inherent to practical applications.

Shapley Values for Explainable RL

In addition, or as an alternative, to surrogate models, we applied a technique from game theory to RL. Shapley values [18, 19] offer a mathematically well-founded way of distributing the outcome of a collaborative game among the individual players. Translated to a DRL agent’s policy, this allows estimating how much each element of the agent’s perception of the environment contributed to its decision on the next action.
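
In practice, such attributions can be computed with KernelSHAP from the shap library. The sketch below is a hedged illustration in which policy is a placeholder for the trained actor network and the background set stands in for states sampled from agent rollouts.

import numpy as np
import shap

def policy(states):
    # Placeholder: returns one continuous action per state; in practice this wraps
    # the trained DRL actor network.
    return states @ np.array([0.5, -0.2, 0.1, 0.3])

background = np.random.randn(100, 4)          # states collected from agent rollouts
explainer = shap.KernelExplainer(policy, background)
state = np.random.randn(1, 4)                 # the single decision we want to explain
shap_values = explainer.shap_values(state)    # contribution of each state variable
print(shap_values)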

In our experiments on more complex RL tasks of directed locomotion of simulated robots (MuJoCo benchmark [20]), we demonstrated the robustness and applicability of Shapley values to RL and attributed a measure of overall importance to individual state variables for the decision-making process in dynamic environments with multidimensional action spaces [7]. This allows users to develop a better understanding of DRL agents.

Beyond Deep RL: Towards Reasoning Agents

The complex functions learned by neural networks in state-of-the-art RL agents are inherently difficult to interpret. While XAI methods can provide simplified, interpretable approximations of these functions, such transparent policies often lack the complexity required to perform well in challenging RL tasks.

Humans, on the other hand, can use reasoning to explain and understand even complex decisions such as “why did I open that drawer?” in the context of a larger task. The future of interpretable RL agents will involve similar reasoning systems. These systems may still use neural networks due to the complexity of most tasks. But similar to the human brain, the neural networks will break down decision-making into individual, disentangled reasoning steps [8].

Causal Reasoning

One promising line of reasoning, again inspired by human problem-solving, is causal reasoning. Most machine learning approaches are concerned with finding correlation rather than causation. An important reason for this is that understanding causality requires the ability to intervene in the data generation process [21, 22], something that’s usually not possible when using existing datasets in traditional machine learning. In RL, on the other hand, an agent naturally has the ability to explore, act and intervene. It can try action A and then try action B in the same situation and observe the different results. This makes RL particularly well suited to causal reasoning. The improvements in robustness, generalisability and planning that causal understanding brings, in turn, make causal reasoning promising for RL (for an overview of causal RL, see [23]).
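
The following toy example (purely illustrative, not taken from our publications) shows this interventional access in code: the environment is reset to the identical initial state via its seed, two different actions are applied, and their outcomes can be compared directly.

import gymnasium as gym

env = gym.make("CartPole-v1")

obs_a, _ = env.reset(seed=42)
next_a, *_ = env.step(0)        # intervention A: push the cart to the left

obs_b, _ = env.reset(seed=42)   # identical initial state
next_b, *_ = env.step(1)        # intervention B: push the cart to the right

print("same start:", bool((obs_a == obs_b).all()))
print("outcome of A:", next_a)
print("outcome of B:", next_b)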

We have begun to work with recent gradient-based (rather than constraint-based) causal discovery methods and, before even applying them to RL, have found that they do not necessarily always learn the correct causal relationships. Instead, some methods exploit distributional biases that often, but not always, correlate with causal relationships. We are in the process of publishing these findings. Our hope is to eventually use gradient-based causal discovery methods to learn causal models of RL environments.

Physical Reasoning

Besides causal reasoning, physical reasoning – the understanding and use of physical laws – is an important line of research for reasoning RL. Most RL environments simulate real-world physics in some way. As a first step in this direction, we have written a comprehensive survey of benchmarks and of the state of physical reasoning research in AI in general [9].

Beyond this survey, we are developing denoising diffusion generative models to infer physical trajectories that obey the laws of Newtonian physics. A forward model of these dynamics can be achieved with good old partial differential equations, or alternatively with modern transformer models such as [24]. In contrast, we use denoising diffusion models because of their untapped potential to take into account conditions (e.g. object positions) at different times simultaneously. With the right architecture and the ability to model object interactions, this becomes very powerful: The model is then able to choose or plan actions, given other objects that it cannot control, and given a desired future state. To the best of our knowledge, denoising diffusion has never been applied in this context. Although we have promising initial results, this is still a work in progress.
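
To indicate the kind of conditioning we have in mind, here is a minimal, hypothetical DDPM-style sketch (a toy under assumed choices, not our actual model): a denoiser over short 1D trajectories receives known positions at arbitrary time indices through a mask channel, so conditioning on, say, a start and a goal position simply amounts to setting two mask entries.

import torch
import torch.nn as nn

T_DIFF = 100                                   # number of diffusion steps
betas = torch.linspace(1e-4, 0.02, T_DIFF)
alpha_bar = torch.cumprod(1.0 - betas, dim=0)

class TrajectoryDenoiser(nn.Module):
    """Predicts the noise added to a 1D trajectory of length traj_len."""
    def __init__(self, traj_len=16, hidden=128):
        super().__init__()
        # Input: noisy trajectory, conditioning values, conditioning mask, diffusion step
        self.net = nn.Sequential(nn.Linear(3 * traj_len + 1, hidden), nn.ReLU(),
                                 nn.Linear(hidden, hidden), nn.ReLU(),
                                 nn.Linear(hidden, traj_len))
    def forward(self, x_t, cond, mask, t):
        t_emb = t.float().unsqueeze(-1) / T_DIFF
        return self.net(torch.cat([x_t, cond * mask, mask, t_emb], dim=-1))

def training_step(model, x0, mask):
    """One DDPM training step on clean trajectories x0 of shape (batch, traj_len)."""
    t = torch.randint(0, T_DIFF, (x0.shape[0],))
    eps = torch.randn_like(x0)
    ab = alpha_bar[t].unsqueeze(-1)
    x_t = ab.sqrt() * x0 + (1 - ab).sqrt() * eps       # forward noising process
    eps_hat = model(x_t, x0, mask, t)                  # condition on known positions
    return ((eps - eps_hat) ** 2).mean()

# Toy data: constant-velocity trajectories x(t) = x(0) + v * t, conditioned on start and goal
traj_len, batch = 16, 64
model = TrajectoryDenoiser(traj_len)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(200):
    x0 = torch.rand(batch, 1) + torch.rand(batch, 1) * torch.arange(traj_len).float()
    mask = torch.zeros(batch, traj_len)
    mask[:, 0], mask[:, -1] = 1.0, 1.0
    loss = training_step(model, x0, mask)
    opt.zero_grad(); loss.backward(); opt.step()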

Summary

In the RL³ project, we have advanced the fields of representation learning for RL, XAI in RL, and reasoning in RL. In all these fields, we have contributed to making RL agents more interpretable, reliable, and trustworthy.

We have shown how meaningful representations, optimised for understanding the dynamics of an environment, can make agents perform better or even succeed at a previously intractable task [1]. These representations are often not only more powerful, but also more interpretable, for example when extracting agent location and heading from visual observations [2].

We have also shown how decision trees and their decision rules can be used as interpretable surrogate models for RL agents based on neural networks [5]. In particular, we have shown how an iterative training procedure makes the supervised technique of decision trees applicable to the interactive and cumulative setup of RL [6]. In addition to decision trees, we have shown how Shapley values can be used in the context of RL to explain the importance of individual state variables for a given agent decision [7].

Finally, we have begun to work towards RL agents that can reason about their actions in a human-like fashion. We found that recent scalable gradient-based causal discovery methods have certain shortcomings that prevent us from successfully applying them to RL for now, although we still consider this a promising area of research. We also wrote a survey on physical reasoning and its benchmarks, another promising area for reasoning in RL [9]. Based on this survey, we have started work on denoising diffusion models for physical reasoning, which can later be applied as novel ways of reasoning in physics-based RL environments.

Cooperation

Project Publications

[1]  M. Lange, N. Krystiniak, R. C. Engelhardt, W. Konen, and L. Wiskott, “Improving Reinforcement Learning Efficiency with Auxiliary Tasks in Non-visual Environments: A Comparison,” in Lecture Notes in Computer Science, Machine Learning, Optimization, and Data Science (LOD 2023), (Grasmere, United Kingdom, Sep. 22–26, 2023), G. Nicosia, V. Ojha, E. La Malfa, G. La Malfa, P. M. Pardalos, and R. Umeton, Eds., vol. 14506, Cham, Switzerland: Springer Nature, 2024, pp. 177–191, ISBN: 978-3-031-53966-4. DOI: 10.1007/978-3-031-53966-4_14.

[2]  M. Lange, R. C. Engelhardt, W. Konen, and L. Wiskott, “Interpretable Brain-Inspired Representations Improve RL Performance on Visual Navigation Tasks,” in The 38th Annual AAAI Conference on Artificial Intelligence, Workshop eXplainable AI approaches for Deep Reinforcement Learning, (Vancouver, Canada, Feb. 20–27, 2024), 2024. Available: https://openreview.net/forum?id=s1oVgaZ3dQ (visited on Nov. 19, 2024).

[4]  R. C. Engelhardt, M. Lange, L. Wiskott, and W. Konen, “Finding the Relevant Samples for Decision Trees in Reinforcement Learning,” in Online Proceedings of the Dataninja Spring School 2023, (Bielefeld, Germany, May 8–10, 2023), 2023. Available: https://dataninja.nrw/?page_id=1251 (visited on Nov. 19, 2024).

[5]  R. C. Engelhardt, M. Lange, L. Wiskott, and W. Konen, “Sample-Based Rule Extraction for Explainable Reinforcement Learning,” in Lecture Notes in Computer Science, Machine Learning, Optimization, and Data Science (LOD 2022), (Certosa di Pontignano, Italy, Sep. 19–22, 2022), G. Nicosia et al., Eds., vol. 13810, Cham, Switzerland: Springer Nature, 2023, pp. 330–345, ISBN: 978-3-031-25599-1. DOI: 10.1007/978-3-031-25599-1_25.

[6]  R. C. Engelhardt, M. Oedingen, M. Lange, L. Wiskott, and W. Konen, “Iterative Oblique Decision Trees Deliver Explainable RL Models,” Algorithms, vol. 16, no. 6, p. 282, 2023, Advancements in Reinforcement Learning Algorithms, ISSN: 1999-4893. DOI: 10.3390/a16060282. Available: https://www.mdpi.com/1999-4893/16/6/282 (visited on Jul. 4, 2024).

[7]  R. C. Engelhardt, M. Lange, L. Wiskott, and W. Konen, “Exploring the Reliability of SHAP Values in Reinforcement Learning,” in Communications in Computer and Information Science, Explainable Artificial Intelligence (xAI 2024), (Valletta, Malta, Jul. 17–19, 2024), L. Longo, S. Lapuschkin, and C. Seifert, Eds., vol. 2155, Cham, Switzerland: Springer Nature, 2024, pp. 165–184, ISBN: 978-3-031-63800-8. DOI: 10.1007/978-3-031-63800-8_9.

[8]  M. Lange, R. C. Engelhardt, W. Konen, and L. Wiskott, “Beyond Trial and Error in Reinforcement Learning,” in DataNinja sAIOnARA 2024 Conference, (Bielefeld, Germany, Jun. 25–27, 2024), U. Kuhl, Ed., 2024. DOI: 10.11576/dataninja-1172. Available: https://biecoll.ub.uni-bielefeld.de/index.php/dataninja/issue/view/82 (visited on Nov. 19, 2024).

[9]  A. Melnik et al., “Benchmarks for physical reasoning AI,” Transactions on Machine Learning Research, 2023, ISSN: 2835-8856. Available: https://openreview.net/forum?id=cHroS8VIyN (visited on Nov. 19, 2024).

[10]  R. C. Engelhardt, R. Raycheva, M. Lange, L. Wiskott, and W. Konen, “Ökolopoly: Case Study on Large Action Spaces in Reinforcement Learning,” in Lecture Notes in Computer Science, Machine Learning, Optimization, and Data Science (LOD 2023), (Grasmere, United Kingdom, Sep. 22–26, 2023), G. Nicosia, V. Ojha, E. La Malfa, G. La Malfa, P. M. Pardalos, and R. Umeton, Eds., vol. 14506, Cham, Switzerland: Springer Nature, 2024, pp. 109–123, ISBN: 978-3-031-53966-4. DOI: 10.1007/978-3-031-53966-4_9.

[11]  M. Oedingen, R. C. Engelhardt, R. Denz, M. Hammer, and W. Konen, “ChatGPT Code Detection: Techniques for Uncovering the Source of Code,” AI, vol. 5, no. 3, pp. 1066–1094, 2024, ISSN: 2673-2688. DOI: 10.3390/ai5030053. Available: https://www.mdpi.com/2673-2688/5/3/53 (visited on Jul. 4, 2024).

Further References

[12]  T. Lesort, N. Díaz-Rodríguez, J.-F. Goudou, and D. Filliat, “State representation learning for control: An overview,” Neural Networks, vol. 108, pp. 379–392, 2018, ISSN: 0893-6080. DOI: 10.1016/j.neunet.2018.07.006.

[13]  N. Botteghi, M. Poel, and C. Brune, “Unsupervised representation learning in deep reinforcement learning: A review,” arXiv, 2024. DOI: 10.48550/arXiv.2208.14226. Available: https://arxiv.org/abs/2208.14226.

[14]  E. Shelhamer, P. Mahmoudieh, M. Argus, and T. Darrell, “Loss is its own reward: Self-supervision for reinforcement learning,” arXiv, 2017. DOI: 10.48550/arXiv.1612.07307. Available: https://arxiv.org/abs/1612.07307.

[15]  T. de Bruin, J. Kober, K. Tuyls, and R. Babuška, “Integrating state representation learning into deep reinforcement learning,” IEEE Robotics and Automation Letters, vol. 3, no. 3, pp. 1394–1401, 2018. DOI: 10.1109/LRA.2018.2800101.

[16]  L. Wiskott and T. J. Sejnowski, “Slow feature analysis: Unsupervised learning of invariances,” Neural computation, vol. 14, no. 4, pp. 715–770, 2002. DOI: 10.1162/089976602317318938.

[17]  M. Towers et al., Gymnasium, 2023. DOI: 10.5281/zenodo.8127026. Available: https://zenodo.org/record/8127025 (visited on Jul. 8, 2023).

[18]  L. S. Shapley, “A value for n-person games,” in Contributions to the Theory of Games (AM-28), Volume II, H. W. Kuhn and A. W. Tucker, Eds., Princeton: Princeton University Press, 1953, pp. 307–318, ISBN: 9781400881970. DOI: 10.1515/9781400881970-018.

[19]  S. M. Lundberg and S.-I. Lee, “A unified approach to interpreting model predictions,” in Advances in Neural Information Processing Systems 30, I. Guyon et al., Eds., Curran Associates, Inc., 2017, pp. 4765–4774. Available: http://papers.nips.cc/paper/7062-a-unified-approach-to-interpreting-model-predictions.pdf.

[20]  E. Todorov, T. Erez, and Y. Tassa, “Mujoco: A physics engine for model-based control,” in 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, IEEE, 2012, pp. 5026–5033. DOI: 10.1109/IROS.2012.6386109.

[21]  J. Pearl, Causality: Models, Reasoning, and Inference, 2nd ed. Cambridge University Press, 2009, ISBN: 978-0-511-80316-1. DOI: 10.1017/CBO9780511803161. Available: https://www.cambridge.org/core/product/identifier/9780511803161/type/book.

[22]  B. Schölkopf et al., “Toward causal representation learning,” Proceedings of the IEEE, vol. 109, no. 5, pp. 612–634, 2021, ISSN: 1558-2256. DOI: 10.1109/JPROC.2021.3058954.

[23]  Z. Deng, J. Jiang, G. Long, and C. Zhang, “Causal reinforcement learning: A survey,” Transactions on Machine Learning Research, 2023, Survey Certification, ISSN: 2835-8856. Available: https://openreview.net/forum?id=qqnttX9LPo.

[24]  Z. Wu, N. Dvornik, K. Greff, T. Kipf, and A. Garg, “Slotformer: Unsupervised visual dynamics simulation with object-centric models,” in The Eleventh International Conference on Learning Representations, 2023. Available: https://openreview.net/forum?id=TFbwV6I0VLg.