The faces behind project (RL)³

Raphael Engelhardt

Technische Hochschule Köln

Moritz Lange

Ruhr-University Bochum

Beyond Black Boxes:
Interpretable RL through Representations, XAI, and Reasoning

Modern reinforcement learning (RL) agents use neural networks to predict their best next action given the current state of their environment. Environments often have complex dynamics and their states are represented as high-dimensional raw data such as images. The neural networks that process these states must therefore model complex mathematical functions, and are notorious for becoming inaccessible black boxes. As a result, agent decisions are uninterpretable and potentially unpredictable, especially in scenarios that the agent did not experience during training.

In the RL³ project, we have been exploring ways to address these issues and make RL more interpretable, reliable, and thus trustworthy. Our first line of research was to see which data representations could provide a basis for decision-making that’s both expressive for an RL agent and interpretable for a human user. The second was how to use methods from explainable artificial intelligence (XAI) to make the decisions of modern RL agents understandable. The third and final one was to explore ways of designing inherently understandable agents that are capable of reasoning about their decisions, rather than just using end-to-end neural networks.

In this article we give an overview of our research projects and results in all three areas. We also provide references to our publications.

Foundations of Clarity: Building Interpretable Data Representations for RL

The first step in making good and transparent decisions is to have an accessible basis – good data – on which to decide. In the case of an RL agent, a well-chosen, interpretable representation of its current state not only makes it easier for the agent to choose an action, but also makes its choice more transparent and understandable to human users. In the real world, however, sensor and camera data is often high-dimensional and unprocessed. In two projects, we explore methods of representation learning that can improve the quality and interpretability of data representations for RL.

Auxiliary tasks for representation learning

While there are several surveys of representation learning in RL [12, 13], few works empirically compare methods [14, 15], and these only consider visual data. However, many RL environments provide non-visual data, such as sensor readings in factory production lines. We addressed this lack of empirical comparison by comparing common auxiliary tasks used for representation learning in RL (these are tasks other than maximising an agent’s reward, e.g. compressing and reconstructing the current state with an autoencoder) on a variety of different non-visual benchmark RL environments [1].

Our results show that representations trained on the task of predicting future states, i.e. representations that can simplify the prediction of the temporal evolution of agent and environment, are the most powerful. In particular, their usefulness increases with the complexity of an environment, and they can make tasks feasible that were otherwise infeasible for state-of-the-art deep RL agents.
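
To make this concrete, below is a minimal sketch of such a forward-prediction auxiliary task in PyTorch (illustrative only; not the exact setup of [1], and all module sizes and names are assumptions): an encoder is trained so that a small head can predict the latent of the next state from the current latent and the action, and the learned representation is then handed to the RL agent.

import torch
import torch.nn as nn

# Illustrative dimensions for a non-visual environment (e.g. sensor readings)
obs_dim, act_dim, latent_dim = 24, 4, 16

encoder = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, latent_dim))
forward_head = nn.Sequential(nn.Linear(latent_dim + act_dim, 64), nn.ReLU(),
                             nn.Linear(64, latent_dim))
opt = torch.optim.Adam(list(encoder.parameters()) + list(forward_head.parameters()), lr=1e-3)

def auxiliary_loss(s, a, s_next):
    """Predict the latent of the next state from the current latent and the action."""
    z, z_next = encoder(s), encoder(s_next)
    z_pred = forward_head(torch.cat([z, a], dim=-1))
    return ((z_pred - z_next.detach()) ** 2).mean()   # regression on the next latent

# Dummy transition batch standing in for logged (s, a, s') tuples
s, a, s_next = torch.randn(32, obs_dim), torch.randn(32, act_dim), torch.randn(32, obs_dim)
loss = auxiliary_loss(s, a, s_next)
opt.zero_grad(); loss.backward(); opt.step()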

Representations for visual navigation

For visual data, we have explored interpretable representations for navigation tasks. The slow feature analysis (SFA) method, developed in our group and inspired by neuroscience research, is able to extract an agent’s position and heading from a first-person camera view [16] (code available on GitHub). On the one hand, position and heading form a more informative and concise representation than raw pixels; on the other hand, they are also better suited to modelling temporal evolution, as these (aptly named) slow features change much more slowly than the pixels of a video. We have combined SFA with state-of-the-art deep RL agents and have shown that, under the right circumstances, it outperforms the usual approach of processing an agent’s camera view with convolutional neural networks [2].
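
To illustrate the principle, the following NumPy sketch implements plain linear SFA (the SFA used in [2, 16] additionally relies on nonlinear expansion and a hierarchical architecture; this toy version is a deliberate simplification): it finds unit-variance projections of a time series whose outputs change as slowly as possible over time.

import numpy as np

def linear_sfa(X, n_features=2):
    """Minimal linear slow feature analysis on a time series X of shape (time, dim)."""
    X = X - X.mean(axis=0)
    # Whiten the data so that all projections are decorrelated with unit variance
    eigval, eigvec = np.linalg.eigh(np.cov(X, rowvar=False))
    Z = X @ (eigvec / np.sqrt(eigval + 1e-8))
    # Slow directions minimise the variance of the temporal derivative
    d_eigval, d_eigvec = np.linalg.eigh(np.cov(np.diff(Z, axis=0), rowvar=False))
    return Z @ d_eigvec[:, :n_features]   # eigh sorts ascending, so slowest come first

# Toy demo: a slow sine hidden in a fast, mixed signal is recovered as the slowest feature
t = np.linspace(0, 10 * np.pi, 2000)
slow, fast = np.sin(0.1 * t), np.sin(5.0 * t)
X = np.stack([slow + 0.5 * fast, fast - 0.3 * slow], axis=1)
slow_feature = linear_sfa(X, n_features=1)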

Shedding Light: Using XAI to Explain Deep RL Decisions

Once the agent has a good, interpretable basis for making a decision, the next step is to make the decision process itself understandable. In state-of-the-art deep reinforcement learning (DRL), neural networks process the state of the environment and predict the most profitable next action. The XAI research area provides methods to make the complex functions learned by neural networks accessible to users. Decision trees, rule learning, and Shapley values are widely used approaches in XAI. We use them to explain the decisions of DRL agents.

Decision Trees as Surrogate Models

Our approach is based on the idea of translating the RL problem into a supervised learning one. By observing the DRL agent navigate the RL environment while logging its actions and the environment’s state, we obtain a set of structured data. From these data, a decision tree (DT) can be distilled, which reconstructs the DRL agent’s behavior as a surrogate model [3–5].
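
The basic distillation step can be sketched as follows (a hedged, minimal example rather than our exact setup: drl_policy is a placeholder for the trained DRL agent, and CartPole merely serves as a stand-in environment).

import gymnasium as gym
import numpy as np
from sklearn.tree import DecisionTreeClassifier

env = gym.make("CartPole-v1")

def drl_policy(obs):
    # Placeholder oracle; in practice this wraps the trained deep RL agent's policy.
    return int(obs[2] + obs[3] > 0.0)

states, actions = [], []
obs, _ = env.reset(seed=0)
for _ in range(5000):
    a = drl_policy(obs)
    states.append(obs); actions.append(a)          # log (state, action) pairs
    obs, _, terminated, truncated, _ = env.step(a)
    if terminated or truncated:
        obs, _ = env.reset()

# Distil a shallow, human-readable decision tree from the logged behaviour
surrogate = DecisionTreeClassifier(max_depth=3).fit(np.array(states), np.array(actions))
print(f"agreement with oracle: {surrogate.score(np.array(states), np.array(actions)):.2%}")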

We developed an iterative algorithm that alternately leverages exploration by the DTs and the predictive power of the DRL agent to obtain gradually refined datasets from which, in turn, increasingly better-performing DTs can be distilled. In [6] we showed, for a variety of continuous control challenges from the Gymnasium benchmark suite [17], how this method allowed us to build DTs of limited depth that successfully master the task, often even surpassing the DRL agents’ performance. Given their structure of hierarchical if-then rules, the DTs are intrinsically traceable by humans and contain orders of magnitude fewer parameters than the DRL networks.
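
In condensed form, the iterative idea can be sketched like this (a simplified illustration under assumed interfaces, not the exact algorithm of [6]): roll out the current DT to discover the states it actually visits, relabel those states with the DRL oracle’s actions, and refit the tree on the growing dataset.

import gymnasium as gym
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def iterate_dt(env, oracle_policy, n_iterations=5, episodes_per_iter=20, max_depth=4):
    X, y = [], []
    tree = None
    for _ in range(n_iterations):
        for _ in range(episodes_per_iter):
            obs, _ = env.reset()
            done = False
            while not done:
                X.append(obs); y.append(oracle_policy(obs))   # relabel with the oracle
                # Act with the current DT where available, otherwise with the oracle
                a = oracle_policy(obs) if tree is None else int(tree.predict([obs])[0])
                obs, _, terminated, truncated, _ = env.step(a)
                done = terminated or truncated
        tree = DecisionTreeClassifier(max_depth=max_depth).fit(np.array(X), np.array(y))
    return tree

# Example usage with the same placeholder oracle as above
tree = iterate_dt(gym.make("CartPole-v1"), lambda obs: int(obs[2] + obs[3] > 0.0))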

By successfully applying the method to a real-world robotic task, we demonstrated that the technique withstands imperfectly controlled conditions and the presence of noise, both of which are inherent to practical applications.

Shapley Values for Explainable RL

In addition, or as an alternative, to surrogate models, we applied a technique from game theory to RL. Shapley values [18, 19] offer a mathematically well-founded way of distributing the outcome of a collaborative game among the individual players. Translated to a DRL agent’s policy, this allows estimating how much each element of the agent’s perception of the environment contributed to its decision on the next action.
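
In practice, such attributions can be computed with KernelSHAP from the shap library. The sketch below is a hedged illustration in which policy is a placeholder for the trained actor network and the background set stands in for states sampled from agent rollouts.

import numpy as np
import shap

def policy(states):
    # Placeholder: returns one continuous action per state; in practice this wraps
    # the trained DRL actor network.
    return states @ np.array([0.5, -0.2, 0.1, 0.3])

background = np.random.randn(100, 4)          # states collected from agent rollouts
explainer = shap.KernelExplainer(policy, background)
state = np.random.randn(1, 4)                 # the single decision we want to explain
shap_values = explainer.shap_values(state)    # contribution of each state variable
print(shap_values)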

In our experiments on more complex RL tasks of directed locomotion of simulated robots (MuJoCo benchmark [20]), we demonstrated the robustness and applicability of Shapley values to RL and attributed a measure of overall importance to individual state variables for the decision-making process in dynamic environments with multidimensional action spaces [7]. This allows users to develop a better understanding of DRL agents.

Beyond Deep RL: Towards Reasoning Agents

The complex functions learned by neural networks in state-of-the-art RL agents are inherently difficult to interpret. While XAI methods can provide simplified, interpretable approximations of these functions, such transparent policies often lack the complexity required to perform well in challenging RL tasks.

Humans, on the other hand, can use reasoning to explain and understand even complex decisions such as “why did I open that drawer?” in the context of a larger task. The future of interpretable RL agents will involve similar reasoning systems. These systems may still use neural networks due to the complexity of most tasks. But similar to the human brain, the neural networks will break down decision-making into individual, disentangled reasoning steps [8].

Causal Reasoning

One promising line of reasoning, again inspired by human problem-solving, is causal reasoning. Most machine learning approaches are concerned with finding correlation rather than causation. An important reason for this is that understanding causality requires the ability to intervene in the data generation process [21, 22], something that’s usually not possible when using existing datasets in traditional machine learning. In RL, on the other hand, an agent naturally has the ability to explore, act and intervene. It can try action A and then try action B in the same situation and observe the different results. This makes RL particularly well suited to causal reasoning. The improvements in robustness, generalisability and planning that causal understanding brings, in turn, make causal reasoning promising for RL (for an overview of causal RL, see [23]).
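
The following toy example (purely illustrative, not taken from our publications) shows this interventional access in code: the environment is reset to the identical initial state via its seed, two different actions are applied, and their outcomes can be compared directly.

import gymnasium as gym

env = gym.make("CartPole-v1")

obs_a, _ = env.reset(seed=42)
next_a, *_ = env.step(0)        # intervention A: push the cart to the left

obs_b, _ = env.reset(seed=42)   # identical initial state
next_b, *_ = env.step(1)        # intervention B: push the cart to the right

print("same start:", bool((obs_a == obs_b).all()))
print("outcome of A:", next_a)
print("outcome of B:", next_b)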

We have begun to work with recent gradient-based (rather than constraint-based) causal discovery methods and, before even applying them to RL, have found that they do not necessarily always learn the correct causal relationships. Instead, some methods exploit distributional biases that often, but not always, correlate with causal relationships. We are in the process of publishing these findings. Our hope is to eventually use gradient-based causal discovery methods to learn causal models of RL environments.

Physical Reasoning

Besides causal reasoning, physical reasoning – the understanding and use of physical laws – is an important line of research for reasoning RL. Most RL environments simulate real-world physics in some way. As a first step in this direction, we have written a comprehensive survey of benchmarks and of the state of physical reasoning research in AI in general [9].

Beyond this survey, we are developing denoising diffusion generative models to infer physical trajectories that obey the laws of Newtonian physics. A forward model of these dynamics can be achieved with good old partial differential equations, or alternatively with modern transformer models such as [24]. In contrast, we use denoising diffusion models because of their untapped potential to take into account conditions (e.g. object positions) at different times simultaneously. With the right architecture and the ability to model object interactions, this becomes very powerful: The model is then able to choose or plan actions, given other objects that it cannot control, and given a desired future state. To the best of our knowledge, denoising diffusion has never been applied in this context. Although we have promising initial results, this is still a work in progress.
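
To indicate the kind of conditioning we have in mind, here is a minimal, hypothetical DDPM-style sketch (a toy under assumed choices, not our actual model): a denoiser over short 1D trajectories receives known positions at arbitrary time indices through a mask channel, so conditioning on, say, a start and a goal position simply amounts to setting two mask entries.

import torch
import torch.nn as nn

T_DIFF = 100                                   # number of diffusion steps
betas = torch.linspace(1e-4, 0.02, T_DIFF)
alpha_bar = torch.cumprod(1.0 - betas, dim=0)

class TrajectoryDenoiser(nn.Module):
    """Predicts the noise added to a 1D trajectory of length traj_len."""
    def __init__(self, traj_len=16, hidden=128):
        super().__init__()
        # Input: noisy trajectory, conditioning values, conditioning mask, diffusion step
        self.net = nn.Sequential(nn.Linear(3 * traj_len + 1, hidden), nn.ReLU(),
                                 nn.Linear(hidden, hidden), nn.ReLU(),
                                 nn.Linear(hidden, traj_len))
    def forward(self, x_t, cond, mask, t):
        t_emb = t.float().unsqueeze(-1) / T_DIFF
        return self.net(torch.cat([x_t, cond * mask, mask, t_emb], dim=-1))

def training_step(model, x0, mask):
    """One DDPM training step on clean trajectories x0 of shape (batch, traj_len)."""
    t = torch.randint(0, T_DIFF, (x0.shape[0],))
    eps = torch.randn_like(x0)
    ab = alpha_bar[t].unsqueeze(-1)
    x_t = ab.sqrt() * x0 + (1 - ab).sqrt() * eps       # forward noising process
    eps_hat = model(x_t, x0, mask, t)                  # condition on known positions
    return ((eps - eps_hat) ** 2).mean()

# Toy data: constant-velocity trajectories x(t) = x(0) + v * t, conditioned on start and goal
traj_len, batch = 16, 64
model = TrajectoryDenoiser(traj_len)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(200):
    x0 = torch.rand(batch, 1) + torch.rand(batch, 1) * torch.arange(traj_len).float()
    mask = torch.zeros(batch, traj_len)
    mask[:, 0], mask[:, -1] = 1.0, 1.0
    loss = training_step(model, x0, mask)
    opt.zero_grad(); loss.backward(); opt.step()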

Summary

In the RL³ project, we have advanced the fields of representation learning for RL, XAI in RL, and reasoning in RL. In all these fields, we have contributed to making RL agents more interpretable, reliable, and trustworthy.

We have shown how meaningful representations, optimised for understanding the dynamics of an environment, can make agents perform better or even succeed at a previously intractable task [1]. These representations are often not only more powerful, but also more interpretable, for example when extracting agent location and heading from visual observations [2].

We have also shown how decision trees and their decision rules can be used as interpretable surrogate models for RL agents based on neural networks [5]. In particular, we have shown how an iterative training procedure makes the supervised technique of decision trees applicable to the interactive and cumulative setup of RL [6]. In addition to decision trees, we have shown how Shapley values can be used in the context of RL to explain the importance of individual state variables for a given agent decision [7].

Finally, we have begun to work towards RL agents that can reason about their actions in a human-like fashion. We found that recent scalable gradient-based causal discovery methods have certain shortcomings that prevent us from successfully applying them to RL for now, although we still consider this a promising area of research. We also wrote a survey on physical reasoning and its benchmarks, another promising area for reasoning in RL [9]. Based on this survey, we have started work on denoising diffusion models for physical reasoning, which can later be applied as novel ways of reasoning in physics-based RL environments.

Cooperation

Project Publications

[1]  M. Lange, N. Krystiniak, R. C. Engelhardt, W. Konen, and L. Wiskott, “Improving Reinforcement Learning Efficiency with Auxiliary Tasks in Non-visual Environments: A Comparison,” in Lecture Notes in Computer Science, Machine Learning, Optimization, and Data Science (LOD 2023), (Grasmere, United Kingdom, Sep. 22–26, 2023), G. Nicosia, V. Ojha, E. La Malfa, G. La Malfa, P. M. Pardalos, and R. Umeton, Eds., vol. 14506, Cham, Switzerland: Springer Nature, 2024, pp. 177–191, ISBN: 978-3-031-53966-4. DOI: 10.1007/978-3-031-53966-4_14.

[2]  M. Lange, R. C. Engelhardt, W. Konen, and L. Wiskott, “Interpretable Brain-Inspired Representations Improve RL Performance on Visual Navigation Tasks,” in The 38th Annual AAAI Conference on Artificial Intelligence, Workshop eXplainable AI approaches for Deep Reinforcement Learning, (Vancouver, Canada, Feb. 20–27, 2024), 2024. Available: https://openreview.net/forum?id=s1oVgaZ3dQ (visited on Nov. 19, 2024).

[4]  R. C. Engelhardt, M. Lange, L. Wiskott, and W. Konen, “Finding the Relevant Samples for Decision Trees in Reinforcement Learning,” in Online Proceedings of the Dataninja Spring School 2023, (Bielefeld, Germany, May 8–10, 2023), 2023. Available: https://dataninja.nrw/?page_id=1251 (visited on Nov. 19, 2024).

[5]  R. C. Engelhardt, M. Lange, L. Wiskott, and W. Konen, “Sample-Based Rule Extraction for Explainable Reinforcement Learning,” in Lecture Notes in Computer Science, Machine Learning, Optimization, and Data Science (LOD 2022), (Certosa di Pontignano, Italy, Sep. 19–22, 2022), G. Nicosia et al., Eds., vol. 13810, Cham, Switzerland: Springer Nature, 2023, pp. 330–345, ISBN: 978-3-031-25599-1. DOI: 10.1007/978-3-031-25599-1_25.

[6]  R. C. Engelhardt, M. Oedingen, M. Lange, L. Wiskott, and W. Konen, “Iterative Oblique Decision Trees Deliver Explainable RL Models,” Algorithms, vol. 16, no. 6, p. 282, 2023, Advancements in Reinforcement Learning Algorithms, ISSN: 1999-4893. DOI: 10.3390/a16060282. Available: https://www.mdpi.com/1999-4893/16/6/282 (visited on Jul. 4, 2024).

[7]  R. C. Engelhardt, M. Lange, L. Wiskott, and W. Konen, “Exploring the Reliability of SHAP Values in Reinforcement Learning,” in Communications in Computer and Information Science, Explainable Artificial Intelligence (xAI 2024), (Valletta, Malta, Jul. 17–19, 2024), L. Longo, S. Lapuschkin, and C. Seifert, Eds., vol. 2155, Cham, Switzerland: Springer Nature, 2024, pp. 165–184, ISBN: 978-3-031-63800-8. DOI: 10.1007/978-3-031-63800-8_9.

[8]  M. Lange, R. C. Engelhardt, W. Konen, and L. Wiskott, “Beyond Trial and Error in Reinforcement Learning,” in DataNinja sAIOnARA 2024 Conference, (Bielefeld, Germany, Jun. 25–27, 2024), U. Kuhl, Ed., 2024. DOI: 10.11576/dataninja-1172. Available: https://biecoll.ub.uni-bielefeld.de/index.php/dataninja/issue/view/82 (visited on Nov. 19, 2024).

[9]  A. Melnik et al., “Benchmarks for physical reasoning AI,” Transactions on Machine Learning Research, 2023, ISSN: 2835-8856. Available: https://openreview.net/forum?id=cHroS8VIyN (visited on Nov. 19, 2024).

[10]  R. C. Engelhardt, R. Raycheva, M. Lange, L. Wiskott, and W. Konen, “Ökolopoly: Case Study on Large Action Spaces in Reinforcement Learning,” in Lecture Notes in Computer Science, Machine Learning, Optimization, and Data Science (LOD 2023), (Grasmere, United Kingdom, Sep. 22–26, 2023), G. Nicosia, V. Ojha, E. La Malfa, G. La Malfa, P. M. Pardalos, and R. Umeton, Eds., vol. 14506, Cham, Switzerland: Springer Nature, 2024, pp. 109–123, ISBN: 978-3-031-53966-4. DOI: 10.1007/978-3-031-53966-4_9.

[11]  M. Oedingen, R. C. Engelhardt, R. Denz, M. Hammer, and W. Konen, “ChatGPT Code Detection: Techniques for Uncovering the Source of Code,” AI, vol. 5, no. 3, pp. 1066–1094, 2024, ISSN: 2673-2688. DOI: 10.3390/ai5030053. Available: https://www.mdpi.com/2673-2688/5/3/53 (visited on Jul. 4, 2024).

Further References

[12]  T. Lesort, N. Díaz-Rodríguez, J.-F. Goudou, and D. Filliat, “State representation learning for control: An overview,” Neural Networks, vol. 108, pp. 379–392, 2018, ISSN: 0893-6080. DOI: 10.1016/j.neunet.2018.07.006.

[13]  N. Botteghi, M. Poel, and C. Brune, “Unsupervised representation learning in deep reinforcement learning: A review,” arXiv, 2024. DOI: 10.48550/arXiv.2208.14226. Available: https://arxiv.org/abs/2208.14226.

[14]  E. Shelhamer, P. Mahmoudieh, M. Argus, and T. Darrell, “Loss is its own reward: Self-supervision for reinforcement learning,” arXiv, 2017. DOI: 10.48550/arXiv.1612.07307. Available: https://arxiv.org/abs/1612.07307.

[15]  T. de Bruin, J. Kober, K. Tuyls, and R. Babuška, “Integrating state representation learning into deep reinforcement learning,” IEEE Robotics and Automation Letters, vol. 3, no. 3, pp. 1394–1401, 2018. DOI: 10.1109/LRA.2018.2800101.

[16]  L. Wiskott and T. J. Sejnowski, “Slow feature analysis: Unsupervised learning of invariances,” Neural computation, vol. 14, no. 4, pp. 715–770, 2002. DOI: 10.1162/089976602317318938.

[17]  M. Towers et al., Gymnasium, 2023. DOI: 10.5281/zenodo.8127026. Available: https://zenodo.org/record/8127025 (visited on Jul. 8, 2023).

[18]  L. S. Shapley, “A value for n-person games,” in Contributions to the Theory of Games (AM-28), Volume II, H. W. Kuhn and A. W. Tucker, Eds., Princeton: Princeton University Press, 1953, pp. 307–318, ISBN: 9781400881970. DOI: 10.1515/9781400881970-018.

[19]  S. M. Lundberg and S.-I. Lee, “A unified approach to interpreting model predictions,” in Advances in Neural Information Processing Systems 30, I. Guyon et al., Eds., Curran Associates, Inc., 2017, pp. 4765–4774. Available: http://papers.nips.cc/paper/7062-a-unified-approach-to-interpreting-model-predictions.pdf.

[20]  E. Todorov, T. Erez, and Y. Tassa, “Mujoco: A physics engine for model-based control,” in 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, IEEE, 2012, pp. 5026–5033. DOI: 10.1109/IROS.2012.6386109.

[21]  J. Pearl, Causality: Models, Reasoning, and Inference, 2nd ed. Cambridge University Press, 2009, ISBN: 978-0-511-80316-1. DOI: 10.1017/CBO9780511803161. Available: https://www.cambridge.org/core/product/identifier/9780511803161/type/book.

[22]  B. Schölkopf et al., “Toward causal representation learning,” Proceedings of the IEEE, vol. 109, no. 5, pp. 612–634, 2021, ISSN: 1558-2256. DOI: 10.1109/JPROC.2021.3058954.

[23]  Z. Deng, J. Jiang, G. Long, and C. Zhang, “Causal reinforcement learning: A survey,” Transactions on Machine Learning Research, 2023, Survey Certification, ISSN: 2835-8856. Available: https://openreview.net/forum?id=qqnttX9LPo.

[24]  Z. Wu, N. Dvornik, K. Greff, T. Kipf, and A. Garg, “Slotformer: Unsupervised visual dynamics simulation with object-centric models,” in The Eleventh International Conference on Learning Representations, 2023. Available: https://openreview.net/forum?id=TFbwV6I0VLg.