{"id":313,"date":"2021-04-15T14:29:50","date_gmt":"2021-04-15T14:29:50","guid":{"rendered":"http:\/\/dataninja.nrw\/?page_id=313"},"modified":"2025-02-10T09:05:24","modified_gmt":"2025-02-10T09:05:24","slug":"rl%c2%b3-representation-reinforcement-and-rule-learning-transparent-decision-making-through-explainable-ai-models","status":"publish","type":"page","link":"https:\/\/dataninja.nrw\/?page_id=313","title":{"rendered":"(RL)\u00b3 Representation, Reinforcement and Rule Learning \u2013 Transparent decision-making through explainable AI models"},"content":{"rendered":"\n<h3 class=\"wp-block-heading\">Goal<\/h3>\n\n\n\n<p>While reinforcement learning methods are steadily gaining in popularity and relevance, complex models often lead to non-transparent behavior and their usefulness is tied to narrowly defined tasks. The (RL)<sup>3<\/sup> project aims to improve both interpretability and transferability in reinforcement learning by using understandable data representations as well as rule-based simplifications of neural networks.<\/p>\n\n\n\n<p>Moritz Lange and Prof. Laurenz Wiskott focus on finding data representations to improve the training of reinforcement learning agents and the interpretability of their decisions. The work of Raphael Engelhardt and Prof. 
Wolfgang Konen aims to develop automated ways to extract human-understandable rules from trained agents.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"724\" src=\"https:\/\/dataninja.nrw\/wp-content\/uploads\/2021\/04\/02_RL3_A3_vs3-1024x724.jpg\" alt=\"rl3_illustration\" class=\"wp-image-314\" srcset=\"https:\/\/dataninja.nrw\/wp-content\/uploads\/2021\/04\/02_RL3_A3_vs3-1024x724.jpg 1024w, https:\/\/dataninja.nrw\/wp-content\/uploads\/2021\/04\/02_RL3_A3_vs3-300x212.jpg 300w, https:\/\/dataninja.nrw\/wp-content\/uploads\/2021\/04\/02_RL3_A3_vs3-768x543.jpg 768w, https:\/\/dataninja.nrw\/wp-content\/uploads\/2021\/04\/02_RL3_A3_vs3-1536x1086.jpg 1536w, https:\/\/dataninja.nrw\/wp-content\/uploads\/2021\/04\/02_RL3_A3_vs3-2048x1448.jpg 2048w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><figcaption class=\"wp-element-caption\">In (RL)<sup>3<\/sup> representation, reinforcement and rule learning should complement each other. While representation learning allows for a low-dimensional input space that facilitates reinforcement learning, rule-based learning will allow insight into the uncovered causalities and operation of the trained system. Illustration: Christoph J Kellner, Studio Animanova<\/figcaption><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\">Project Overview<\/h3>\n\n\n\n<p>Reinforcement learning is an approach to AI in which an agent learns to dynamically interact with its environment to achieve a certain goal. These agents, which are becoming impressively successful across applications from playing games to controlling industrial processes, are commonly based on deep neural networks. However, despite their usefulness, neural networks are notoriously seen as black-box models: their complexity makes it hard to understand them and to reason about their decisions. 
Additionally, the resulting complex reinforcement learning algorithms end up heavily tailored to narrowly defined tasks. (RL)<sup>3<\/sup> will improve the interpretability and transferability of those algorithms to achieve more understandable, more predictable reinforcement learning approaches that are ultimately more secure and easier to apply.<\/p>\n\n\n\n<p>Creating understandable data representations and formulating decision processes as simple rules are among the most effective approaches to achieving interpretability. Our representation learning research will focus on unsupervised techniques that can learn interpretable data representations independent of specific tasks. Our rule-learning research will simultaneously investigate methods for transforming complex, high-performing reinforcement learning models into simple rules that can be interpreted and modified by domain experts. The developed approaches will be tested on games and in industrial applications.<\/p>\n\n\n\n<p><span contenteditable=\"false\" data-has-children=\"true\" id=\"f636024c-7b21-4e32-891a-3d45b371d1b4\" data-items=\"[&quot;1681644652&quot;,&quot;2519033687&quot;,&quot;3280447298&quot;,&quot;3053845399&quot;]\" class=\"abt-citation\"><sup>\u200b1\u20134\u200b<\/sup><\/span><\/p>\n\n\n\n<hr class=\"wp-block-separator has-css-opacity is-style-wide\"\/>\n\n\n\n<h3 class=\"wp-block-heading\">Preliminary Results<\/h3>\n\n\n\n<p>We developed a first approach to rule learning based on observing a trained reinforcement learning agent interacting with its environment. From the recorded data, containing the environment\u2019s state and the corresponding action of the agent, we induce decision trees. 
For simple benchmark problems, we showed that human-readable decision trees of very limited complexity perform as well as the black-box deep reinforcement learning agents they are based on.<br>Our results have been published as an extended abstract and presented during the poster session of the workshop \u2018\u2018Trustworthy AI in the Wild\u2019\u2019 at KI 2021 \u2013 44th German Conference on Artificial Intelligence.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-css-opacity is-style-wide\"\/>\n\n\n\n<h3 class=\"wp-block-heading\">Project Publications<\/h3>\n\n\n\n<ul>\n<li>Engelhardt, Raphael C., Moritz Lange, Laurenz Wiskott, and Wolfgang Konen (2021). \u2018\u2018Shedding Light into the Black Box of Reinforcement Learning\u2019\u2019. In:\u00a0Workshop \u2018\u2018Trustworthy AI in the Wild\u2019\u2019 at KI2021 &#8212; 44th\u00a0German Conference on Artificial Intelligence\u00a0(Berlin, Germany (virtual), Sept. 27\u2013Oct. 1, 2021).\u00a0url:\u00a0https:\/\/dataninja.nrw\/wp-content\/uploads\/2021\/09\/1_Engelhardt_SheddingLight_Abstract.pdf.<\/li>\n\n\n\n<li>Engelhardt, Raphael C., Moritz Lange, Laurenz Wiskott, and Wolfgang Konen (2023). \u2018\u2018Sample-Based Rule Extraction for Explainable Reinforcement Learning\u2019\u2019. In:&nbsp;Lecture Notes in Computer Science. Vol. 13810:&nbsp;Machine Learning, Optimization, and Data Science (LOD 2022)&nbsp;(Certosa di Pontignano, Italy, Sept. 19\u201322, 2022). Ed. by Giuseppe Nicosia, Varun Ojha, Emanuele La Malfa, Gabriele La Malfa, Panos Pardalos, Giuseppe Di Fatta, Giovanni Giuffrida, and Renato Umeton. Cham, Switzerland: Springer Nature, pp. 330\u2013345.&nbsp;isbn: 978-3-031-25599-1.&nbsp;doi:&nbsp;10.1007\/978-3-031-25599-1_25.<\/li>\n\n\n\n<li>Engelhardt, Raphael C., Moritz Lange, Laurenz Wiskott, and Wolfgang Konen (2024). \u2018\u2018Exploring the Reliability of SHAP Values in Reinforcement Learning\u2019\u2019. In:&nbsp;Communications in Computer and Information Science. Vol. 
2155:&nbsp;Explainable Artificial Intelligence (xAI 2024)&nbsp;(Valletta, Malta, July 17\u201319, 2024). Ed. by Luca Longo, Sebastian Lapuschkin, and Christin Seifert. Cham, Switzerland: Springer Nature, pp. 165\u2013184.&nbsp;isbn: 978-3-031-63800-8.&nbsp;doi:&nbsp;10.1007\/978-3-031-63800-8_9.<\/li>\n\n\n\n<li>Engelhardt, Raphael C., Moritz Lange, Laurenz Wiskott, and Wolfgang Konen (2023). \u2018\u2018Finding the Relevant Samples for Decision Trees in Reinforcement Learning\u2019\u2019. In:&nbsp;Online Proceedings of the Dataninja Spring School 2023&nbsp;(Bielefeld, Germany, May 8\u201310, 2023).&nbsp;url: https:\/\/dataninja.nrw\/?page_id=1251.<\/li>\n\n\n\n<li>Engelhardt, Raphael C., Marc Oedingen, Moritz Lange, Laurenz Wiskott, and Wolfgang Konen (2023). \u2018\u2018Iterative Oblique Decision Trees Deliver Explainable RL Models\u2019\u2019. In:\u00a0Algorithms\u00a016.6:\u00a0Advancements in Reinforcement Learning Algorithms, p. 282.\u00a0issn: 1999-4893.\u00a0doi:\u00a010.3390\/a16060282.\u00a0url:\u00a0https:\/\/www.mdpi.com\/1999-4893\/16\/6\/282.<\/li>\n\n\n\n<li>Engelhardt, Raphael C., Ralitsa Raycheva, Moritz Lange, Laurenz Wiskott, and Wolfgang Konen (2024). \u2018\u2018\u00d6kolopoly: Case Study on Large Action Spaces in Reinforcement Learning\u2019\u2019. In:&nbsp;Lecture Notes in Computer Science. Vol. 14506:&nbsp;Machine Learning, Optimization, and Data Science (LOD 2023) (Grasmere, United Kingdom, Sept. 22\u201326, 2023). Ed. by Giuseppe Nicosia, Varun Ojha, Emanuele La Malfa, Gabriele La Malfa, Panos M. Pardalos, and Renato Umeton. Cham, Switzerland: Springer Nature, pp. 109\u2013123. isbn: 978-3-031-53966-4.&nbsp;doi:&nbsp;10.1007\/978-3-031-53966-4_9.<\/li>\n\n\n\n<li>Lange, Moritz, Raphael C. Engelhardt, Wolfgang Konen, and Laurenz Wiskott (2024a). \u2018\u2018Interpretable Brain-Inspired Representations Improve RL Performance on Visual Navigation Tasks\u2019\u2019. 
In:&nbsp;Workshop \u2018\u2018eXplainable AI approaches for Deep Reinforcement Learning\u2019\u2019 at The 38th Annual AAAI Conference on Artificial Intelligence&nbsp;(Vancouver, Canada, Feb. 20\u201327, 2024).&nbsp;url:&nbsp;https:\/\/openreview.net\/forum?id=s1oVgaZ3dQ.<\/li>\n\n\n\n<li>Lange, Moritz, Raphael C. Engelhardt, Wolfgang Konen, and Laurenz Wiskott (2024b). \u2018\u2018Beyond Trial and Error in Reinforcement Learning\u2019\u2019. In:&nbsp;DataNinja sAIOnARA 2024 Conference&nbsp;(Bielefeld, Germany, June 25\u201327, 2024). Ed. by Ulrike Kuhl, pp. 58\u201361.&nbsp;doi:&nbsp;10.11576\/dataninja-1172. url:&nbsp;https:\/\/biecoll.ub.uni-bielefeld.de\/index.php\/dataninja\/issue\/view\/82.<\/li>\n\n\n\n<li>Lange, Moritz, Noah Krystiniak, Raphael C. Engelhardt, Wolfgang Konen, and Laurenz Wiskott (2024). \u2018\u2018Improving Reinforcement Learning Efficiency with Auxiliary Tasks in Non-visual Environments: A Comparison\u2019\u2019. In:&nbsp;Lecture Notes in Computer Science. Vol. 14506:&nbsp;Machine Learning, Optimization, and Data Science (LOD 2023)&nbsp;(Grasmere, United Kingdom, Sept. 22\u201326, 2023). Ed. by Giuseppe Nicosia, Varun Ojha, Emanuele La Malfa, Gabriele La Malfa, Panos M. Pardalos, and Renato Umeton. Cham, Switzerland: Springer Nature, pp. 177\u2013191.&nbsp;isbn: 978-3-031-53966-4.&nbsp;doi:&nbsp;10.1007\/978-3-031-53966-4_14.<\/li>\n\n\n\n<li>Melnik, Andrew, Robin Schiewer, Moritz Lange, Andrei Ioan Muresanu, Mozhgan Saeidi, Animesh Garg, and Helge Ritter (2023). \u2018\u2018Benchmarks for Physical Reasoning AI\u2019\u2019. In:&nbsp;Transactions on Machine Learning Research. Survey Certification.&nbsp;issn: 2835-8856.&nbsp;url:&nbsp;https:\/\/openreview.net\/forum?id=cHroS8VIyN.<\/li>\n\n\n\n<li>Oedingen, Marc, Raphael C. Engelhardt, Robin Denz, Maximilian Hammer, and Wolfgang Konen (2024). \u2018\u2018ChatGPT Code Detection: Techniques for Uncovering the Source of Code\u2019\u2019. In:&nbsp;AI&nbsp;5.3, pp. 
1066\u20131094.&nbsp;issn: 2673-2688.&nbsp;doi:&nbsp;10.3390\/ai5030053.&nbsp;url:&nbsp;https:\/\/www.mdpi.com\/2673-2688\/5\/3\/53.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-css-opacity is-style-wide\"\/>\n\n\n\n<h3 class=\"wp-block-heading\">Cooperation<\/h3>\n\n\n\n<div class=\"wp-block-columns\">\n    \n<div class=\"wp-block-column contrib-container\" style=\"flex-basis:20%;\">\n<a href=\"https:\/\/www.ruhr-uni-bochum.de\/en\"><span class=\"contrib-container-spacer\"><\/span>\n<img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/dataninja.nrw\/wp-content\/uploads\/2021\/04\/ruhr-universitaet-bochum-rub-vector-logo_crop.png\" alt=\"\" class=\"wp-image-197\"><\/a>\n        \n    <\/div>\n    <div class=\"wp-block-column\" style=\"margin-right:0.5cm\"><\/div>\n    <div class=\"wp-block-column\" style=\"flex-basis:80%\">\n        <a href=\"https:\/\/www.ini.rub.de\/research\/groups\/theory_of_neural_systems\/\"><b><p class=\"contrib-card-label\">Theory of Neural Systems Group<\/p><\/b><\/a>\n        <a href=\"https:\/\/www.ini.rub.de\/the_institute\/people\/laurenz-wiskott\/\"><p class=\"contrib-card-label\">Prof. 
Laurenz Wiskott<\/p><\/a>\n        <p class=\"contrib-card-label\">PhD student: <a href=\"https:\/\/www.ini.rub.de\/the_institute\/people\/moritz-lange\/\">Moritz Lange<\/a><\/p>\n    <\/div>\n<\/div>\n<div class=\"wp-block-columns\">\n    <div class=\"wp-block-column contrib-container\" style=\"flex-basis:20%\">\n        <a href=\"https:\/\/www.th-koeln.de\/\"><img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/dataninja.nrw\/wp-content\/uploads\/2021\/04\/TH_Koeln_Logo.svg_.png\" alt=\"\" class=\"wp-image-197\" height=\"100%\"><\/a>\n    <\/div>\n    <div class=\"wp-block-column\" style=\"margin-right:0.5cm\"><\/div>\n    <div class=\"wp-block-column\" style=\"flex-basis:80%\">\n        <a href=\"https:\/\/www.th-koeln.de\/informatik-und-ingenieurwissenschaften\/institut-fuer-informatik_29538.php\"><b><p class=\"contrib-card-label\">Cologne Institute of Computer Science<\/p><\/b><\/a>\n        <a href=\"https:\/\/blogs.gm.fh-koeln.de\/konen\/en\/\"><p class=\"contrib-card-label\">Prof. Dr. rer. nat. Wolfgang Konen<\/p><\/a>\n        <p class=\"contrib-card-label\">PhD student: <a href=\"https:\/\/www.th-koeln.de\/personen\/raphael.engelhardt\/\">Raphael Engelhardt<\/a><\/p>\n    <\/div>\n<\/div>\n\n\n\n<hr class=\"wp-block-separator has-css-opacity is-style-wide\"\/>\n\n\n\n<h3 class=\"wp-block-heading\">References<\/h3>\n\n\n\n<p><\/p>\n\n\n\n<section aria-label=\"References\" class=\"wp-block-abt-static-bibliography abt-static-bib\" role=\"region\"><ol class=\"abt-bibliography__body\"><\/ol><\/section>\n\n\n\n<section aria-label=\"Bibliography\" class=\"wp-block-abt-bibliography abt-bibliography\" role=\"region\"><ol class=\"abt-bibliography__body\" data-entryspacing=\"1\" data-maxoffset=\"3\" data-linespacing=\"1\" data-second-field-align=\"flush\"><li id=\"1681644652\">  <div class=\"csl-entry\">\n    <div class=\"csl-left-margin\">1. <\/div><div class=\"csl-right-inline\">Escalante-B. AN, Wiskott L. 
Improved graph-based SFA: information preservation complements the slowness principle. <i>Machine Learning<\/i>. 2019;109:999-1037.<\/div>\n  <\/div>\n<\/li><li id=\"2519033687\">  <div class=\"csl-entry\">\n    <div class=\"csl-left-margin\">2. <\/div><div class=\"csl-right-inline\">Bagheri S, Thill M, Koch P, Konen W. Online Adaptable Learning Rates for the Game Connect-4. <i>IEEE Transactions on Computational Intelligence and AI in Games<\/i>. 2016;8:33-42. doi:<a href=\"https:\/\/doi.org\/10.1109\/TCIAIG.2014.2367105\">10.1109\/TCIAIG.2014.2367105<\/a><\/div>\n  <\/div>\n<\/li><li id=\"3280447298\">  <div class=\"csl-entry\">\n    <div class=\"csl-left-margin\">3. <\/div><div class=\"csl-right-inline\">Konen W, Bagheri S. Reinforcement Learning for N-Player Games: The Importance of Final Adaptation. In: Vasile M, Filipic B, eds. <i>9th International Conference on Bioinspired Optimisation Methods and Their Applications (BIOMA)<\/i>. 2020. <a href=\"http:\/\/www.gm.fh-koeln.de\/ciopwebpub\/Konen20b.d\/bioma20-TDNTuple.pdf\">http:\/\/www.gm.fh-koeln.de\/ciopwebpub\/Konen20b.d\/bioma20-TDNTuple.pdf<\/a><\/div>\n  <\/div>\n<\/li><li id=\"3053845399\">  <div class=\"csl-entry\">\n    <div class=\"csl-left-margin\">4. <\/div><div class=\"csl-right-inline\">Legenstein R, Wilbert N, Wiskott L. Reinforcement Learning on Slow Features of High-Dimensional Input Streams. <i>PLOS Computational Biology<\/i>. 2010;6:1-13. 
doi:<a href=\"https:\/\/doi.org\/10.1371\/journal.pcbi.1000894\">10.1371\/journal.pcbi.1000894<\/a><\/div>\n  <\/div>\n<\/li><\/ol><\/section>\n","protected":false},"excerpt":{"rendered":"<p>Goal While reinforcement learning methods are steadily gaining in popularity and relevance, complex models often lead to non-transparent behavior and [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":314,"parent":119,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"ub_ctt_via":"","footnotes":""},"featured_image_src":"https:\/\/dataninja.nrw\/wp-content\/uploads\/2021\/04\/02_RL3_A3_vs3-scaled.jpg","_links":{"self":[{"href":"https:\/\/dataninja.nrw\/index.php?rest_route=\/wp\/v2\/pages\/313"}],"collection":[{"href":"https:\/\/dataninja.nrw\/index.php?rest_route=\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/dataninja.nrw\/index.php?rest_route=\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/dataninja.nrw\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/dataninja.nrw\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=313"}],"version-history":[{"count":24,"href":"https:\/\/dataninja.nrw\/index.php?rest_route=\/wp\/v2\/pages\/313\/revisions"}],"predecessor-version":[{"id":2733,"href":"https:\/\/dataninja.nrw\/index.php?rest_route=\/wp\/v2\/pages\/313\/revisions\/2733"}],"up":[{"embeddable":true,"href":"https:\/\/dataninja.nrw\/index.php?rest_route=\/wp\/v2\/pages\/119"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/dataninja.nrw\/index.php?rest_route=\/wp\/v2\/media\/314"}],"wp:attachment":[{"href":"https:\/\/dataninja.nrw\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=313"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}