{"id":331,"date":"2021-04-15T14:29:50","date_gmt":"2021-04-15T14:29:50","guid":{"rendered":"http:\/\/dataninja.nrw\/?page_id=331"},"modified":"2025-02-10T08:15:21","modified_gmt":"2025-02-10T08:15:21","slug":"gaia-gaussian-processes-for-automatic-and-interpretable-anomaly-detection","status":"publish","type":"page","link":"https:\/\/dataninja.nrw\/?page_id=331","title":{"rendered":"GAIA: Gaussian Processes for Automatic and Interpretable Anomaly-detection"},"content":{"rendered":"\n<h3 class=\"wp-block-heading\">Goal<\/h3>\n\n\n\n<p>The GAIA project aims to explore Gaussian processes for efficient detection and interpretation of anomalies in multivariate time series data. In particular, we aim to investigate unsupervised Gaussian processes in order to identify, understand and resolve underlying correlations and anomalies. To learn Gaussian process models scalably and in real time, we intend to develop new streaming algorithms, which will be implemented as open source with reference to industrial standards and tested in application-oriented scenarios together with industry partners.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"724\" src=\"https:\/\/dataninja.nrw\/wp-content\/uploads\/2021\/04\/03_GAIA_A3_draft_vs2-1024x724.jpeg\" alt=\"illustration_gaia\" class=\"wp-image-332\" srcset=\"https:\/\/dataninja.nrw\/wp-content\/uploads\/2021\/04\/03_GAIA_A3_draft_vs2-1024x724.jpeg 1024w, https:\/\/dataninja.nrw\/wp-content\/uploads\/2021\/04\/03_GAIA_A3_draft_vs2-300x212.jpeg 300w, https:\/\/dataninja.nrw\/wp-content\/uploads\/2021\/04\/03_GAIA_A3_draft_vs2-768x543.jpeg 768w, https:\/\/dataninja.nrw\/wp-content\/uploads\/2021\/04\/03_GAIA_A3_draft_vs2-1536x1086.jpeg 1536w, https:\/\/dataninja.nrw\/wp-content\/uploads\/2021\/04\/03_GAIA_A3_draft_vs2-2048x1448.jpeg 2048w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><figcaption 
class=\"wp-element-caption\"><em>Illustration: Christoph J Kellner, Studio Animanova<\/em><\/figcaption><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\">Project Overview<\/h3>\n\n\n\n<p>Gaussian Process Models are widely used in Bayesian Machine Learning as they can be applied when only limited data is available and can be directly interpreted. The GAIA project will employ such Gaussian Process Models in an extension to multivariate time series. The resultant model will therefore cover spatial as well as temporal information and should detect anomalies spanning both of these dimensions. One focus will be on how model selection influences the explainability of constructed models.<\/p>\n\n\n\n<p>A latent variable model can be used, for example, to represent a process as found in many industrial applications. Such processes are characterized through temporal data given as time series. In the project, a latent variable model should be learned in an unsupervised fashion as a Gaussian Process Model. Importantly, prior knowledge on the particular application domain can be exploited: While the search for a fitting covariance function is usually expensive in Gaussian Processes, domain knowledge can be introduced into the covariance matrix in the form of underlying differential equations, which leads to hybrid and interpretable models.<\/p>\n\n\n\n<p>During the first year, we produced first results and a first implementation for physical data-driven hybrid Gaussian Processes. 
Further, we supervised a successful Bachelor&#8217;s thesis on model selection, whose results we will further investigate.<br>Finally, we developed a novel model search algorithm for Gaussian processes on data streams and published a first evaluation.<\/p>\n\n\n\n<p><span id=\"f458a87e-1cc1-416f-a847-1e7f701fde4c\" data-items=\"[&quot;3712268259&quot;,&quot;3783165380&quot;,&quot;48187783&quot;,&quot;3772046351&quot;]\" class=\"abt-citation\" data-has-children=\"true\" contenteditable=\"false\"><sup>\u200b1\u20134\u200b<\/sup><\/span><\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"179\" src=\"https:\/\/dataninja.nrw\/wp-content\/uploads\/2022\/06\/PastedGraphic-1-1024x179.png\" alt=\"\" class=\"wp-image-858\" srcset=\"https:\/\/dataninja.nrw\/wp-content\/uploads\/2022\/06\/PastedGraphic-1-1024x179.png 1024w, https:\/\/dataninja.nrw\/wp-content\/uploads\/2022\/06\/PastedGraphic-1-300x53.png 300w, https:\/\/dataninja.nrw\/wp-content\/uploads\/2022\/06\/PastedGraphic-1-768x135.png 768w, https:\/\/dataninja.nrw\/wp-content\/uploads\/2022\/06\/PastedGraphic-1-1536x269.png 1536w, https:\/\/dataninja.nrw\/wp-content\/uploads\/2022\/06\/PastedGraphic-1.png 1575w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><figcaption class=\"wp-element-caption\"><em>A visualization of a posterior Gaussian process on noisy data from a three-tank system. The black stars are the noisy data points used to train the GP, with noise as high as 10% of the maximal signal; the red line is the original noise-free data for reference; the blue dashed line is the posterior mean; and the transparent blue area is the 2-sigma confidence band. 
We can see that despite such high noise, the posterior mean learns the true underlying behaviour very well.<\/em><\/figcaption><\/figure>\n\n\n\n<hr class=\"wp-block-separator has-css-opacity is-style-wide\"\/>\n\n\n\n<h3 class=\"wp-block-heading\">Project Publications<\/h3>\n\n\n\n<ul>\n<li>Berns, Fabian, Jan David H\u00fcwel, and Christian Beecks (2021). \u2018\u2018LOGIC: Probabilistic Machine Learning for Time Series Classification\u2019\u2019. In:&nbsp;ICDM. IEEE, pp. 1000\u20131005.<\/li>\n\n\n\n<li>Berns, Fabian, Jan David H\u00fcwel, and Christian Beecks (2022). \u2018\u2018Automated Model Inference for Gaussian Processes: An Overview of State-of-the-Art Methods and Algorithms\u2019\u2019. In:&nbsp;SN Comput. Sci.&nbsp;3.4, p. 300.<\/li>\n\n\n\n<li>Besginow, Andreas, Jan David H\u00fcwel, Markus Lange-Hegermann, and Christian Beecks (2021). \u2018\u2018Exploring Methods to Apply Gaussian Processes in Industrial Anomaly Detection\u2019\u2019. In:&nbsp;KI. Vol. 44.<\/li>\n\n\n\n<li>Besginow, Andreas, Jan David H\u00fcwel, Markus Lange-Hegermann, and Christian Beecks (2024). \u2018\u2018Finding commonalities in dynamical systems with gaussian processes\u2019\u2019. In:&nbsp;DataNinja sAIOnARA Conference, pp. 26\u201328.&nbsp;doi:&nbsp;<a href=\"https:\/\/biecoll.ub.uni-bielefeld.de\/index.php\/dataninja\/article\/download\/1162\/1184\">10.11576\/dataninja-1162<\/a>.<\/li>\n\n\n\n<li>Besginow, Andreas, Jan David H\u00fcwel, Thomas Pawellek, Christian Beecks, and Markus Lange-Hegermann (2024). \u2018\u2018On the Laplace Approximation as Model Selection Criterion for Gaussian Processes\u2019\u2019. In:&nbsp;arXiv preprint <a href=\"https:\/\/arxiv.org\/abs\/2403.09215\">arXiv:2403.09215<\/a>.<\/li>\n\n\n\n<li>Besginow, Andreas and Markus Lange-Hegermann (2022). \u2018\u2018Constraining Gaussian Processes to Systems of Linear Ordinary Differential Equations\u2019\u2019. In:&nbsp;Advances in Neural Information Processing Systems. Ed. by Alice H. 
Oh, Alekh Agarwal, Danielle Belgrave, and Kyunghyun Cho.<\/li>\n\n\n\n<li>Gresch, Anne, Jana Osthues, Jan D H\u00fcwel, Jennifer K Briggs, Tim Berger, Ruben Koch, Thomas Deickert, Christian Beecks, Richard KP Benninger, and Martina D\u00fcfer (2024). \u2018\u2018Resolving spatiotemporal electrical signaling within the islet via CMOS microelectrode arrays\u2019\u2019. In:&nbsp;Diabetes, db230870.<\/li>\n\n\n\n<li>H\u00fcwel, Jan David and Christian Beecks (2023). \u2018\u2018Gaussian Process Component Mining with the Apriori Algorithm\u2019\u2019. In:&nbsp;DEXA (2). Vol. 14147. Lecture Notes in Computer Science. Springer, pp. 423\u2013429.<\/li>\n\n\n\n<li>H\u00fcwel, Jan David and Christian Beecks (2024a). \u2018\u2018Discovering Structural Regularities in Time Series via Gaussian Processes\u2019\u2019. In:&nbsp;DSAA. IEEE, pp. 1\u201310.<\/li>\n\n\n\n<li>H\u00fcwel, Jan David and Christian Beecks (2024b). \u2018\u2018Frequent Component Analysis for Large Time Series Databases with Gaussian Processes\u2019\u2019. In:&nbsp;EDBT. OpenProceedings.org, pp. 617\u2013622.<\/li>\n\n\n\n<li>H\u00fcwel, Jan David, Fabian Berns, and Christian Beecks (2021). \u2018\u2018Automated Kernel Search for Gaussian Processes on Data Streams\u2019\u2019. In:&nbsp;IEEE BigData. IEEE, pp. 3584\u20133588.<\/li>\n\n\n\n<li>H\u00fcwel, Jan David, Andreas Besginow, Fabian Berns, Markus Lange-Hegermann, and Christian Beecks (2021). \u2018\u2018On Kernel Search Based Gaussian Process Anomaly Detection\u2019\u2019. In:&nbsp;IN4PL (Revised Selected Papers). Vol. 1855. Communications in Computer and Information Science. Springer, pp. 1\u201323.<\/li>\n\n\n\n<li>H\u00fcwel, Jan David, Anne Gresch, Tim Berger, Martina D\u00fcfer, and Christian Beecks (2022). \u2018\u2018Analysis of Extracellular Potential Recordings by High-Density Micro-electrode Arrays of Pancreatic Islets\u2019\u2019. In:&nbsp;DEXA (2). Vol. 13427. Lecture Notes in Computer Science. Springer, pp. 
270\u2013276.<\/li>\n\n\n\n<li>H\u00fcwel, Jan David, Anne Gresch, Fabian Berns, Ruben Koch, Martina D\u00fcfer, and Christian Beecks (2022). \u2018\u2018Tracing Patterns in Electrophysiological Time Series Data\u2019\u2019. In:&nbsp;DSAA. IEEE, pp. 1\u201310.<\/li>\n\n\n\n<li>H\u00fcwel, Jan David, Florian Haselbeck, Dominik G. Grimm, and Christian Beecks (2022). \u2018\u2018Dynamically Self-adjusting Gaussian Processes for Data Stream Modelling\u2019\u2019. In:&nbsp;KI. Vol. 13404. Lecture Notes in Computer Science. Springer, pp. 96\u2013114.<\/li>\n\n\n\n<li>H\u00fcwel, Jan David, Georg Stefan Schlake, Kevin Albrechts, and Christian Beecks (2024a). \u2018\u2018Discovering Propagating Signals in High-Content Multivariate Time Series via Spatio-Temporal Subsequence Clustering (In print)\u2019\u2019. In:&nbsp;Proceedings of the IEEE International Conference on Big Data.<\/li>\n\n\n\n<li>H\u00fcwel, Jan David, Georg Stefan Schlake, Kevin Albrechts, and Christian Beecks (2024b). \u2018\u2018Identifying Propagating Signals with Spatio-Temporal Clustering in Multivariate Time Series\u2019\u2019. In:&nbsp;SISAP. Vol. 15268. Lecture Notes in Computer Science. Springer, pp. 207\u2013214.<\/li>\n\n\n\n<li>Schlake, Georg Stefan, Jan David H\u00fcwel, Fabian Berns, and Christian Beecks (2022). \u2018\u2018Evaluating the Lottery Ticket Hypothesis to Sparsify Neural Networks for Time Series Classification\u2019\u2019. In:&nbsp;ICDE Workshops. IEEE, pp. 
70\u201373.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-css-opacity is-style-wide\"\/>\n\n\n\n<h3 class=\"wp-block-heading\">Cooperation<\/h3>\n\n\n\n<div class=\"wp-block-columns\">\n    <div class=\"wp-block-column contrib-container\" style=\"flex-basis:20%\">\n        <a href=\"https:\/\/www.fernuni-hagen.de\/english\/\"><img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/dataninja.nrw\/wp-content\/uploads\/2022\/06\/Logo-der-Fernuni-Hagen.png\" alt=\"Fernuni Hagen\" class=\"wp-image-197\" height=\"100%\"><\/a>\n    <\/div>\n    <div class=\"wp-block-column\" style=\"margin-right:0.5cm\"><\/div>\n    <div class=\"wp-block-column\" style=\"flex-basis:80%\">\n         <a href=\"https:\/\/www.fernuni-hagen.de\/ds\/\"><b><p class=\"contrib-card-label\">Data Management and Analytics Group<\/p><\/b><\/a>\n        <a href=\"https:\/\/www.fernuni-hagen.de\/ds\/team\/christian.beecks.shtml\"><p class=\"contrib-card-label\">Prof. Dr. Christian Beecks<\/p><\/a>\n       <p class=\"contrib-card-label\">PhD student: <a href=\"https:\/\/www.fernuni-hagen.de\/ds\/team\/jan-david.huewel.shtml\">Jan David H\u00fcwel<\/a><\/p>\n    <\/div>\n<\/div>\n<div class=\"wp-block-columns\">\n    <div class=\"wp-block-column contrib-container\" style=\"flex-basis:20%;\">\n        <a href=\"https:\/\/www.th-owl.de\/\"><img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/dataninja.nrw\/wp-content\/uploads\/2021\/04\/TH_OWL_DE-EN_sRGB-e1618308605801.png\" alt=\"\" class=\"wp-image-197\"><\/a>\n    <\/div>\n    <div class=\"wp-block-column\" style=\"margin-right:0.5cm\"><\/div>\n    <div class=\"wp-block-column\" style=\"flex-basis:80%\">\n        <a href=\"https:\/\/www.th-owl.de\/eecs\/\"><b><p class=\"contrib-card-label\">Department of Electrical Engineering and Computer Science<\/p><\/b><\/a>\n        <a href=\"https:\/\/www.th-owl.de\/eecs\/fachbereich\/team\/markus-lange-hegermann\/\"><p class=\"contrib-card-label\">Prof. Dr. 
Markus Lange-Hegermann<\/p><\/a>\n        <p class=\"contrib-card-label\">PhD student: <a href=\"https:\/\/www.init-owl.de\/team\/andreas-besginow\/\">Andreas Besginow<\/a><\/p>\n    <\/div>\n<\/div>\n\n\n\n<hr class=\"wp-block-separator has-css-opacity is-style-wide\"\/>\n\n\n\n<h3 class=\"wp-block-heading\">References<\/h3>\n\n\n\n<section aria-label=\"References\" class=\"wp-block-abt-static-bibliography abt-static-bib\" role=\"region\"><ol class=\"abt-bibliography__body\"><\/ol><\/section>\n\n\n\n<section aria-label=\"Bibliography\" class=\"wp-block-abt-bibliography abt-bibliography\" role=\"region\"><ol class=\"abt-bibliography__body\" data-entryspacing=\"1\" data-maxoffset=\"3\" data-linespacing=\"1\" data-second-field-align=\"flush\"><li id=\"3712268259\">  <div class=\"csl-entry\">\n    <div class=\"csl-left-margin\">1. <\/div><div class=\"csl-right-inline\">Berns F, Schmidt K, Bracht I, Beecks C. 3CS Algorithm for Efficient Gaussian Process Model Retrieval. In: <i>Proceedings of the 25th International Conference on Pattern Recognition 2020<\/i>. IEEE Computer Society; 2021:1773-1780. <a href=\"https:\/\/www.uni-muenster.de\/forschungaz\/publication\/164365\">https:\/\/www.uni-muenster.de\/forschungaz\/publication\/164365<\/a><\/div>\n  <\/div>\n<\/li><li id=\"3783165380\">  <div class=\"csl-entry\">\n    <div class=\"csl-left-margin\">2. <\/div><div class=\"csl-right-inline\">Lange-Hegermann M. Algorithmic Linearly Constrained Gaussian Processes. In: Bengio S, Wallach H, Larochelle H, Grauman K, Cesa-Bianchi N, Garnett R, eds. <i>Advances in Neural Information Processing Systems<\/i>. Vol 31. Curran Associates, Inc.; 2018. 
<a href=\"https:\/\/proceedings.neurips.cc\/paper\/2018\/file\/68b1fbe7f16e4ae3024973f12f3cb313-Paper.pdf\">https:\/\/proceedings.neurips.cc\/paper\/2018\/file\/68b1fbe7f16e4ae3024973f12f3cb313-Paper.pdf<\/a><\/div>\n  <\/div>\n<\/li><li id=\"48187783\">  <div class=\"csl-entry\">\n    <div class=\"csl-left-margin\">3. <\/div><div class=\"csl-right-inline\">Berns F, Beecks C. Complexity-Adaptive Gaussian Process Model Inference for Large-Scale Data. In: <i>Proceedings of the SIAM International Conference on Data Mining (SDM 2021)<\/i>. 2021.<\/div>\n  <\/div>\n<\/li><li id=\"3772046351\">  <div class=\"csl-entry\">\n    <div class=\"csl-left-margin\">4. <\/div><div class=\"csl-right-inline\">Beecks C, Schmidt KW, Berns F, Gra\u00df A. Gaussian Processes for Anomaly Description in Production Environments. In: Papotti P, ed. <i>Proceedings of the Workshops of the EDBT\/ICDT 2019 Joint Conference, EDBT\/ICDT 2019, Lisbon, Portugal, March 26, 2019<\/i>. Vol 2322. CEUR-WS.org; 2019. <a href=\"http:\/\/ceur-ws.org\/Vol-2322\/dsi4-4.pdf\">http:\/\/ceur-ws.org\/Vol-2322\/dsi4-4.pdf<\/a><\/div>\n  <\/div>\n<\/li><\/ol><\/section>\n","protected":false},"excerpt":{"rendered":"<p>Goal The GAIA project aims to explore Gaussian processes for efficient detection and interpretation of anomalies in multivariate time series 
[&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":332,"parent":119,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"ub_ctt_via":"","footnotes":""},"featured_image_src":"https:\/\/dataninja.nrw\/wp-content\/uploads\/2021\/04\/03_GAIA_A3_draft_vs2-scaled.jpeg","_links":{"self":[{"href":"https:\/\/dataninja.nrw\/index.php?rest_route=\/wp\/v2\/pages\/331"}],"collection":[{"href":"https:\/\/dataninja.nrw\/index.php?rest_route=\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/dataninja.nrw\/index.php?rest_route=\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/dataninja.nrw\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/dataninja.nrw\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=331"}],"version-history":[{"count":20,"href":"https:\/\/dataninja.nrw\/index.php?rest_route=\/wp\/v2\/pages\/331\/revisions"}],"predecessor-version":[{"id":2712,"href":"https:\/\/dataninja.nrw\/index.php?rest_route=\/wp\/v2\/pages\/331\/revisions\/2712"}],"up":[{"embeddable":true,"href":"https:\/\/dataninja.nrw\/index.php?rest_route=\/wp\/v2\/pages\/119"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/dataninja.nrw\/index.php?rest_route=\/wp\/v2\/media\/332"}],"wp:attachment":[{"href":"https:\/\/dataninja.nrw\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=331"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}