Goal

The GAIA project aims to explore Gaussian processes for the efficient detection and interpretation of anomalies in multivariate time series data. In particular, we aim to investigate unsupervised Gaussian processes in order to identify, understand and resolve underlying correlations and anomalies. To learn Gaussian process models in a scalable and real-time manner, we intend to develop new streaming algorithms. These will be implemented as open-source software with reference to industrial standards and tested in application-oriented scenarios together with industry partners.

Illustration: Christoph J Kellner, Studio Animanova

Project Overview

Gaussian Process Models are widely used in Bayesian Machine Learning because they can be applied when only limited data is available and can be interpreted directly. The GAIA project will employ such Gaussian Process Models in an extension to multivariate time series. The resulting model will therefore cover spatial as well as temporal information and should detect anomalies spanning both of these dimensions. One focus will be on how model selection influences the explainability of the constructed models.
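To make the model selection question concrete, the following is a minimal sketch of one common criterion, the log marginal likelihood, used to compare two candidate covariance functions on the same data. It is an illustration under stated assumptions and not the project's own algorithm: the synthetic signal, the two kernels, their fixed hyperparameters and all function names are hypothetical choices made for this example.

```python
import numpy as np

def rbf_kernel(x1, x2, lengthscale=1.0):
    """Squared-exponential covariance: smooth, non-periodic behaviour."""
    d = x1[:, None] - x2[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def periodic_kernel(x1, x2, period=1.0, lengthscale=1.0):
    """Periodic covariance: behaviour that repeats with the given period."""
    d = np.abs(x1[:, None] - x2[None, :])
    return np.exp(-2.0 * np.sin(np.pi * d / period) ** 2 / lengthscale ** 2)

def log_marginal_likelihood(K, y, noise_var):
    """log p(y | X) for a zero-mean GP with Gaussian observation noise."""
    n = len(y)
    L = np.linalg.cholesky(K + noise_var * np.eye(n))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    data_fit = -0.5 * y @ alpha
    complexity = -np.sum(np.log(np.diag(L)))  # equals -0.5 * log det(K + noise)
    return data_fit + complexity - 0.5 * n * np.log(2.0 * np.pi)

# Hypothetical data: a noisy sine wave with period 1 (assumption for illustration).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 4.0, 60)
y = np.sin(2.0 * np.pi * x) + 0.1 * rng.standard_normal(x.size)

# Candidate models with fixed (not optimised) hyperparameters.
candidates = {
    "rbf": rbf_kernel(x, x, lengthscale=1.0),
    "periodic": periodic_kernel(x, x, period=1.0, lengthscale=1.0),
}
scores = {name: log_marginal_likelihood(K, y, noise_var=0.01)
          for name, K in candidates.items()}
best = max(scores, key=scores.get)
print(scores, "selected:", best)
```

The selected kernel also carries the interpretation: choosing the periodic candidate amounts to the statement that the signal repeats with a fixed period, which is the kind of explanation a structured covariance function can offer.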

A latent variable model can be used, for example, to represent a process of the kind found in many industrial applications. Such processes are characterized by temporal data given as time series. In the project, a latent variable model is to be learned in an unsupervised fashion as a Gaussian Process Model. Importantly, prior knowledge of the particular application domain can be exploited: while the search for a fitting covariance function is usually expensive for Gaussian Processes, domain knowledge can be introduced into the covariance structure in the form of underlying differential equations, which leads to hybrid and interpretable models.
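As a simple illustration of how differential structure can enter a covariance, consider the standard identity that a linear operator applied to a Gaussian process yields another Gaussian process whose covariance is obtained by applying the operator to both arguments of the original covariance function. The sketch below works this out for the first derivative and assumes a squared-exponential covariance with signal variance \sigma^2 and lengthscale \ell (symbols introduced here for the example). It is a textbook special case, not the project's specific construction.

```latex
% Assumption: f is a zero-mean GP with squared-exponential covariance
f \sim \mathcal{GP}(0, k), \qquad
k(x, x') = \sigma^2 \exp\!\left(-\frac{(x - x')^2}{2\ell^2}\right)

% Covariance between the derivative and the function itself
\operatorname{cov}\big(f'(x), f(x')\big)
  = \frac{\partial k}{\partial x}(x, x')
  = -\sigma^2 \, \frac{x - x'}{\ell^2}
    \exp\!\left(-\frac{(x - x')^2}{2\ell^2}\right)

% Covariance of the derivative process
\operatorname{cov}\big(f'(x), f'(x')\big)
  = \frac{\partial^2 k}{\partial x \, \partial x'}(x, x')
  = \frac{\sigma^2}{\ell^2}\left(1 - \frac{(x - x')^2}{\ell^2}\right)
    \exp\!\left(-\frac{(x - x')^2}{2\ell^2}\right)
```

Because these covariances follow from the operator rather than from a search, a model that observes both a quantity and its rate of change needs no additional kernel to be selected for the derivative.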

During the first year, we produced first results and a first implementation of hybrid Gaussian Processes that combine physical knowledge with data. We also supervised a successful Bachelor’s thesis on model selection, whose results we will investigate further. Finally, we developed a novel model search algorithm for Gaussian processes on data streams and published a first evaluation [1–4].

A visualization of a posterior Gaussian process on noisy data from a three-tank system. The black stars are the noisy data points used to train the GP, with noise as high as 10% of the maximal signal. The red line is the original noise-free data for reference, the blue dashed line is the posterior mean, and the transparent blue area is the 2-sigma confidence band. Despite the high noise, the posterior mean recovers the true underlying behaviour very well.
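The quantities named in the caption (posterior mean and 2-sigma band) can be computed as in the following minimal sketch. It does not use the three-tank data: the saturating signal, the squared-exponential kernel, the fixed hyperparameters and the function names are all assumptions made for illustration. Only the noise level of 10% of the maximal signal is taken from the caption.

```python
import numpy as np

def rbf_kernel(x1, x2, lengthscale=1.0, variance=1.0):
    """Squared-exponential covariance between two sets of 1-D inputs."""
    d = x1[:, None] - x2[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_posterior(x_train, y_train, x_test, noise_var, lengthscale=1.0, variance=1.0):
    """Posterior mean and standard deviation of the latent function (zero prior mean)."""
    K = rbf_kernel(x_train, x_train, lengthscale, variance)
    K += noise_var * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_test, lengthscale, variance)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = variance - np.sum(v ** 2, axis=0)  # prior variance minus explained part
    return mean, np.sqrt(np.maximum(var, 0.0))

def true_signal(x):
    """Hypothetical noise-free signal (stand-in for the reference curve)."""
    return 1.0 - np.exp(-0.5 * x)

rng = np.random.default_rng(1)
x_train = np.linspace(0.0, 10.0, 50)
clean = true_signal(x_train)
noise_std = 0.1 * np.max(np.abs(clean))  # noise at 10% of the maximal signal
y_train = clean + noise_std * rng.standard_normal(x_train.size)

x_test = np.linspace(0.0, 10.0, 200)
mean, std = gp_posterior(x_train, y_train, x_test,
                         noise_var=noise_std ** 2, lengthscale=2.0)
lower, upper = mean - 2.0 * std, mean + 2.0 * std  # 2-sigma band
```

Note that the band computed here describes the uncertainty of the latent function; adding `noise_std ** 2` to the variance before taking the square root would instead give a band for new noisy observations.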

Project Publications

  • Besginow, A., Hüwel, J. D., Lange-Hegermann, M., & Beecks, C. (2020). Exploring Methods to Apply Gaussian Processes in Industrial Anomaly Detection. Neurocomputing, 403, 383-399.
  • Hüwel, J. D., Berns, F., & Beecks, C. (2021, December). Automated Kernel Search for Gaussian Processes on Data Streams. In 2021 IEEE International Conference on Big Data (Big Data) (pp. 3584-3588). IEEE.
  • Berns, F., Hüwel, J. D., & Beecks, C. (2021, December). LOGIC: Probabilistic Machine Learning for Time Series Classification. In 2021 IEEE International Conference on Data Mining (ICDM) (pp. 1000-1005). IEEE.
  • Berns, F., Hüwel, J., & Beecks, C. (2022). Automated Model Inference for Gaussian Processes: An Overview of State-of-the-Art Methods and Algorithms. SN Computer Science, 3, 300.

References

    1. Berns F, Schmidt K, Bracht I, Beecks C. 3CS Algorithm for Efficient Gaussian Process Model Retrieval. In: Proceedings of the 25th International Conference on Pattern Recognition 2020. IEEE Computer Society; 2021:1773-1780. https://www.uni-muenster.de/forschungaz/publication/164365
    2. Lange-Hegermann M. Algorithmic Linearly Constrained Gaussian Processes. In: Bengio S, Wallach H, Larochelle H, Grauman K, Cesa-Bianchi N, Garnett R, eds. Advances in Neural Information Processing Systems. Vol 31. Curran Associates, Inc.; 2018. https://proceedings.neurips.cc/paper/2018/file/68b1fbe7f16e4ae3024973f12f3cb313-Paper.pdf
    3. Berns F, Beecks C. Complexity-Adaptive Gaussian Process Model Inference for Large-Scale Data. In: Proceedings of the SIAM International Conference on Data Mining (SDM 2021). 2021.
    4. Beecks C, Willy Schmidt K, Berns F, Graß A. Gaussian Processes for Anomaly Description in Production Environments. In: Papotti P, ed. Proceedings of the Workshops of the EDBT/ICDT 2019 Joint Conference, EDBT/ICDT 2019, Lisbon, Portugal, March 26, 2019. Vol 2322. CEUR-WS.org; 2019. http://ceur-ws.org/Vol-2322/dsi4-4.pdf