Algorithms for decision support, optimization and machine learning contain parameters that influence their behavior and the way they solve problems. Adjusting these parameters can lead to faster performance and/or higher quality solutions. However, finding good parameters is a difficult task, especially when the data received as input changes over time. Our project harnesses the learning power of multi-armed bandit (MAB) methods to adjust the parameters of algorithms in realtime over the course of problem solving.
The parameters of an algorithm have an influence on its problem-solving behavior and hence its performance on specific problem instances. For example, depending on how skillful its parameters are set, an algorithm for solving traveling salesperson (TSP) problems may run faster or slower on a concrete TSP instance or may produce better or worse solutions (an approximation to the optimum). Algorithm configuration (AC) refers to the task of automatically adjusting the parameters of an algorithm on a per-instance basis, i.e., tailoring the parameters to a specific problem instance1. We are especially interested in a practically motivated pool-based scenario, where problems of the same type are solved repeatedly, and several algorithms are run on each problem instance in parallel. In this context, parameters ought to be adjusted in real-time.
To tackle the AC problem, we make use of multi-armed bandit algorithms, a class of machine learning methods for sequential decision making under uncertainty2. More concretely, we will develop new types of MAB algorithms specifically tailored to our setting, which are able to choose high-quality parameters even as the underlying distribution of problems drifts over time. To this end, we will mainly build on so-called preselection bandits3, a recent class of bandit algorithms that is well suitable to formalize the pool-based variant of real-time AC4.
- 1.C. Ansótegui, Y. Malitsky, Samulowitz H, M. Sellmann, Tierney K. Model-Based Genetic Algorithms for Algorithm Configuration. In: IJCAI. ; 2015.
- 2.Róbert Busa-Fekete, Hüllermeier E, El Mesaoudi-Paul A. Preference-based Online Learning with Dueling Bandits: A Survey. CoRR. 2018;abs/1807.11398. http://arxiv.org/abs/1807.11398
- 3.Bengs V, E. Hüllermeier. Preselection Bandits. In: ICML. ; 2020.
- 4.El Mesaoudi-Paul A, Wei D, Bengs V, Hüllermeier E, Tierney K. Pool-Based Realtime Algorithm Configuration: A Preselection Bandit Approach. In: Kotsireas IS, Pardalos PM, eds. Learning and Intelligent Optimization. Springer International Publishing; 2020:216–232.