Algorithms for decision support, optimization and machine learning contain parameters that influence their behavior and the way they solve problems. Adjusting these parameters can lead to faster performance and/or higher quality solutions. However, finding good parameters is a difficult task, especially when the data received as input changes over time. Our project harnesses the learning power of multi-armed bandit (MAB) methods to adjust the parameters of algorithms in realtime over the course of problem solving.
The parameters of an algorithm have an influence on its problem-solving behavior and hence its performance on specific problem instances. For example, depending on how skillful its parameters are set, an algorithm for solving traveling salesperson (TSP) problems may run faster or slower on a concrete TSP instance or may produce better or worse solutions (an approximation to the optimum). Algorithm configuration (AC) refers to the task of automatically adjusting the parameters of an algorithm on a per-instance basis, i.e., tailoring the parameters to a specific problem instance1. We are especially interested in a practically motivated pool-based scenario, where problems of the same type are solved repeatedly, and several algorithms are run on each problem instance in parallel. In this context, parameters ought to be adjusted in real-time.
To tackle the AC problem, we make use of multi-armed bandit algorithms, a class of machine learning methods for sequential decision making under uncertainty2. More concretely, we will develop new types of MAB algorithms specifically tailored to our setting, which are able to choose high-quality parameters even as the underlying distribution of problems drifts over time. To this end, we will mainly build on so-called preselection bandits3, a recent class of bandit algorithms that is well suitable to formalize the pool-based variant of real-time AC4.
As a first step, we wrote a literature review on the current state of algorithm configuration. The review provides an up-to-date overview of AC methods presented in the literature and their applicability for different problem settings. In particular, we provide the first taxonomy that structures methods and problem settings according to their characteristics. In addition, we look at the current state of industry adoption and provide a glimpse into the future of AC by aggregating and discussing unsolved problems and further work directions.
Furthermore, we have developed an algorithmic framework to find an optimal arm in a combinatorial bandit scenario, a generalization of the classical multi-armed bandit problem, in which in each time step an entire set of arms can be “pulled” simultaneously. The algorithm is quite general in its design such that it can be instantiated for a combinatorial bandit scenario, where either numerical reward values are received for each arm contained in the “pulled” set of arms, or feedback is obtained in the form of a qualitative comparison between the arms (preference feedback). From a practical point of view, this algorithm can also be applied in the context of algorithm configuration by interpreting the available configurations as arms that can be executed (i.e. “pulled”) in parallel on our computational resources. Our experiments show that stopping the parallel solution process once a configuration is finished and providing the corresponding preference feedback to the algorithm (i.e., which configuration finished first?) leads to good empirical results in finding suitable configurations.
- Brandt, J., Haddenhorst, B., Bengs, V., and Hüllermeier, E. (2022), Finding optimal arms in non-stochastic combinatorial bandits with semi-bandit feedback and finite budget.
- Schede, E., Brandt, J., Tornede, A., Wever, M., Bengs, V., Hüllermeier, E. and Tierney, K. (2022), A survey of methods for automated algorithm configuration. CoRR, abs/2202.01651.
- 1.C. Ansótegui, Y. Malitsky, Samulowitz H, M. Sellmann, Tierney K. Model-Based Genetic Algorithms for Algorithm Configuration. In: IJCAI. ; 2015.
- 2.Róbert Busa-Fekete, Hüllermeier E, El Mesaoudi-Paul A. Preference-based Online Learning with Dueling Bandits: A Survey. CoRR. 2018;abs/1807.11398. http://arxiv.org/abs/1807.11398
- 3.Bengs V, E. Hüllermeier. Preselection Bandits. In: ICML. ; 2020.
- 4.El Mesaoudi-Paul A, Wei D, Bengs V, Hüllermeier E, Tierney K. Pool-Based Realtime Algorithm Configuration: A Preselection Bandit Approach. In: Kotsireas IS, Pardalos PM, eds. Learning and Intelligent Optimization. Springer International Publishing; 2020:216–232.