Heuristic Coordination in Cooperative Multi-Agent Reinforcement Learning

Authors Ramon Petri, Eugenio Bargiacchi, Huib Aldewereld, Diederik M. Roijers
Published in Proceedings van de 33rd Benelux Conference on Artificial Intelligence en 30th Belgian Dutch Conference on Machine Learning (BNAIC/BENELEARN 2021)
Publication date 10 November 2021
Research groups Artificial Intelligence
Type Lecture

Summary

Key to reinforcement learning in multi-agent systems is the ability to exploit the fact that agents only directly influence only a small subset of the other agents. Such loose couplings are often modelled using a graphical model: a coordination graph. Finding an (approximately) optimal joint action for a given coordination graph is therefore a central subroutine in cooperative multi-agent reinforcement learning (MARL). Much research in MARL focuses on how to gradually update the parameters of the coordination graph, whilst leaving the solving of the coordination graph up to a known typically exact and generic subroutine. However, exact methods { e.g., Variable Elimination { do not scale well, and generic methods do not exploit the MARL setting of gradually updating a coordination graph and recomputing the joint action to select. In this paper, we examine what happens if we use a heuristic method, i.e., local search, to select joint actions in MARL, and whether we can use outcome of this local search from a previous time-step to speed up and improve local search. We show empirically that by using local search, we can scale up to many agents and complex coordination graphs, and that by reusing joint actions from the previous time-step to initialise local search, we can both improve the quality of the joint actions found and the speed with which these joint actions are found.

On this publication contributed

Language English
Published in Proceedings van de 33rd Benelux Conference on Artificial Intelligence en 30th Belgian Dutch Conference on Machine Learning (BNAIC/BENELEARN 2021)
Key words Coordination Graphs, Local Search, Multi-agent Reinforcement Learning, Multi-agent Thompson Sampling

Artificial Intelligence