MA-Trace: An On-Policy Actor-Critic Algorithm for Multi-Agent Reinforcement Learning
Off-Policy Correction For Multi-Agent Reinforcement Learning
We propose a new on-policy actor-critic algorithm for multi-agent reinforcement learning (MARL).
Our algorithm, named MA-Trace, generalizes the popular actor-critic algorithm V-Trace to the multi-agent setting.
It utilizes importance sampling as an off-policy correction method, which allows distributing the computations with no impact on the quality of training.
We evaluate the algorithm extensively on the StarCraft Multi-Agent Challenge, a standard benchmark for multi-agent algorithms, and show that it achieves high performance on all its tasks and exceeds state-of-the-art results on some of them.
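To make the importance-sampling correction concrete, the following is a minimal sketch of V-Trace-style value targets computed with truncated importance weights. It is not taken from the paper: the function name `vtrace_targets`, its parameters, the single shared team reward, and the choice to form the joint importance weight as the product of per-agent policy ratios are all assumptions made for illustration.

```python
import numpy as np


def vtrace_targets(log_ratios, rewards, values, bootstrap_value,
                   gamma=0.99, rho_bar=1.0, c_bar=1.0):
    """Sketch of V-Trace-style targets for one trajectory of length T.

    log_ratios: shape (T, n_agents), log pi_i(a_i | .) - log mu_i(a_i | .)
                for each agent i (pi = target policy, mu = behaviour policy).
    rewards, values: shape (T,); bootstrap_value: value estimate after step T-1.
    Assumption: the joint weight is the product of per-agent ratios.
    """
    log_ratios = np.asarray(log_ratios, dtype=float)
    rewards = np.asarray(rewards, dtype=float)
    values = np.asarray(values, dtype=float)

    # Joint importance weight: product over agents = exp(sum of log-ratios).
    joint_ratio = np.exp(log_ratios.sum(axis=1))
    rhos = np.minimum(rho_bar, joint_ratio)  # truncated weights for TD errors
    cs = np.minimum(c_bar, joint_ratio)      # truncated weights for the trace

    next_values = np.append(values[1:], bootstrap_value)
    deltas = rhos * (rewards + gamma * next_values - values)

    # Backward recursion: corr[t] = delta[t] + gamma * c[t] * corr[t + 1].
    corrections = np.zeros_like(values)
    acc = 0.0
    for t in reversed(range(len(rewards))):
        acc = deltas[t] + gamma * cs[t] * acc
        corrections[t] = acc
    return values + corrections
```

When the behaviour and target policies coincide (all log-ratios are zero), the weights equal one and the targets reduce to ordinary n-step returns. When the sampled actions are very unlikely under the target policy, the weights approach zero and the targets fall back to the current value estimates.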
Authors
Michał Zawalski, Błażej Osiński, Henryk Michalewski, Piotr Miłoś