Conservative Data Sharing for Multi-Task Offline Reinforcement Learning
Offline reinforcement learning (RL) algorithms have shown promising results
in domains where abundant pre-collected data is available. However, prior
methods focus on solving individual problems from scratch with an offline
dataset without considering how an offline RL agent can acquire multiple
skills. We argue that a natural use case of offline RL is in settings where we
can pool large amounts of data collected in various scenarios for solving
different tasks, and utilize all of this data to learn behaviors for all the
tasks more effectively than training each task in isolation. However,
sharing data across all tasks in multi-task offline RL performs surprisingly
poorly in practice. Through thorough empirical analysis, we find that sharing data can
actually exacerbate the distributional shift between the learned policy and the
dataset, which in turn can lead to divergence of the learned policy and poor
performance. To address this challenge, we develop a simple technique for
data sharing in multi-task offline RL that routes data based on the improvement
over the task-specific data. We call this approach conservative data sharing
(CDS), and it can be applied with multiple single-task offline RL methods. On a
range of challenging multi-task locomotion, navigation, and vision-based
robotic manipulation problems, CDS matches or exceeds the performance of prior
offline multi-task RL methods and previous data sharing approaches.
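
To make the data-routing idea concrete, the following is a minimal sketch of how conservative data sharing could be implemented, under stated assumptions: it presumes access to a conservative Q-function and a per-task reward function for relabeling, and it uses an illustrative percentile-based threshold on the task's own data. The names (cds_share, conservative_q, reward_fn, percentile) are hypothetical and not taken from the paper or its released code.

```python
import numpy as np

def cds_share(other_transitions, task_transitions, conservative_q, reward_fn,
              task_id, percentile=90.0):
    """Sketch of conservative data sharing for one target task.

    Transitions collected for other tasks are relabeled and shared with
    `task_id` only if their conservative Q-value exceeds a high percentile
    of the conservative Q-values of the task's own data (an assumed,
    illustrative criterion for "improvement over the task-specific data").
    Transitions are (state, action, reward, next_state) tuples.
    """
    # Threshold: a high percentile of conservative Q-values on the task's own data.
    q_task = np.array([conservative_q(s, a, task_id)
                       for (s, a, _r, _s2) in task_transitions])
    threshold = np.percentile(q_task, percentile)

    shared = []
    for (s, a, _r, s2) in other_transitions:
        # Keep the transition only if sharing it is estimated to help this task.
        if conservative_q(s, a, task_id) >= threshold:
            # Relabel with the target task's reward before adding it to the buffer.
            shared.append((s, a, reward_fn(s, a, task_id), s2))
    return shared
```

Under these assumptions, a transition from another task is relabeled and added to the target task's dataset only when its conservative value estimate suggests it improves over relying on the task-specific data alone, which is the routing behavior described in the abstract.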
Authors
Tianhe Yu, Aviral Kumar, Yevgen Chebotar, Karol Hausman, Sergey Levine, Chelsea Finn