A Distribution Matching Strategy for Single-Life Reinforcement Learning
You Only Live Once: Single-Life Reinforcement Learning
We propose Q-weighted adversarial learning (QWALE), an algorithm that leverages the agent's prior experience as guidance in novel situations.
We formalize the problem setting of single-life reinforcement learning (SLRL), in which an agent must complete a task within a single episode without intervention, utilizing its prior experience while contending with some form of novelty.
We find that algorithms designed for standard episodic reinforcement learning often struggle to recover from out-of-distribution states in this setting.
Motivated by this observation, we propose an algorithm, Q-weighted adversarial learning (QWALE), which employs a distribution matching strategy that leverages the agent's prior experience as guidance in novel situations.
Our experiments on several single-life continuous control problems indicate that methods based on our distribution matching formulation are 20-60% more successful because they can more quickly recover from novel states.
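The abstract does not spell out the mechanism, but distribution matching of this kind is commonly instantiated with a GAIL-style state discriminator: the discriminator is trained to separate states from the prior data and states from the single online life, and its output serves as a reward that pulls the agent back toward familiar, task-relevant states. Below is a minimal PyTorch sketch under that assumption; the names (`Discriminator`, `discriminator_loss`, `matching_reward`) and the Q-derived `prior_weights` are hypothetical, illustrating one plausible instantiation rather than the authors' implementation.

```python
# Hypothetical sketch of distribution matching for single-life RL.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Discriminator(nn.Module):
    """Classifies states as coming from prior data (1) or the online life (0)."""
    def __init__(self, state_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, s):
        return self.net(s)  # logit that s is drawn from the prior data

def discriminator_loss(disc, prior_states, online_states, prior_weights):
    # Assumed Q-weighting: prior states with higher Q-values (closer to task
    # completion) receive larger weights, so the matching target emphasizes
    # states that represent progress rather than all prior states equally.
    pos_logits = disc(prior_states).squeeze(-1)
    neg_logits = disc(online_states).squeeze(-1)
    pos_loss = F.binary_cross_entropy_with_logits(
        pos_logits, torch.ones_like(pos_logits), weight=prior_weights)
    neg_loss = F.binary_cross_entropy_with_logits(
        neg_logits, torch.zeros_like(neg_logits))
    return pos_loss + neg_loss

def matching_reward(disc, state):
    # GAIL-style reward: high when the current state looks like the prior
    # (successful) experience, which guides recovery from novel states.
    with torch.no_grad():
        logit = disc(state)
        return -F.softplus(-logit)  # log D(s), computed stably
```

In this sketch the agent maximizes `matching_reward` during its single life while the discriminator is periodically retrained on fresh online states, so the guidance signal tracks where the agent has drifted out of distribution.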
Authors
Annie S. Chen, Archit Sharma, Sergey Levine, Chelsea Finn