Learning State Representations via Retracing in Reinforcement Learning
We propose learning via retracing, a novel self-supervised approach for
learning the state representation (and the associated dynamics model) for
reinforcement learning tasks. In addition to the predictive (reconstruction)
supervision in the forward direction, we propose to include "retraced"
transitions for representation/model learning, by enforcing a
cycle-consistency constraint between the original and retraced states, hence
improving the sample efficiency of learning. Moreover, learning via
retracing explicitly propagates information about future transitions backward
for inferring previous states, thus facilitating stronger representation
learning. We introduce Cycle-Consistency World Model (CCWM), a concrete
instantiation of learning via retracing implemented within an existing
model-based reinforcement learning framework. Additionally, we propose a novel
adaptive "truncation" mechanism for counteracting the negative impact of
"irreversible" transitions, so that learning via retracing can be maximally
effective. Through extensive empirical studies on continuous control
benchmarks, we demonstrate that CCWM achieves state-of-the-art performance in
terms of sample efficiency and asymptotic performance.
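
A minimal sketch (not the authors' implementation) of the cycle-consistency idea described above, written in JAX: a forward model predicts the next latent state from the current latent and action, a backward "retracing" model maps the prediction back, and the retraced state is constrained to match the original. The model forms, dimensions, and loss weight below are illustrative assumptions only.

import jax
import jax.numpy as jnp

def forward_model(params, z, a):
    # Predict the next latent state from the current latent state and action.
    return jnp.tanh(params["Wf"] @ jnp.concatenate([z, a]))

def backward_model(params, z_next, a):
    # "Retrace": predict the previous latent state from the next latent state and action.
    return jnp.tanh(params["Wb"] @ jnp.concatenate([z_next, a]))

def loss_fn(params, z, a, z_next_target, beta=1.0):
    z_next_pred = forward_model(params, z, a)
    z_retraced = backward_model(params, z_next_pred, a)
    forward_loss = jnp.mean((z_next_pred - z_next_target) ** 2)  # predictive (forward) term
    cycle_loss = jnp.mean((z_retraced - z) ** 2)                 # cycle-consistency (retracing) term
    return forward_loss + beta * cycle_loss

# Toy usage with illustrative dimensions and linear-tanh models.
key = jax.random.PRNGKey(0)
z_dim, a_dim = 8, 2
params = {
    "Wf": jax.random.normal(key, (z_dim, z_dim + a_dim)) * 0.1,
    "Wb": jax.random.normal(key, (z_dim, z_dim + a_dim)) * 0.1,
}
z, a, z_next = jnp.zeros(z_dim), jnp.ones(a_dim), jnp.zeros(z_dim)
loss, grads = jax.value_and_grad(loss_fn)(params, z, a, z_next)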
Authors
Changmin Yu, Dong Li, Jianye Hao, Jun Wang, Neil Burgess