Abstract Value Iteration for Hierarchical Reinforcement Learning
Kishor Jothimurugan, Osbert Bastani, Rajeev Alur
We propose a novel hierarchical reinforcement learning framework for control
with continuous state and action spaces. In our framework, the user specifies
subgoal regions, which are subsets of the state space; then, we (i) learn options that
serve as transitions between these subgoal regions, and (ii) construct a
high-level plan in the resulting abstract decision process (ADP). A key
challenge is that the ADP may not be Markov, which we address by proposing two
algorithms for planning in the ADP. Our first algorithm is conservative,
allowing us to prove theoretical guarantees on its performance, which help
inform the design of subgoal regions. Our second algorithm is a practical one
that interweaves planning at the abstract level and learning at the concrete
level. In our experiments, we demonstrate that our approach outperforms
state-of-the-art hierarchical reinforcement learning algorithms on several
challenging benchmarks.
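
As a rough illustration of the planning step, the sketch below runs value iteration over a toy abstract decision process whose states are subgoal regions and whose actions are learned options. For simplicity it treats the ADP as Markov, whereas the paper's two algorithms are designed precisely for the case where it may not be; all names, the data layout, and the toy transition probabilities here are illustrative assumptions, not taken from the paper.

```python
# Illustrative sketch (not the paper's implementation): value iteration over an
# abstract decision process whose states are subgoal regions and whose actions
# are learned options connecting them. The per-option transition probabilities
# and rewards are assumed to have been estimated while learning the options.

from typing import Dict, List, Tuple

# adp[region][option] = list of (next_region, probability, reward) outcomes
ADP = Dict[int, Dict[str, List[Tuple[int, float, float]]]]

def abstract_value_iteration(adp: ADP, gamma: float = 0.99,
                             tol: float = 1e-6):
    """Compute region values and a greedy high-level plan over subgoal regions."""
    values = {region: 0.0 for region in adp}
    while True:
        delta = 0.0
        for region, options in adp.items():
            if not options:  # terminal region (e.g., the goal)
                continue
            best = max(
                sum(p * (r + gamma * values[nxt]) for nxt, p, r in outcomes)
                for outcomes in options.values()
            )
            delta = max(delta, abs(best - values[region]))
            values[region] = best
        if delta < tol:
            break
    # Greedy plan: in each region, pick the option with the highest backup value.
    plan = {
        region: max(options, key=lambda o: sum(
            p * (r + gamma * values[nxt]) for nxt, p, r in options[o]))
        for region, options in adp.items() if options
    }
    return values, plan

# Toy ADP: regions 0 -> 1 -> 2 (goal), plus a risky shortcut from 0 to 2.
adp = {
    0: {"opt_safe": [(1, 1.0, -1.0)],
        "opt_shortcut": [(2, 0.5, -1.0), (0, 0.5, -1.0)]},
    1: {"opt_goal": [(2, 0.9, -1.0), (1, 0.1, -1.0)]},
    2: {},  # goal region
}
values, plan = abstract_value_iteration(adp)
print(plan)  # e.g., {0: 'opt_safe', 1: 'opt_goal'}
```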