Offline reinforcement learning (RL) provides a framework for learning
decision-making from offline data and therefore constitutes a promising
approach for real-world applications such as automated driving. Self-driving
vehicles (SDVs) learn a policy that potentially even outperforms the behavior
policy underlying the sub-optimal data set. Especially in safety-critical
applications such as automated driving, explainability and transferability are key to success. This
motivates the use of model-based offline RL approaches, which leverage
planning. However, current state-of-the-art methods often neglect the influence
of aleatoric uncertainty arising from the stochastic behavior of multi-agent
systems. This work proposes a novel approach for Uncertainty-aware Model-Based
Offline REinforcement Learning Leveraging plAnning (UMBRELLA), which solves the
prediction, planning, and control problem of the SDV jointly in an
interpretable learning-based fashion. A trained action-conditioned stochastic
dynamics model captures distinctively different future evolutions of the
traffic scene. The analysis provides empirical evidence for the effectiveness
of our approach in challenging automated driving simulations and on a
real-world public dataset.
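
As an illustrative sketch only (not the authors' implementation), an action-conditioned stochastic dynamics model of this kind can be thought of as a network that maps the current traffic-scene state and the SDV's action to a distribution over the next state, from which distinct future evolutions can be sampled during planning. All class names, dimensions, and the Gaussian parameterization below are assumptions for illustration.

```python
# Minimal sketch (assumption, not the paper's architecture): an
# action-conditioned stochastic dynamics model p(s_{t+1} | s_t, a_t)
# parameterized as a diagonal Gaussian, used to sample distinct
# future evolutions of the traffic scene.
import torch
import torch.nn as nn


class StochasticDynamicsModel(nn.Module):
    def __init__(self, state_dim: int, action_dim: int, hidden_dim: int = 128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
        )
        self.mean_head = nn.Linear(hidden_dim, state_dim)     # mean of next state
        self.log_std_head = nn.Linear(hidden_dim, state_dim)  # log std of next state

    def forward(self, state: torch.Tensor, action: torch.Tensor):
        h = self.encoder(torch.cat([state, action], dim=-1))
        mean = self.mean_head(h)
        log_std = self.log_std_head(h).clamp(-5.0, 2.0)  # keep variance well-behaved
        return torch.distributions.Normal(mean, log_std.exp())

    @torch.no_grad()
    def rollout(self, state: torch.Tensor, actions: torch.Tensor) -> torch.Tensor:
        """Sample one stochastic future given a planned action sequence."""
        states = [state]
        for action in actions.unbind(dim=0):
            state = self(state, action).sample()
            states.append(state)
        return torch.stack(states, dim=0)


# Example: sample two distinct futures for the same action plan.
model = StochasticDynamicsModel(state_dim=16, action_dim=2)
s0 = torch.zeros(16)
plan = torch.zeros(8, 2)            # hypothetical 8-step action sequence
future_a = model.rollout(s0, plan)
future_b = model.rollout(s0, plan)  # generally differs due to sampled noise
```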
Authors
Christopher Diehl, Timo Sievernich, Martin Krüger, Frank Hoffmann, Torsten Bertram