On the Importance of Hyperparameter Optimization for Model-based Reinforcement Learning
Model-based Reinforcement Learning (MBRL) is a promising framework for
learning control in a data-efficient manner. MBRL algorithms can be fairly
complex due to the separate dynamics modeling and the subsequent planning
algorithm, and as a result, they often possess tens of hyperparameters and
architectural choices. For this reason, MBRL typically requires significant
human expertise before it can be applied to new problems and domains. To
alleviate this problem, we propose to use automatic hyperparameter optimization
(HPO). We demonstrate that this problem can be tackled effectively with automated HPO, which yields significantly improved performance compared to that of human experts. In addition, we show that tuning several MBRL hyperparameters dynamically, i.e., during training itself, further improves performance compared to using static hyperparameters that are kept fixed for the whole training. Finally, our experiments provide valuable insights into the effects of several hyperparameters, such as the planning horizon or the learning rate, and their influence on the stability of training and the resulting rewards.
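As a rough illustration of the dynamic-tuning idea, the sketch below adjusts two MBRL hyperparameters (planning horizon and learning rate) during a simulated training loop via simple hill climbing. This is not the paper's method or code: `train_one_iteration` is a hypothetical stand-in for a real MBRL training iteration, and the reward surface is artificial.

```python
import random

def train_one_iteration(plan_horizon, learning_rate):
    # Hypothetical stand-in for one MBRL training iteration; returns a noisy
    # reward that (artificially) favors a mid-range horizon and a small lr.
    return -abs(plan_horizon - 25) - 100.0 * learning_rate + random.gauss(0, 1)

def dynamic_tuning(num_iterations=50):
    # Toy dynamic schedule: propose a small change to the hyperparameters each
    # iteration and keep it only if the observed reward improves; contrast this
    # with static hyperparameters held fixed for the whole training run.
    horizon, lr = 10, 1e-2
    best = train_one_iteration(horizon, lr)
    for i in range(num_iterations):
        cand_horizon = max(1, horizon + random.choice([-2, -1, 0, 1, 2]))
        cand_lr = lr * random.choice([0.8, 1.0, 1.25])
        reward = train_one_iteration(cand_horizon, cand_lr)
        if reward > best:
            horizon, lr, best = cand_horizon, cand_lr, reward
        print(f"iter {i:02d}  horizon={horizon:3d}  lr={lr:.4f}  best_reward={best:.2f}")

if __name__ == "__main__":
    dynamic_tuning()
```

The paper's actual experiments use principled HPO tools rather than this toy hill climber; the sketch only conveys why adapting hyperparameters during training can outperform a single static setting.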
Authors
Baohe Zhang, Raghu Rajan, Luis Pineda, Nathan Lambert, André Biedenkapp, Kurtland Chua, Frank Hutter, Roberto Calandra