We propose a new class of deep reinforcement learning (RL) algorithms that
model latent representations in hyperbolic space. Sequential decision-making
requires reasoning about the possible future consequences of current behavior.
Consequently, capturing the relationship between key evolving features for a
given task is conducive to recovering effective policies. To this end,
hyperbolic geometry provides deep RL models with a natural basis to precisely
encode this inherently hierarchical information. However, applying existing
methodologies from the hyperbolic deep learning literature leads to fatal
optimization instabilities due to the non-stationarity and variance
characterizing RL gradient estimators. Hence, we design a new general method
that counteracts such optimization challenges and enables stable end-to-end
learning with deep hyperbolic representations. We empirically validate our
framework by applying it to popular on-policy and off-policy RL algorithms on
the Procgen and Atari 100K benchmarks, attaining near-universal performance and
generalization benefits. Given its natural fit, we hope future RL research will
consider hyperbolic representations as a standard tool.
Authors
Edoardo Cetin, Benjamin Chamberlain, Michael Bronstein, Jonathan J Hunt