Accessibility-Based Clustering for Efficient Learning of Locomotion Skills
Chong Zhang, Wanming Yu, Zhibin Li
For model-free deep reinforcement learning of quadruped locomotion, the
initialization of robot configurations is crucial for data efficiency and
robustness. This work improves data efficiency and robustness simultaneously
through automatic discovery of initial states, achieved by our proposed
K-Access algorithm based on accessibility metrics. Specifically, we formulated
accessibility metrics that measure the difficulty of transitions between two
arbitrary states, and proposed a novel K-Access algorithm for state-space
clustering that automatically discovers the centroids of the static-pose
clusters based on these metrics. By
using the discovered centroidal static poses as the initial states, we
improve data efficiency by reducing redundant exploration, and enhance
robustness through more effective exploration from the centroids to sampled
poses.
Focusing on fall recovery, a particularly hard set of locomotion skills, we
validated our method extensively on the 8-DoF quadrupedal robot Bittle.
Compared to the baselines, the learning curve of our method converges much
faster, requiring only 60% of the training episodes. With our method, the robot
recovers to a standing pose within 3 seconds in 99.4% of the test cases.
Moreover, the method generalizes successfully to other difficult skills, such
as backflipping.
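
To make the clustering idea concrete, the following is a minimal sketch of centroid discovery under a directed transition cost. It is not the K-Access algorithm itself, whose details the abstract does not give: it substitutes a plain k-medoids-style loop, and the function name `k_medoids_asymmetric`, the cost matrix `cost`, and the toy one-dimensional "poses" are all hypothetical. The only ideas taken from the text are that the cost from pose i to pose j measures how hard that transition is, and that each centroid should be a pose from which its cluster members are easy to reach.

```python
import numpy as np


def k_medoids_asymmetric(cost, k, n_iter=100, seed=0):
    """k-medoids-style clustering on a directed cost matrix.

    cost[i, j] is an ASSUMED stand-in for an accessibility metric: the
    difficulty of moving from pose i to pose j. It need not be symmetric.
    """
    cost = np.asarray(cost, dtype=float)
    n = cost.shape[0]
    rng = np.random.default_rng(seed)
    medoids = rng.choice(n, size=k, replace=False)
    for _ in range(n_iter):
        # Assign every pose to the medoid from which it is cheapest to reach.
        labels = np.argmin(cost[medoids, :], axis=0)
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size == 0:
                continue  # keep the previous medoid if a cluster empties
            # New medoid: the member with the lowest total outgoing cost
            # to all members of its own cluster.
            outgoing = cost[np.ix_(members, members)].sum(axis=1)
            new_medoids[c] = members[np.argmin(outgoing)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    labels = np.argmin(cost[medoids, :], axis=0)
    return medoids, labels


# Toy data (hypothetical): six 1-D "poses" in two well-separated groups.
x = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.2])
diff = x[None, :] - x[:, None]  # diff[i, j] = x[j] - x[i]
# Assumed asymmetric cost: moving "up" is 1.5x as hard as moving "down".
cost = np.abs(diff) + 0.5 * np.maximum(diff, 0.0)

medoids, labels = k_medoids_asymmetric(cost, k=2)
print("medoid poses:", np.sort(x[medoids]))
print("labels:", labels)
```

In this sketch the returned medoid poses would play the role of the initial states for training episodes; a real accessibility metric for a robot would replace the toy cost matrix.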