Learning to Reinforcement Learn with Causal Sequence Models
In-context Reinforcement Learning with Algorithm Distillation
We propose a method for distilling reinforcement learning (RL) algorithms into neural networks by modeling their training histories with a causal sequence model. A dataset of learning histories is generated by a source RL algorithm, and a causal transformer is then trained to autoregressively predict actions given the preceding learning histories as context. We demonstrate that the method can reinforcement learn in-context in a variety of environments with sparse rewards, combinatorial task structure, and pixel-based observations, and find that it learns a more data-efficient RL algorithm than the one that generated the source data.
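To make the training setup concrete, below is a minimal sketch of a causal transformer trained to autoregressively predict actions from a multi-episode learning history. It assumes discrete observations and actions, scalar rewards, a simple per-timestep tokenization that sums observation, action, and reward embeddings, and placeholder hyperparameters; all module names and settings are illustrative, not the paper's exact implementation.

```python
# Illustrative sketch of algorithm-distillation-style training (assumed details:
# discrete obs/actions, summed per-step embeddings, placeholder hyperparameters).
import torch
import torch.nn as nn


class ADTransformer(nn.Module):
    """Causal transformer that predicts actions from a learning-history sequence."""

    def __init__(self, num_obs, num_actions, d_model=128, n_layers=4, n_heads=4, max_len=1024):
        super().__init__()
        self.obs_emb = nn.Embedding(num_obs, d_model)
        self.act_emb = nn.Embedding(num_actions, d_model)
        self.rew_emb = nn.Linear(1, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.action_head = nn.Linear(d_model, num_actions)

    def forward(self, obs, actions, rewards):
        # obs, actions: (batch, T) int64; rewards: (batch, T) float32
        T = obs.shape[1]
        pos = torch.arange(T, device=obs.device)
        x = (
            self.obs_emb(obs)
            + self.act_emb(actions)
            + self.rew_emb(rewards.unsqueeze(-1))
            + self.pos_emb(pos)
        )
        # Causal mask so each position only attends to the preceding history.
        mask = nn.Transformer.generate_square_subsequent_mask(T).to(obs.device)
        h = self.encoder(x, mask=mask)
        return self.action_head(h)  # next-action logits at every step


# Training step: predict the source algorithm's next action from the history so far
# (stand-in random tensors play the role of a batch of learning histories).
model = ADTransformer(num_obs=16, num_actions=4)
opt = torch.optim.Adam(model.parameters(), lr=3e-4)
obs = torch.randint(0, 16, (2, 64))
acts = torch.randint(0, 4, (2, 64))
rews = torch.rand(2, 64)
logits = model(obs, acts, rews)
loss = nn.functional.cross_entropy(
    logits[:, :-1].reshape(-1, 4), acts[:, 1:].reshape(-1)
)
loss.backward()
opt.step()
```

At deployment time, the same autoregressive interface would be rolled out in an environment: the growing history of observations, actions, and rewards is fed back in as context, so improvement comes from the conditioning sequence rather than from updating the network weights.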