Shaking the foundations: delusions in sequence models for interaction and control
The recent phenomenal success of language models has reinvigorated machine
learning research, and large sequence models such as transformers are being
applied to a variety of domains. One important problem class that has remained
relatively elusive, however, is purposeful adaptive behavior. Currently there is
a common perception that sequence models "lack the understanding of the cause
and effect of their actions", leading them to draw incorrect inferences due to
auto-suggestive delusions. In this report, we explain where this mismatch
originates, and show that it can be resolved by treating actions as causal
interventions. Finally, we show that in supervised learning, one can teach a
system to condition or intervene on data by training with factual and
counterfactual error signals, respectively.
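
To make the abstract's final point concrete, here is a minimal sketch (not taken from the paper itself) of one way conditioning and intervening can differ only in which terms enter a sequence model's training loss: a factual error signal scores logged actions like any other token, whereas a counterfactual error signal drops the model's own actions from the loss so they are not used as evidence about latent variables. The toy model, the name sequence_nll, and the numbers below are illustrative assumptions, not the authors' implementation.

    # Minimal sketch: conditioning vs. intervening as a choice of which
    # per-token log-likelihood terms contribute to the training loss.
    # All names and values here are hypothetical.
    import numpy as np

    def sequence_nll(logprobs, is_action, treat_actions_as_interventions):
        """Negative log-likelihood of a trajectory of interleaved tokens.

        logprobs:  log p(token_t | tokens_<t) for each position t.
        is_action: 1.0 where position t is an action token, 0.0 otherwise.

        Conditioning (factual error signal): every token, including actions,
        is scored, so the model learns p(action | past) as if actions were
        ordinary observations.

        Intervening (counterfactual error signal): action terms are dropped,
        so self-generated actions carry no evidential weight; only their
        observed consequences are scored.
        """
        if treat_actions_as_interventions:
            keep = 1.0 - is_action           # score observations only
        else:
            keep = np.ones_like(is_action)   # score every token
        return float(-(keep * logprobs).sum())

    # Toy trajectory a0, o0, a1, o1 with made-up per-token log-probabilities.
    logprobs = np.log(np.array([0.5, 0.8, 0.4, 0.9]))
    is_action = np.array([1.0, 0.0, 1.0, 0.0])

    print(sequence_nll(logprobs, is_action, False))  # conditioning loss
    print(sequence_nll(logprobs, is_action, True))   # intervention loss
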
Authors
Pedro A. Ortega, Markus Kunesch, Grégoire Delétang, Tim Genewein, Jordi Grau-Moya, Joel Veness, Jonas Buchli, Jonas Degrave, Bilal Piot, Julien Perolat, Tom Everitt, Corentin Tallec, Emilio Parisotto, Tom Erez, Yutian Chen, Scott Reed, Marcus Hutter, Nando de Freitas, Shane Legg