Causal models of agents have been used to analyse the safety aspects of
machine learning systems. But identifying agents is non-trivial: the causal
model is often simply assumed by the modeller without much justification, and
modelling failures can lead to mistakes in the safety analysis. This paper
proposes the first formal causal definition of agents, roughly that agents
are systems that would adapt their policy if their actions influenced the world
in a different way. From this we derive the first causal discovery algorithm
for discovering agents from empirical data, and give algorithms for translating
between causal models and game-theoretic influence diagrams. We demonstrate our
approach by resolving some previous confusions caused by incorrect causal
modelling of agents.
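The intuition behind the definition (a system counts as agent-like if it would change its policy when the way its actions affect the world changes) can be illustrated with a toy test. The sketch below is a deliberate simplification and not the paper's discovery algorithm. It assumes a single decision with two actions, and it represents an "outcome mechanism" as a plain function from action to utility. The names `adaptive_system`, `fixed_system`, and `adapts_to_mechanism_change` are hypothetical and exist only for this illustration.

```python
from typing import Callable, List

# Hypothetical toy setting: an action is an int, and a mechanism maps an action to the utility it produces.
Action = int
Mechanism = Callable[[Action], float]

ACTIONS: List[Action] = [0, 1]


def adaptive_system(mechanism: Mechanism) -> Action:
    """Hypothetical agent-like system: picks the action with the highest utility under the given mechanism."""
    return max(ACTIONS, key=mechanism)


def fixed_system(mechanism: Mechanism) -> Action:
    """Hypothetical non-adaptive system: ignores the mechanism and always takes action 0."""
    return 0


def adapts_to_mechanism_change(system: Callable[[Mechanism], Action],
                               mechanisms: List[Mechanism]) -> bool:
    """Return True if the system chooses different actions under at least two of the mechanisms."""
    choices = {system(m) for m in mechanisms}
    return len(choices) > 1


# Two hypothetical outcome mechanisms: the first rewards action 0, the second rewards action 1.
mechanisms: List[Mechanism] = [
    lambda a: 1.0 if a == 0 else 0.0,
    lambda a: 1.0 if a == 1 else 0.0,
]

if __name__ == "__main__":
    print(adapts_to_mechanism_change(adaptive_system, mechanisms))  # True
    print(adapts_to_mechanism_change(fixed_system, mechanisms))     # False
```

Under these assumptions, intervening on the outcome mechanism changes the adaptive system's choice but not the fixed system's, which is the kind of difference the definition uses to separate agents from non-agents. The paper itself works with richer causal models over mechanism variables rather than this single-decision setup.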
Authors
Zachary Kenton, Ramana Kumar, Sebastian Farquhar, Jonathan Richens, Matt MacDermott, Tom Everitt