Gato: A Multi-modal, Multi-Task, Multi-embodiment Generalized Agent
A Generalist Agent
We describe a single generalist agent that works as a multi-modal, multi-task, multi-embodiment generalist policy.
The same network with the same weights can play Atari, caption images, chat, stack blocks with a real robot arm and much more, deciding based on its context whether to output text, joint torques, button presses, or other tokens.
In this report we describe the model and the data, and document the current capabilities of the agent.
Authors
Scott Reed, Konrad Zolna, Emilio Parisotto, Sergio Gomez Colmenarejo, Alexander Novikov, Gabriel Barth-Maron, Mai Gimenez, Yury Sulsky, Jackie Kay, Jost Tobias Springenberg, Tom Eccles, Jake Bruce, Ali Razavi, Ashley Edwards, Nicolas Heess, Yutian Chen, Raia Hadsell, Oriol Vinyals, Mahyar Bordbar, Nando de Freitas