Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents
Can world knowledge learned by large language models (LLMs) be used to act in
interactive environments? In this paper, we investigate the possibility of
grounding high-level tasks, expressed in natural language (e.g. "make
breakfast"), to a chosen set of actionable steps (e.g. "open fridge"). While
prior work focused on learning from explicit step-by-step examples of how to
act, we find, surprisingly, that if pre-trained LMs are large enough and prompted
appropriately, they can effectively decompose high-level tasks into low-level
plans without any further training. However, the plans produced naively by LLMs
often cannot map precisely to admissible actions. We propose a procedure that
conditions on existing demonstrations and semantically translates the plans to
admissible actions. Our evaluation in the recent VirtualHome environment shows
that the resulting method substantially improves executability over the LLM
baseline. A human evaluation reveals a trade-off between executability and
correctness, but shows a promising sign for extracting
actionable knowledge from language models. Website at
this https URL
Authors: Wenlong Huang, Pieter Abbeel, Deepak Pathak, Igor Mordatch
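
The translation step described in the abstract maps each free-form step generated by the language model onto the closest admissible action in the environment. Below is a minimal Python sketch of one way such a semantic translation could work, using sentence-embedding cosine similarity; the embedding model, the action list, and the function names here are illustrative assumptions, not the authors' exact implementation.

# Minimal sketch: map a free-form LLM plan step to the closest admissible
# environment action via sentence-embedding cosine similarity.
# The model choice and action list are illustrative assumptions only.
from sentence_transformers import SentenceTransformer, util

# Hypothetical set of admissible actions exposed by the environment.
ADMISSIBLE_ACTIONS = [
    "walk to kitchen",
    "open fridge",
    "grab milk",
    "close fridge",
    "switch on stove",
]

model = SentenceTransformer("all-MiniLM-L6-v2")  # any sentence encoder works
action_embeddings = model.encode(ADMISSIBLE_ACTIONS, convert_to_tensor=True)

def translate_step(llm_step: str) -> str:
    """Return the admissible action most similar to the LLM-generated step."""
    step_embedding = model.encode(llm_step, convert_to_tensor=True)
    scores = util.cos_sim(step_embedding, action_embeddings)[0]
    return ADMISSIBLE_ACTIONS[int(scores.argmax())]

# Example: a free-form step produced by the language model.
print(translate_step("take the milk out of the refrigerator"))  # -> "grab milk"

Scoring each generated step against the environment's admissible actions in this way is what allows the method to improve executability over using the raw LLM output directly.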