VIMA: A Transformer-Based Generalized Robot Agent for Multimodal Prompt-Based Manipulation - 42Papers