Learning Structured Procedures from Cooking Videos
A Benchmark for Structured Procedural Knowledge Extraction from Cooking Videos
We propose a benchmark of structured procedural knowledge extracted from cooking videos.
This task is complementary to existing tasks, but requires models to produce interpretable structured knowledge in the form of verb-argument tuples.
Our analysis shows that the proposed task is challenging, and that standard modeling approaches such as unsupervised segmentation, semantic role labeling, and visual action detection perform poorly when forced to predict every action of a procedure in a structured form.
Frank F. Xu, Lei Ji, Botian Shi, Junyi Du, Graham Neubig, Yonatan Bisk, Nan Duan
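To make the notion of a verb-argument tuple concrete, the following is a minimal sketch of how one step of a procedure might be represented. The class name `ActionTuple`, its fields, the helper `format_procedure`, and the example recipe steps are all hypothetical illustrations and are not taken from the benchmark's actual schema or data.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ActionTuple:
    """One procedural step: a verb plus its arguments (hypothetical schema)."""
    verb: str
    arguments: Tuple[str, ...]


def format_procedure(steps: List[ActionTuple]) -> List[str]:
    """Render each step as verb(arg1, arg2, ...) for easy inspection."""
    return [f"{step.verb}({', '.join(step.arguments)})" for step in steps]


# Invented example steps, for illustration only.
procedure = [
    ActionTuple("chop", ("onion",)),
    ActionTuple("heat", ("oil", "pan")),
    ActionTuple("add", ("onion", "pan")),
]

for line in format_procedure(procedure):
    print(line)
```

Under this assumed representation, a model's output for a video would be an ordered list of such tuples, one per action in the procedure.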