Hierarchical Modular Network for Video Captioning