An Empirical Study of the Most Important Factors in Video-and-Language Model Design - 42Papers