A Bilingual, OpenWorld Video Text Dataset and End-to-end Video Text Spotter with Transformer