Simple Open-Vocabulary Object Detection with Vision Transformers - 42Papers