We present a combined scaling method called BASIC that achieves 85.7% top-1 zero-shot accuracy on the ImageNet validation set, surpassing the best published zero-shot models, CLIP and ALIGN, by 9.3%.
Our model also shows significant improvements on robustness benchmarks.
Image-text robustness is one of the most important tasks in image processing.
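As a rough illustration of what a top-1 zero-shot accuracy figure measures for image-text models of this kind, the sketch below classifies each image by picking the class whose text embedding has the highest cosine similarity to the image embedding. This is not the authors' implementation: the function names (l2_normalize, zero_shot_predict, top1_accuracy) and the two-dimensional toy embeddings are hypothetical, and the embeddings are assumed to be non-zero vectors already produced by some image encoder and text encoder.

```python
import math


def l2_normalize(vec):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def zero_shot_predict(image_emb, class_text_embs):
    """Return the index of the class text embedding most similar to the image.

    Similarity is the cosine similarity between the image embedding and
    each class text embedding (hypothetical inputs, not real model outputs).
    """
    img = l2_normalize(image_emb)
    best_idx, best_sim = -1, -math.inf
    for idx, text_emb in enumerate(class_text_embs):
        txt = l2_normalize(text_emb)
        sim = sum(a * b for a, b in zip(img, txt))
        if sim > best_sim:
            best_idx, best_sim = idx, sim
    return best_idx


def top1_accuracy(image_embs, labels, class_text_embs):
    """Fraction of images whose most similar class matches the true label."""
    correct = sum(
        zero_shot_predict(emb, class_text_embs) == label
        for emb, label in zip(image_embs, labels)
    )
    return correct / len(labels)


# Toy data: two classes and three images, all made up for illustration.
class_text_embs = [[1.0, 0.0], [0.0, 1.0]]
image_embs = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
labels = [0, 1, 1]

accuracy = top1_accuracy(image_embs, labels, class_text_embs)
print(f"top-1 zero-shot accuracy: {accuracy:.3f}")
```

No labeled example of the target classes is used to fit anything here; the class "weights" come only from text embeddings, which is what makes the evaluation zero-shot.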
Authors
Hieu Pham, Zihang Dai, Golnaz Ghiasi, Hanxiao Liu, Adams Wei Yu, Minh-Thang Luong, Mingxing Tan, Quoc V. Le