Sparsely-gated Expert Networks for Computer Vision
Scaling Vision with Sparse Mixture of Experts
We present a sparse version of the Vision Transformer that matches the performance of state-of-the-art networks while requiring as little as half of the compute at test-time.
We also propose an extension to the routing algorithm that can prioritize subsets of each input across the entire batch, leading to adaptive per-image compute.
Finally, we demonstrate the potential of this sparse Vision Transformer to scale vision models, and train a 15B-parameter model that attains 90.35% top-1 accuracy on ImageNet.
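The sparsely-gated routing described above can be sketched in a few lines: a small gating network scores the experts for each token, and each token is processed only by its top-k experts, with the outputs combined by the (renormalized) gate weights. This is a minimal illustrative sketch in NumPy, not the paper's V-MoE implementation; all names, shapes, and the linear "experts" are assumptions for illustration.

```python
import numpy as np

# Minimal sketch of a sparsely-gated mixture-of-experts layer with top-k
# routing. Illustrative only; names/shapes are assumptions, not V-MoE code.

rng = np.random.default_rng(0)
num_tokens, d_model, num_experts, k = 8, 16, 4, 2

# Each "expert" is a simple linear map here; the gate is another linear map.
experts = [rng.standard_normal((d_model, d_model)) * 0.1 for _ in range(num_experts)]
gate_w = rng.standard_normal((d_model, num_experts)) * 0.1

x = rng.standard_normal((num_tokens, d_model))

# Gating: softmax over experts, then keep only the top-k experts per token.
logits = x @ gate_w
probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
probs /= probs.sum(axis=-1, keepdims=True)
topk = np.argsort(probs, axis=-1)[:, -k:]  # indices of the k highest-scoring experts

out = np.zeros_like(x)
for t in range(num_tokens):
    w = probs[t, topk[t]]
    w = w / w.sum()  # renormalize gate weights over the selected experts
    for weight, e in zip(w, topk[t]):
        out[t] += weight * (x[t] @ experts[e])  # each token visits only k experts

print(out.shape)
```

Because every token activates only k of the experts, the parameter count grows with the number of experts while the per-token compute stays roughly constant, which is what lets the model scale to billions of parameters.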
Carlos Riquelme, Joan Puigcerver, Basil Mustafa, Maxim Neumann, Rodolphe Jenatton, André Susano Pinto, Daniel Keysers, Neil Houlsby
Scale vision models
Sparsely-gated mixture of experts networks
15B-parameter model
Read the Paper