Extrapolating from a Single Image to a Thousand Classes using Distillation
We develop a framework for training neural networks from scratch using a single image, by means of knowledge distillation from a teacher pretrained with supervision.
With this approach, we achieve top-1 accuracies of 94%/74% on CIFAR-10/100, 59% on ImageNet, and, by extending the method to audio, 84% on SpeechCommands.
In extensive analyses, we disentangle the effects of augmentations, the choice of source image, and network architectures, and also discover "panda neurons" in networks that have never seen a panda.
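To make the training recipe concrete, below is a minimal sketch of single-image distillation in PyTorch. The teacher/student architectures, augmentation recipe, temperature, file path, and hyperparameters are illustrative assumptions, not the paper's exact configuration: heavy augmentation turns the single source image into a stream of varied crops, and the student matches the teacher's softened predictions on them.

```python
# Sketch: distilling a supervised teacher into a from-scratch student
# using augmented crops of ONE image. All specific values are assumptions.
import torch
import torch.nn.functional as F
from torchvision import models, transforms
from PIL import Image

# Aggressive augmentation generates diverse views from a single image.
augment = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.08, 1.0)),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(0.4, 0.4, 0.4),
    transforms.ToTensor(),
])

source = Image.open("single_image.jpg").convert("RGB")  # hypothetical path

# Teacher: pretrained with supervision; student: trained from scratch.
teacher = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2).eval()
student = models.resnet18(weights=None)
optimizer = torch.optim.SGD(student.parameters(), lr=0.01, momentum=0.9)

T = 4.0  # softmax temperature (illustrative value)
for step in range(1000):
    # Each "batch" is just differently augmented crops of the same image.
    batch = torch.stack([augment(source) for _ in range(32)])
    with torch.no_grad():
        soft_targets = F.softmax(teacher(batch) / T, dim=1)
    log_probs = F.log_softmax(student(batch) / T, dim=1)
    # KL divergence between softened teacher and student distributions;
    # the T*T factor keeps gradient magnitudes comparable across temperatures.
    loss = F.kl_div(log_probs, soft_targets, reduction="batchmean") * T * T
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Because the only supervision signal is the teacher's output distribution, the student can acquire discriminative features for classes (such as pandas) that never appear in its single training image.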