A framework for reading analog clocks in natural images or videos
It's About Time: Analog Clock Reading in the Wild
Analog clocks are ubiquitous in natural images and videos, yet reading them automatically remains a challenging task. Specifically, we make the following contributions: first, we create a scalable pipeline for generating synthetic clocks, significantly reducing the requirement for labour-intensive annotations; second, we introduce a clock recognition architecture based on spatial transformer networks (STNs), trained end-to-end for clock alignment and recognition. We show that the model trained on the proposed synthetic dataset generalises to real clocks with good accuracy, advocating a Sim2Real training regime. Third, to further reduce the gap between simulation and real data, we leverage a special property of time, i.e. uniformity, to generate pseudo-labels on real, unlabelled clock videos, and show that training on these videos offers further improvements while still requiring zero manual annotations. Finally, we introduce three benchmark datasets based on COCO, OpenImages, and The Clock movie, totalling 4,472 images with clocks, with full annotations for time, accurate to the minute.
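To make the two technical ideas concrete, the sketches below are minimal, illustrative versions written under stated assumptions; they are not the authors' implementation. The first sketch assumes an STN-style model in PyTorch: a small localisation CNN predicts an affine warp that aligns the clock to a canonical view, and a recognition CNN with hypothetical 12-way hour and 60-way minute heads reads the time. The layer sizes, the affine (rather than any other) parameterisation, and the module names are assumptions.

```python
# Minimal sketch of an STN-based clock reader (illustrative, not the authors' code).
# A localisation CNN predicts an affine warp that aligns the clock to a canonical view;
# a recognition CNN then classifies hour (12-way) and minute (60-way).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ClockSTN(nn.Module):
    def __init__(self):
        super().__init__()
        # Localisation network: predicts 6 affine parameters from the raw image.
        self.loc = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(64 * 16, 6),
        )
        # Initialise the warp to the identity so training starts from the original view.
        self.loc[-1].weight.data.zero_()
        self.loc[-1].bias.data.copy_(torch.tensor([1., 0., 0., 0., 1., 0.]))
        # Recognition network with separate hour and minute heads.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.hour_head = nn.Linear(128, 12)    # 12 hour classes
        self.minute_head = nn.Linear(128, 60)  # 60 minute classes

    def forward(self, x):
        theta = self.loc(x).view(-1, 2, 3)                      # predicted affine transform
        grid = F.affine_grid(theta, x.size(), align_corners=False)
        aligned = F.grid_sample(x, grid, align_corners=False)   # warp to canonical view
        feat = self.backbone(aligned)
        return self.hour_head(feat), self.minute_head(feat)
```

The second sketch illustrates the uniformity idea behind the pseudo-labels: real time advances at a constant rate, so per-frame predictions on an unlabelled clock video should increase linearly (with unit slope) in the frame timestamp, and only frames consistent with that fit are kept. The fitting procedure, the tolerance, and the function name are assumptions for illustration.

```python
# Illustrative sketch of uniformity-based pseudo-labelling on an unlabelled clock video.
# Predictions (minutes since 12:00, mod 720) should track the frame timestamps with
# unit slope; frames agreeing with the fit within a tolerance become pseudo-labels.
import numpy as np

def pseudo_labels(pred_minutes, timestamps_min, tol=1.0):
    """pred_minutes: per-frame predictions in minutes since 12:00 (mod 720);
    timestamps_min: frame times in minutes; tol: max deviation (minutes) to accept."""
    pred = np.asarray(pred_minutes, dtype=float)
    t = np.asarray(timestamps_min, dtype=float)
    # With the slope fixed to 1, only the offset is unknown; estimate it with the
    # median residual (wrap-around is ignored here, which is fine for short clips).
    offset = np.median((pred - t) % 720)
    fitted = (t + offset) % 720
    # Circular distance between each prediction and the uniform-time fit.
    diff = np.abs((pred - fitted + 360) % 720 - 360)
    keep = diff <= tol
    return fitted, keep

# Example: the third frame disagrees with uniformly advancing time and is rejected.
fitted, keep = pseudo_labels([130, 131, 250, 133], [0, 1, 2, 3])
```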