Accuracy and Performance Comparison of Video Action Recognition Approaches
Matthew Hutchinson, Siddharth Samsi, William Arcand, David Bestor, Bill Bergeron, Chansup Byun, Micheal Houle, Matthew Hubbell, Micheal Jones, Jeremy Kepner, Andrew Kirby, Peter Michaleas, Lauren Milechin, Julie Mullen, Andrew Prout, Antonio Rosa, Albert Reuther, Charles Yee, Vijay Gadepally
Over the past few years, there has been significant interest in video action
recognition systems and models. However, direct comparison of accuracy and
computational performance results remains clouded by differing training
environments, hardware specifications, hyperparameters, pipelines, and
inference methods. This article directly compares fourteen off-the-shelf and
state-of-the-art models while holding these training characteristics
consistent, giving readers a meaningful comparison across different types of
video action recognition algorithms.
We evaluate model accuracy with the standard Top-1 and Top-5 accuracy metrics
as well as a newly proposed accuracy metric. Additionally, we compare the
computational performance of distributed training scaling from two to
sixty-four GPUs on a state-of-the-art HPC system.
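For readers unfamiliar with the standard metrics, Top-k accuracy is the fraction of clips whose true class appears among the model's k highest-scoring predictions; Top-1 and Top-5 are the cases k = 1 and k = 5. The sketch below illustrates only this standard definition, not the proposed new metric, which the abstract does not define. The function name `top_k_accuracy` and the toy scores are hypothetical and do not come from the paper's evaluation code.

```python
from typing import Sequence


def top_k_accuracy(scores: Sequence[Sequence[float]],
                   labels: Sequence[int], k: int) -> float:
    """Fraction of clips whose true label is among the k highest-scoring classes.

    Hypothetical helper illustrating the standard Top-k definition.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("need at least one example")
    hits = 0
    for clip_scores, label in zip(scores, labels):
        # Class indices ranked from highest to lowest score; keep the first k.
        ranked = sorted(range(len(clip_scores)),
                        key=lambda c: clip_scores[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Toy example (made-up scores): 3 clips, 6 classes.
scores = [
    [0.10, 0.50, 0.20, 0.05, 0.11, 0.04],  # true class 1 ranks 1st: Top-1 and Top-5 hit
    [0.30, 0.25, 0.20, 0.15, 0.06, 0.04],  # true class 3 ranks 4th: Top-5 hit only
    [0.01, 0.04, 0.10, 0.15, 0.20, 0.50],  # true class 0 ranks 6th: miss for both
]
labels = [1, 3, 0]

print(top_k_accuracy(scores, labels, k=1))  # 1 of 3 clips correct
print(top_k_accuracy(scores, labels, k=5))  # 2 of 3 clips correct
```

Because Top-5 counts any hit among five candidates, it is always at least as high as Top-1 for the same predictions, which is why the two are usually reported together.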