Pitfalls in Machine Learning Research: Reexamining the Development Cycle
Machine learning has the potential to fuel further advances in data science,
but it is greatly hindered by an ad hoc design process, poor data hygiene, and
a lack of statistical rigor in model evaluation. Recently, these shortcomings
have begun to attract more attention as they have caused public and
embarrassing problems in research and development. Drawing from our experience as machine
learning researchers, we follow the machine learning process from algorithm
design to data collection to model evaluation, identifying common
pitfalls and providing practical recommendations for improvement. At each
step, we introduce case studies to highlight how these pitfalls occur in
practice and where the process could be improved.