A bin-and-hash algorithm for the efficient identification and comparison of large numbers of multidimensional vectors in the construction of machine learning potentials

A Bin and Hash Method for Analyzing Reference Data and Descriptors in Machine Learning Potentials

We present a bin-and-hash (BAH) algorithm that enables the efficient identification and comparison of the large numbers of multidimensional vectors that arise in several contexts during the construction of machine learning (ML) potentials. The method is illustrated for high-dimensional neural network potentials, which use atom-centered symmetry functions to describe the geometry of the atomic environments, but it is general and can be combined with any current type of ML potential.
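To make the general idea concrete, the sketch below shows one plausible reading of "bin and hash" and is not the authors' implementation. It assumes that each vector (for example, the symmetry-function vector of one atomic environment) is discretized component by component into `n_bins` equal-width bins between assumed per-dimension bounds `lower` and `upper`. The resulting tuple of bin indices then serves as a hash key, so vectors in the same cell are grouped in expected constant time per vector rather than by pairwise comparison. The names `bin_vector`, `bin_and_hash` and `find_novel`, and the binning scheme itself, are hypothetical choices made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

BinKey = Tuple[int, ...]


def bin_vector(vec: Sequence[float], lower: Sequence[float],
               upper: Sequence[float], n_bins: int) -> BinKey:
    """Map each component to an integer bin index in [0, n_bins - 1].

    Assumption: equal-width bins between per-dimension bounds.
    """
    indices = []
    for x, lo, hi in zip(vec, lower, upper):
        width = (hi - lo) / n_bins
        idx = int((x - lo) // width) if width > 0 else 0
        # Clamp values lying on or outside the assumed range.
        indices.append(min(max(idx, 0), n_bins - 1))
    return tuple(indices)


def bin_and_hash(vectors: Sequence[Sequence[float]], lower: Sequence[float],
                 upper: Sequence[float], n_bins: int) -> Dict[BinKey, List[int]]:
    """Group vector indices by bin tuple; the dict hashes each tuple key."""
    groups: Dict[BinKey, List[int]] = defaultdict(list)
    for i, vec in enumerate(vectors):
        groups[bin_vector(vec, lower, upper, n_bins)].append(i)
    return dict(groups)


def find_novel(reference: Dict[BinKey, List[int]],
               new_vectors: Sequence[Sequence[float]], lower: Sequence[float],
               upper: Sequence[float], n_bins: int) -> List[int]:
    """Return indices of new vectors whose bin is absent from the reference."""
    return [i for i, v in enumerate(new_vectors)
            if bin_vector(v, lower, upper, n_bins) not in reference]


if __name__ == "__main__":
    lower, upper, n_bins = [0.0, 0.0], [1.0, 1.0], 10
    reference_vectors = [[0.13, 0.52], [0.17, 0.55], [0.93, 0.14]]
    groups = bin_and_hash(reference_vectors, lower, upper, n_bins)
    print(groups)  # vectors 0 and 1 share a bin; vector 2 is alone
    new_vectors = [[0.15, 0.58], [0.45, 0.45]]
    print(find_novel(groups, new_vectors, lower, upper, n_bins))
```

Using the bin tuple itself as the dictionary key lets Python resolve hash collisions by equality, so two vectors are grouped only if they fall into exactly the same cell. The bin width controls the trade-off between treating near-identical environments as duplicates and distinguishing genuinely different ones.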