A Survey on Recent Approaches for Natural Language Processing in Low-Resource Scenarios
Current developments in natural language processing present challenges and
opportunities for low-resource languages and domains. Deep neural networks are
known for requiring large amounts of training data, which might not be available
in resource-lean scenarios. However, there is also a growing body of work on
improving performance in low-resource settings. Motivated by fundamental
changes towards neural models and the currently popular pre-train and fine-tune
paradigm, we give an overview of promising approaches for low-resource natural
language processing. After discussing the definition of low-resource
scenarios and the different dimensions of data availability, we examine
methods that enable learning when training data is sparse. This includes
mechanisms to create additional labeled data like data augmentation and distant
supervision as well as transfer learning settings that reduce the need for
target supervision. The survey closes with a brief look into methods suggested
in non-NLP machine learning communities, which might be beneficial for NLP in
low-resource scenarios.
Authors
Michael A. Hedderich, Lukas Lange, Heike Adel, Jannik Strötgen, Dietrich Klakow