ABB-BERT: A BERT model for disambiguating abbreviations and contractions
Prateek Kacker, Andi Cupallari, Aswin Giridhar Subramanian, Nimit Jain
Abbreviations and contractions are commonly found in text across different
domains. For example, doctors' notes contain many contractions that may be
personalized to the writer's preferences. Existing spelling correction models
are not suitable for handling such expansions, because many characters are
dropped from the original words. In this work, we propose ABB-BERT, a
BERT-based model that handles ambiguous text containing abbreviations and
contractions. ABB-BERT can rank candidate expansions from among thousands of
options and is designed for scale. It is trained on Wikipedia text, and the
algorithm allows it to be fine-tuned with little compute to improve
performance for a specific domain or person. We are publicly
releasing the training dataset for abbreviations and contractions derived from
Wikipedia.
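As a rough illustration of the ranking task described above (this is not the ABB-BERT architecture itself), a generic masked-language-model baseline can score each candidate expansion in context and rank them; the sentence, contraction, and candidate list below are hypothetical examples.

```python
# Illustrative sketch only: ranking candidate expansions of a contraction
# with an off-the-shelf BERT masked language model (pseudo-log-likelihood
# scoring), not the ABB-BERT method from the paper.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def pseudo_log_likelihood(sentence: str) -> float:
    """Mask each token in turn and sum the log-probability the model
    assigns to the original token at that position."""
    ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    total = 0.0
    for i in range(1, len(ids) - 1):          # skip [CLS] and [SEP]
        masked = ids.clone()
        masked[i] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits[0, i]
        log_probs = torch.log_softmax(logits, dim=-1)
        total += log_probs[ids[i]].item()
    return total

# Hypothetical doctor's-note contraction "pt" with candidate expansions.
context = "The {} was advised to rest for two weeks."
candidates = ["patient", "physical therapy", "part time"]
ranked = sorted(candidates,
                key=lambda c: pseudo_log_likelihood(context.format(c)),
                reverse=True)
print(ranked)
```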