Exploiting Transliterated Words for Finding Similarity in Inter-Language News Articles using Machine Learning
Sameea Naeem, Dr. Arif ur Rahman, Syed Mujtaba Haider, Abdul Basit Mughal
Finding similarities between two inter-language news articles is a
challenging problem in Natural Language Processing (NLP). It is difficult for
users to find similar news articles in a language other than their native
language, so there is a need for an automatic, Machine Learning based system
that finds the similarity between two inter-language news articles. In this
article, we propose a Machine Learning model combined with English-Urdu word
transliteration that indicates whether an English news article is similar to
an Urdu news article. Existing approaches to finding similarities have a major
drawback when the archives contain articles in low-resourced languages like
Urdu alongside English news articles. We used a
lexicon to link Urdu and English news articles. Because Urdu language
processing applications such as machine translation and text-to-speech are
unable to handle English text at the same time, this research proposes a
technique to find similarities between English and Urdu news articles based on
transliteration.
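
The lexicon-based linking described above could be sketched roughly as follows.
This is only an illustration under stated assumptions: the toy lexicon entries,
the function names, and the overlap score are all hypothetical and are not taken
from the paper, which does not specify its features or model in this abstract.
The sketch maps Urdu tokens to English transliterations through a lexicon and
measures what fraction of them also occur in the English article; a score like
this could plausibly serve as one input feature to a Machine Learning classifier.

```python
import re

# Hypothetical toy lexicon: Urdu token -> English transliteration.
# These entries are illustrative only and are not taken from the paper.
TRANSLIT_LEXICON = {
    "پاکستان": "pakistan",
    "کرکٹ": "cricket",
    "لاہور": "lahore",
    "ٹیم": "team",
}

URDU_PUNCTUATION = "۔،؟!"


def english_tokens(text):
    """Return the set of lowercase alphabetic tokens in an English text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def urdu_to_english_tokens(text, lexicon):
    """Map whitespace-separated Urdu tokens to English via the lexicon.

    Tokens that are missing from the lexicon are ignored.
    """
    linked = set()
    for raw in text.split():
        token = raw.strip(URDU_PUNCTUATION)
        if token in lexicon:
            linked.add(lexicon[token])
    return linked


def transliteration_overlap(english_text, urdu_text, lexicon=TRANSLIT_LEXICON):
    """Fraction of lexicon-linked Urdu tokens that also occur in the English text.

    This overlap measure is an assumption chosen for illustration.
    """
    en = english_tokens(english_text)
    ur = urdu_to_english_tokens(urdu_text, lexicon)
    if not en or not ur:
        return 0.0
    return len(en & ur) / len(ur)


if __name__ == "__main__":
    english_article = "Pakistan cricket team wins in Lahore"
    urdu_article = "پاکستان کرکٹ ٹیم لاہور میں جیت گئی۔"
    print(transliteration_overlap(english_article, urdu_article))
```

A real system would need a far larger lexicon and proper Urdu tokenization; the
whitespace split used here is a simplification.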