A Bootstrap Machine Learning Approach to Identify Rare Disease Patients from Electronic Health Records
Ravi Garg, Shu Dong, Sanjiv Shah, Siddhartha R Jonnalagadda
Rare diseases are difficult to identify among the large number of other
possible diagnoses. Greater availability of patient data and improvements in
machine learning algorithms allow us to tackle this problem computationally.
In this paper, we target one such rare disease: cardiac amyloidosis. We aim to
automate the identification of potential cardiac amyloidosis patients using
machine learning algorithms and to learn the most predictive factors.
With the help of experienced cardiologists, we prepared a gold standard with 73
positive (cardiac amyloidosis) and 197 negative instances. We achieved a high
average cross-validation F1 score of 0.98 using an ensemble machine learning
classifier. The predictive variables included age and diagnoses of cardiac
arrest, chest pain, congestive heart failure, hypertension, primary open-angle
glaucoma, and shoulder arthritis. Further studies are needed to validate the
accuracy of the system across an entire health system and its generalizability
to other diseases.
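
As an illustration of the evaluation metric reported above, the following is a
minimal sketch of how an average cross-validation F1 score can be computed on a
label set with the same class balance as the gold standard (73 positive, 197
negative). It is not the authors' pipeline: the number of folds (k=5), the
random seed, the stratified splitting scheme, and the names f1_score,
stratified_folds, average_cv_f1, and predict_fold are all assumptions made for
illustration, and the classifier is left as a hypothetical callback.

```python
import random


def f1_score(y_true, y_pred):
    """F1 for the positive class (label 1): harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: define F1 as 0 to avoid division by zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def stratified_folds(labels, k, seed=0):
    """Split indices into k folds, keeping the class ratio roughly equal per fold."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal the shuffled indices of this class round-robin across the folds.
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def average_cv_f1(labels, predict_fold, k=5, seed=0):
    """Average the held-out F1 over k folds.

    predict_fold(train_idx, test_idx) is a hypothetical callback that trains
    any classifier on train_idx and returns predicted labels for test_idx.
    """
    scores = []
    for held_out in stratified_folds(labels, k, seed):
        held = set(held_out)
        train_idx = [i for i in range(len(labels)) if i not in held]
        y_pred = predict_fold(train_idx, held_out)
        y_true = [labels[i] for i in held_out]
        scores.append(f1_score(y_true, y_pred))
    return sum(scores) / len(scores)


# Labels with the class balance of the gold standard; no patient data is used.
labels = [1] * 73 + [0] * 197
```

Any classifier, including an ensemble, could be plugged in through
predict_fold. Stratifying the folds is one reasonable choice for an imbalanced
label set such as this one, because it keeps some positive instances in every
held-out fold.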