A Case Study in Model Failure? COVID-19 Daily Deaths and ICU Bed Utilisation Predictions in New York State
Vincent Chin, Noelle I. Samia, Roman Marchant, Ori Rosen, John P.A. Ioannidis, Martin A. Tanner, Sally Cripps
Forecasting models have been influential in shaping decision-making in the
COVID-19 pandemic. However, there is concern that their predictions may have
been misleading. Here, we dissect the predictions made by four models for the
daily COVID-19 death counts between March 25 and June 5 in New York state, as
well as the predictions of ICU bed utilisation made by the influential IHME
model. We evaluated the accuracy of both the point estimates and the
uncertainty estimates of the model predictions. First, we compared the "ground
truth" data sources on daily deaths against which these models were trained.
Three different data sources were used by these models, and these had
substantial differences in recorded daily death counts. Two additional data
sources that we examined also provided different death counts per day. For
accuracy of prediction, all models fared very poorly. Only 10.2% of the
predictions fell within 10% of their training ground truth, irrespective of
distance into the future. For assessment of uncertainty, only one model came
reasonably close to the nominal 95% coverage, but that model did not start
making predictions until April 16 and thus had no impact on early, major
decisions.
For ICU bed utilisation, the IHME model was highly inaccurate; the point
estimates only started to match ground truth after the pandemic wave had
started to wane. We conclude that trustworthy models require trustworthy input
data to be trained upon. Moreover, models need to be subjected to prespecified
real-time performance tests before their results are provided to policy makers
and public health officials.
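
The two evaluation criteria mentioned above, the share of point predictions falling within 10% of ground truth and the empirical coverage of nominal 95% prediction intervals, can be illustrated with a minimal sketch. The function names, the relative-error definition of "within 10%", and the toy numbers below are assumptions made for illustration only; they are not the authors' code or data.

```python
from typing import Sequence


def within_tolerance_rate(predicted: Sequence[float],
                          observed: Sequence[float],
                          tol: float = 0.10) -> float:
    """Fraction of point predictions whose relative error is at most `tol`.

    Assumed definition: |predicted - observed| <= tol * |observed|.
    """
    if len(predicted) != len(observed) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(abs(p - o) <= tol * abs(o) for p, o in zip(predicted, observed))
    return hits / len(observed)


def interval_coverage(lower: Sequence[float],
                      upper: Sequence[float],
                      observed: Sequence[float]) -> float:
    """Fraction of observed values lying inside [lower, upper].

    For well-calibrated 95% prediction intervals this should be close to 0.95.
    """
    if not (len(lower) == len(upper) == len(observed)) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(lo <= o <= hi for lo, hi, o in zip(lower, upper, observed))
    return hits / len(observed)


# Hypothetical daily death counts and forecasts (made-up numbers).
observed = [100, 200, 400, 800, 600]
predicted = [105, 150, 390, 500, 900]
lower = [80, 120, 300, 350, 500]
upper = [130, 190, 480, 700, 1200]

accuracy = within_tolerance_rate(predicted, observed)
coverage = interval_coverage(lower, upper, observed)
print(f"within 10% of ground truth: {accuracy:.0%}")
print(f"empirical coverage of nominal 95% intervals: {coverage:.0%}")
```

In this toy example both figures fall far short of what a well-calibrated model would achieve. In practice each forecast would be scored against the same data source the model was trained on, since the sources differ in their recorded counts.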