A Gentle Introduction to Conformal Prediction and Distribution-Free Uncertainty Quantification
Black-box machine learning methods are now routinely used in
high-risk settings, like medical diagnostics, which demand uncertainty
quantification to avoid consequential model failures. Distribution-free
uncertainty quantification (distribution-free UQ) is a user-friendly paradigm
for creating statistically rigorous confidence intervals/sets for such
predictions. Critically, the intervals/sets are valid without distributional
assumptions or model assumptions, and they carry explicit guarantees that hold with finitely many
datapoints. Moreover, they adapt to the difficulty of the input; when the input
example is difficult, the uncertainty intervals/sets are large, signaling that
the model might be wrong. Without much work, one can use distribution-free
methods on any underlying algorithm, such as a neural network, to produce
confidence sets guaranteed to contain the ground truth with a user-specified
probability, such as 90%. Indeed, the methods are easy to understand and
general, applying to many modern prediction problems arising in the fields of
computer vision, natural language processing, deep reinforcement learning, and
so on. This hands-on introduction is aimed at a reader interested in the
practical implementation of distribution-free UQ, including conformal
prediction and related methods, who is not necessarily a statistician. We will
include many explanatory illustrations, examples, and code samples in Python,
with PyTorch syntax. The goal is to give the reader a working understanding
of distribution-free UQ, allowing them to put confidence intervals on their
algorithms, with one self-contained document.
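
To make the coverage claim concrete, here is a minimal sketch of split conformal prediction for classification, written in plain Python rather than PyTorch so that it runs without dependencies. Everything in it is an illustrative assumption and not taken from this document: the function names, the toy stand-in for a trained model, and the score (one minus the predicted probability of the true class, a common choice). Under the assumption that calibration and test points are exchangeable, the resulting sets should contain the true label with probability at least 1 - alpha.

```python
import math
import random


def conformal_quantile(scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: fall back to the trivial set.
        return float("inf")
    return sorted(scores)[k - 1]


def prediction_set(probs, qhat):
    """Keep every class whose score (1 - predicted probability) is at most qhat."""
    return [y for y, p in enumerate(probs) if 1.0 - p <= qhat]


def toy_example(rng, num_classes=3):
    """Hypothetical stand-in for a trained model: random softmax output plus a label."""
    logits = [rng.gauss(0.0, 2.0) for _ in range(num_classes)]
    peak = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(z - peak) for z in logits]
    total = sum(weights)
    probs = [w / total for w in weights]
    # Assumption: labels are drawn from the model's own probabilities.
    label = rng.choices(range(num_classes), weights=probs)[0]
    return probs, label


def simulate_coverage(alpha=0.1, n_cal=1000, n_test=2000, seed=0):
    """Calibrate on n_cal points, then measure empirical coverage on n_test points."""
    rng = random.Random(seed)
    calibration = [toy_example(rng) for _ in range(n_cal)]
    scores = [1.0 - probs[label] for probs, label in calibration]
    qhat = conformal_quantile(scores, alpha)
    hits = 0
    for _ in range(n_test):
        probs, label = toy_example(rng)
        hits += label in prediction_set(probs, qhat)
    return hits / n_test


if __name__ == "__main__":
    # Empirical coverage should land close to 1 - alpha = 0.9.
    print(simulate_coverage())
```

The underlying model is never inspected here: only its output probabilities on held-out calibration data are used, which is why the same recipe can wrap any black-box predictor. The coverage statement is marginal, meaning it holds on average over calibration and test draws and not for each input individually.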