Conformal prediction uses a held-out, labeled set of examples to calibrate a classifier to yield confidence sets that include the true label with user-specified probability. But what happens if even experts disagree on the ground truth labels? Commonly, this is resolved by taking the majority-voted label from multiple experts. However, in difficult and ambiguous tasks, the majority-voted label can be misleading and a poor representation of the underlying true posterior distribution. In this paper, we introduce Monte Carlo conformal prediction, which allows conformal calibration to be performed directly against expert opinions or aggregate statistics thereof.
Abstract
Conformal Prediction (CP) allows performing rigorous uncertainty quantification by constructing a prediction set $C(X)$ satisfying $\mathbb{P}(Y \in C(X)) \geq 1-\alpha$ for a user-chosen $\alpha \in [0,1]$, relying on calibration data $(X_1,Y_1),\dots,(X_n,Y_n)$ from $\mathbb{P}=\mathbb{P}^{X} \otimes \mathbb{P}^{Y|X}$. It is typically implicitly assumed that $\mathbb{P}^{Y|X}$ is the "true" posterior label distribution. However, in many real-world scenarios, the labels $Y_1,\dots,Y_n$ are obtained by aggregating expert opinions using a voting procedure, resulting in a one-hot distribution $\mathbb{P}_{vote}^{Y|X}$. This is the case for most datasets, even well-known ones like ImageNet. For such "voted" labels, CP guarantees are thus w.r.t. $\mathbb{P}_{vote}=\mathbb{P}^X \otimes \mathbb{P}_{vote}^{Y|X}$ rather than the true distribution $\mathbb{P}$. In cases with unambiguous ground truth labels, the distinction between $\mathbb{P}_{vote}$ and $\mathbb{P}$ is irrelevant. However, when experts do not agree because of ambiguous labels, approximating $\mathbb{P}^{Y|X}$ with a one-hot distribution $\mathbb{P}_{vote}^{Y|X}$ ignores this uncertainty. In this paper, we propose to leverage expert opinions to approximate $\mathbb{P}^{Y|X}$ using a non-degenerate distribution $\mathbb{P}_{agg}^{Y|X}$. We then develop *Monte Carlo CP* procedures which provide guarantees w.r.t. $\mathbb{P}_{agg}=\mathbb{P}^X \otimes \mathbb{P}_{agg}^{Y|X}$ by sampling multiple synthetic pseudo-labels from $\mathbb{P}_{agg}^{Y|X}$ for each calibration example $X_1,\dots,X_n$. In a case study of skin condition classification with significant disagreement among expert annotators, we show that applying CP w.r.t. $\mathbb{P}_{vote}$ under-covers expert annotations: calibrated for $72\%$ coverage, it falls short by $10\%$ on average; our Monte Carlo CP closes this gap both empirically and theoretically.
We also extend Monte Carlo CP to multi-label classification and CP with calibration examples enriched through data augmentation.
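The core calibration idea can be sketched in a few lines: instead of computing one conformity score per calibration example using the voted label, sample several pseudo-labels from each example's aggregated expert distribution and calibrate the threshold over all sampled scores. The sketch below is an illustrative simplification, not the paper's implementation; the function names, the specific conformity score ($1$ minus the predicted probability), and the quantile handling are assumptions for the example.

```python
import numpy as np

def monte_carlo_conformal_threshold(probs, label_dists, alpha, m=10, seed=0):
    """Illustrative Monte Carlo conformal calibration (sketch, not the paper's code).

    probs:       (n, K) classifier softmax outputs on calibration examples
    label_dists: (n, K) per-example aggregated expert label distributions
    alpha:       target miscoverage level
    m:           number of pseudo-labels sampled per calibration example
    """
    rng = np.random.default_rng(seed)
    n, K = probs.shape
    scores = []
    for i in range(n):
        # Sample m synthetic pseudo-labels from the expert distribution
        # instead of using a single majority-voted label.
        labels = rng.choice(K, size=m, p=label_dists[i])
        # Conformity score: 1 minus predicted probability of the pseudo-label.
        scores.extend(1.0 - probs[i, labels])
    scores = np.sort(np.asarray(scores))
    # Conservative finite-sample quantile over all n*m sampled scores.
    q_idx = int(np.ceil((1 - alpha) * (n * m + 1))) - 1
    return scores[min(q_idx, n * m - 1)]

def prediction_set(prob_row, tau):
    """Include every label whose conformity score falls below the threshold."""
    return [k for k, p in enumerate(prob_row) if 1.0 - p <= tau]
```

Standard split CP would compute one score per example against the voted label; here each example contributes `m` scores weighted implicitly by the expert distribution, which is what yields coverage with respect to the aggregated distribution rather than the voted one.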
Paper on OpenReview
Paper on ArXiv
@article{StutzTMLR2023,
  title={Conformal prediction under ambiguous ground truth},
  author={David Stutz and Abhijit Guha Roy and Tatiana Matejovicova and Patricia Strachan and Ali Taylan Cemgil and Arnaud Doucet},
  journal={Transactions on Machine Learning Research},
  issn={2835-8856},
  year={2023},
  url={https://openreview.net/forum?id=CAd6V2qXxc},
}
Source link