# [WIP] Blog post on `predict_proba` (#152)
---
#### Blog Post Template ####

#### Post Information ####
# title: "What we should and should not expect from `predict_proba`"
title: "What to expect from `predict_proba`"
date: December 1, 2022

#### Post Category and Tags ####
# Format in titlecase without dashes (Ex. "Open Source" instead of "open-source")
categories:
  - Updates
tags:
  - Machine Learning

#### Featured Image ####
featured-image: sorting.png

#### Author Info ####
# Can accommodate multiple authors
# Add SQUARE Author Image to /assets/images/author_images/ folder
postauthors:
  - name: Alexandre Perez-Lebel
    website: https://perez-lebel.com
    email: alexandre.perez@inria.fr
    image: alexandre_perez.jpeg
usemathjax: true
---
<div>
  <img src="/assets/images/posts_images/{{ page.featured-image }}" alt="" title="Image by storyset on Freepik">
  {% include postauthor.html %}
</div>
In classification, many situations call for estimated probabilities beyond class labels: decision making, cost-sensitive learning, and causal inference, for example. These probability estimates are typically accessible through the `predict_proba` method of scikit-learn's classifiers.
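As a minimal sketch of the two outputs, using a logistic regression on a synthetic dataset (both the model and the data are illustrative assumptions, not part of the original post):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic binary classification data, for illustration only
X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression().fit(X_train, y_train)

# `predict` returns hard labels; `predict_proba` returns one column per class
labels = clf.predict(X_test)
proba = clf.predict_proba(X_test)  # shape (n_samples, 2)
```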
However, the quality of the estimated probabilities must be validated to ensure trustworthiness, fairness, and robustness to changing operating conditions. To be reliable, the estimated probabilities must be close to the true underlying posterior probabilities of the classes, `P(Y=1|X)`.
Just as a discriminant classifier is validated through accuracy or ROC curves, tools have been developed to evaluate a probabilistic classifier. Calibration is one of them [1-4]: it is used as a proxy to evaluate how close the estimated probabilities are to the true ones. Many recalibration techniques have been developed to improve the estimated probabilities (see [scikit-learn's user guide on calibration](https://scikit-learn.org/stable/modules/calibration.html)). The estimated probabilities of a calibrated classifier can be interpreted as the probability of correctness over the population of samples sharing the same estimated probability, but not as the true posterior class probability.
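Continuing the hypothetical example above, a sketch of both steps with scikit-learn: `calibration_curve` to inspect calibration, and `CalibratedClassifierCV` to recalibrate.

```python
from sklearn.calibration import CalibratedClassifierCV, calibration_curve
from sklearn.linear_model import LogisticRegression

# Reliability diagram data: observed frequency vs. mean predicted probability
prob_true, prob_pred = calibration_curve(y_test, proba[:, 1], n_bins=10)

# Recalibrate with isotonic regression, fitted on cross-validation folds
calibrated = CalibratedClassifierCV(LogisticRegression(), method="isotonic", cv=5)
calibrated.fit(X_train, y_train)
proba_calibrated = calibrated.predict_proba(X_test)
```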
Indeed, it is important to highlight that calibration only captures part of the error on the estimated probabilities: a classifier can be well calibrated on average and yet lump together samples whose true probabilities differ. This remaining term is the grouping loss [5]. Together, the calibration and grouping losses fully characterize the error on the estimated probabilities, the epistemic loss.
> **Review comment:** You are too formal. Give us the intuitions of why calibration is not the full story, rather than the maths.
$$\text{Epistemic loss} = \text{Calibration loss} + \text{Grouping loss}$$
> **Review comment:** First mention Brier score for model selection. Later mention grouping loss.
However, estimating the grouping loss is a harder problem than estimating the calibration loss, as it directly involves the true probabilities. Recent work has focused on approximating the grouping loss through local estimation of the true probabilities [6].
When working with scikit-learn's classifiers, users must be as cautious about results obtained from `predict_proba` as about those from `predict`. Both output estimated quantities (probabilities and labels, respectively) with no prior guarantee on their quality. In both cases, the model's quality must be assessed with appropriate metrics: expected calibration error, Brier score, accuracy, or AUC.
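As a sketch, reusing the hypothetical `labels` and `proba` from above, most of these metrics are available in `sklearn.metrics` (expected calibration error has no built-in scikit-learn function and can be derived from the reliability diagram):

```python
from sklearn.metrics import accuracy_score, brier_score_loss, roc_auc_score

# Label-based metric on the output of `predict`
print("Accuracy:", accuracy_score(y_test, labels))

# Probability-based metrics on the output of `predict_proba`
print("Brier score:", brier_score_loss(y_test, proba[:, 1]))
print("AUC:", roc_auc_score(y_test, proba[:, 1]))
```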
> **Review comment:** Put links to relevant pages in the scikit-learn documentation.
> **Review comment:** Maybe mention quickly (with a separate section title) recalibration (and link to corresponding docs).
## References
<style>
ol > li::marker {
  content: "[" counter(list-item) "]\2003";
}
</style>
<ol>
  <li>Platt, J. C. (1999). Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. Advances in Large Margin Classifiers, 61–74.</li>
  <li>Zadrozny, B., & Elkan, C. (2001). Obtaining calibrated probability estimates from decision trees and naive Bayesian classifiers. ICML, 1, 609–616.</li>
  <li>Guo, C., Pleiss, G., Sun, Y., & Weinberger, K. Q. (2017). On calibration of modern neural networks. International Conference on Machine Learning, 1321–1330.</li>
  <li>Minderer, M., Djolonga, J., Romijnders, R., Hubis, F., Zhai, X., Houlsby, N., Tran, D., & Lucic, M. (2021). Revisiting the calibration of modern neural networks. Advances in Neural Information Processing Systems, 34.</li>
  <li>Kull, M., & Flach, P. (2015). Novel decompositions of proper scoring rules for classification: Score adjustment as precursor to calibration. Joint European Conference on Machine Learning and Knowledge Discovery in Databases, 68–85.</li>
  <li>Perez-Lebel, A., Le Morvan, M., & Varoquaux, G. (2022). Beyond calibration: estimating the grouping loss of modern neural networks. arXiv. <a href="https://doi.org/10.48550/arXiv.2210.16315">https://doi.org/10.48550/arXiv.2210.16315</a></li>
</ol>
> **Review comment:** As you know, there are several points of view on what such a probability may mean (a controlled average error rate versus controlled individual probabilities). Maybe it would be good to first explain what these two mean.