[WIP] Blog post on predict_proba #152
Conversation
These probability estimates are typically accessible from the `predict_proba` method of scikit-learn's classifiers.

However, the quality of the estimated probabilities must be validated to provide trustworthiness and to ensure fairness and robustness to operating conditions. To be reliable, the estimated probabilities must be close to the true underlying posterior probabilities of the classes `P(Y=1|X)`.
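As a rough illustration of what the draft describes, here is a minimal sketch of how `predict_proba` exposes these estimates. The synthetic dataset and the logistic regression model are illustrative assumptions, not part of the post.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Illustrative synthetic data and model (assumptions, not from the blog post draft).
X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression().fit(X_train, y_train)

# predict returns hard labels; predict_proba returns estimates of P(Y=1|X)
# in its second column. Neither comes with a prior guarantee of quality.
labels = clf.predict(X_test)
proba = clf.predict_proba(X_test)[:, 1]
print(labels[:5], proba[:5])
```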
As you know, there are several points of view on what such a probability may mean (controlled average error rate versus controlled individual probabilities). Maybe it would be good to first explain what these two mean.
Similarly to validating a discriminant classifier through accuracy or ROC curves, tools have been developed to evaluate a probabilistic classifier. Calibration is one of them [1-4]. Calibration is used as a proxy to evaluate the closeness of the estimated probabilities to the true ones. Many recalibration techniques have been developed to improve the estimated probabilities (see [scikit-learn's user guide on calibration](https://scikit-learn.org/stable/modules/calibration.html)). The estimated probabilities of a calibrated classifier can be interpreted as the probability of correctness over the population of samples with the same estimated probability, but not as the true posterior class probability.

Indeed, it is important to highlight that calibration only captures part of the error on the estimated probabilities. The remaining term is the grouping loss [5]. Together, the calibration and grouping losses fully characterize the error on the estimated probabilities, the epistemic loss.
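One way this point could be illustrated is a reliability-diagram style check with `sklearn.calibration.calibration_curve`, reusing the classifier and test split from the previous sketch; the number of bins is an arbitrary choice.

```python
from sklearn.calibration import calibration_curve

# Reuses clf, X_test, y_test from the previous sketch; 10 bins is arbitrary.
proba = clf.predict_proba(X_test)[:, 1]

# Fraction of observed positives versus mean predicted probability per bin:
# a well-calibrated classifier stays close to the diagonal.
frac_pos, mean_pred = calibration_curve(y_test, proba, n_bins=10)
for mp, fp in zip(mean_pred, frac_pos):
    print(f"predicted {mp:.2f} -> observed {fp:.2f}")
```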
You are too formal. Give us the intuitions of why calibration is not the full story, rather than the maths.
$$\text{Epistemic loss} = \text{Calibration loss} + \text{Grouping loss}$$
First mention Brier score for model selection. Later mention grouping loss.
However, estimating the grouping loss is a harder problem than calibration, as its estimation directly involves the true probabilities. Recent work has focused on approximating the grouping loss through local estimations of the true probabilities [6].

When working with scikit-learn's classifiers, users must be as cautious with results obtained from `predict_proba` as with results from `predict`. Both output estimated quantities (probabilities and labels respectively) with no prior guarantee on their quality. In both cases, the model's quality must be assessed with appropriate metrics: expected calibration error, Brier score, accuracy, AUC.
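A sketch of how these metrics could be computed with scikit-learn: accuracy, ROC AUC and the Brier score are built into `sklearn.metrics`; expected calibration error is not, so the binned estimate below is only one common approximation. `clf`, `X_test` and `y_test` are assumed from the earlier snippets.

```python
import numpy as np
from sklearn.metrics import accuracy_score, brier_score_loss, roc_auc_score

# Reuses clf, X_test, y_test from the earlier sketches.
labels = clf.predict(X_test)
proba = clf.predict_proba(X_test)[:, 1]

print("accuracy:   ", accuracy_score(y_test, labels))
print("ROC AUC:    ", roc_auc_score(y_test, proba))
print("Brier score:", brier_score_loss(y_test, proba))

# scikit-learn has no built-in expected calibration error; this binned estimate
# (weighted average gap between predicted and observed frequencies per bin)
# is one common approximation.
n_bins = 10
bin_ids = np.minimum((proba * n_bins).astype(int), n_bins - 1)
ece = sum(
    (bin_ids == b).mean() * abs(y_test[bin_ids == b].mean() - proba[bin_ids == b].mean())
    for b in range(n_bins)
    if (bin_ids == b).any()
)
print("ECE (approx.):", ece)
```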
Put links to relevant pages in the scikit-learn documentation
Maybe mention quickly (with a separate section title) recalibration (and link to corresponding docs)
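For reference, a minimal sketch of what such a recalibration section could show, using `CalibratedClassifierCV` from [scikit-learn's calibration module](https://scikit-learn.org/stable/modules/calibration.html). The model, data splits and the choice of isotonic regression are illustrative assumptions carried over from the earlier snippets.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.linear_model import LogisticRegression

# Sigmoid (Platt) or isotonic recalibration, fitted with internal cross-validation.
# X_train, y_train, X_test are the illustrative splits from the earlier snippets.
calibrated = CalibratedClassifierCV(LogisticRegression(), method="isotonic", cv=5)
calibrated.fit(X_train, y_train)
proba_recalibrated = calibrated.predict_proba(X_test)[:, 1]
```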
@aperezlebel While I value the effort for such a blog post, I do not agree with some parts of its current content, e.g. the grouping loss. Unfortunately, I can't promise a fast review atm.
@lorentzenchr I appreciate your feedback, thanks. Could you elaborate on the parts you disagree with?
I have 3 main points of critique:
I did not mean to stop this blog post. It's now stalled for more than 1.5 years. So I close. Feel free to open again if you intend to finish it.
Closes #147.
Work in progress.
TODO: