Each fold is trained for a fixed number of epochs. There is no early stopping based on validation c-index (results wouldn't be fair). #7
Hello, Jaume! Thanks for your impressive work. You mentioned that the results with early stopping would not be fair. I want to ask if there is evidence for this statement. I am looking for fair and universal ways to evaluate survival models, since I find that the final observed (or reported) performance is sensitive to how we evaluate. Concretely, when doing 5-fold cross-validation with a fixed number of epochs (…). Looking forward to hearing from your side.
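To make the protocol under discussion concrete, here is a minimal sketch of fixed-epoch k-fold evaluation. It assumes scikit-learn and scikit-survival are available; `fit_model` and `risk_scores` are hypothetical stand-ins for the actual training and inference code, not part of SurvPath.

```python
import numpy as np
from sklearn.model_selection import KFold
from sksurv.metrics import concordance_index_censored

def cross_validate_fixed_epochs(X, times, events, n_epochs=20, n_folds=5, seed=0):
    """Fixed-budget k-fold CV: every fold trains for exactly n_epochs,
    with no validation split and no early stopping; the test c-index is
    computed once, on the final checkpoint."""
    scores = []
    for train_idx, test_idx in KFold(n_folds, shuffle=True, random_state=seed).split(X):
        # fit_model / risk_scores are hypothetical stand-ins for the real
        # training and inference code (higher risk = shorter survival).
        model = fit_model(X[train_idx], times[train_idx], events[train_idx],
                          epochs=n_epochs)
        risk = risk_scores(model, X[test_idx])
        scores.append(concordance_index_censored(
            events[test_idx].astype(bool), times[test_idx], risk)[0])
    return float(np.mean(scores)), float(np.std(scores))
```

Early stopping would instead pick a per-fold checkpoint by monitoring a validation c-index, which couples checkpoint selection to the very metric being reported; that is the fairness concern raised in the title.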
I'm also very interested in this issue. What would be the fairest way to handle it? It seems that neither training for a fixed number of epochs nor early stopping on a validation set is guaranteed to yield the most reliable results.
For anyone else seeking fair ways to configure and evaluate survival analysis models, there are some facts that could be helpful.
[1] Yu, C.-N., Greiner, R., Lin, H.-C., and Baracos, V. Learning patient-specific cancer survival distributions as a sequence of dependent regressors. In Advances in Neural Information Processing Systems (NeurIPS), 2011.
Hi, Pei! I believe the current data-splitting method has issues. Another drawback is that some datasets have very small sample sizes, so a few-shot approach might be more appropriate.
Apologies for the late reply. Evaluating survival on very small cohorts is hard, and all strategies have flaws. In SurvPath, following previous works, we didn't use a validation set: it would be very small in many cases, and because we use site-stratified splits, the censorship distribution varies a lot from one site to another. In our recent Patho-Bench evaluation, we opted for 50 Monte Carlo folds to mitigate this problem, without hyperparameter search or a validation set. See: https://github.com/mahmoodlab/Patho-Bench

Overall, as a field, we should move away from TCGA survival analyses without additional external test sets, e.g. based on CPTAC or SurGen. Hope this helps.
Originally posted by @guillaumejaume in #4 (comment)
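For reference, a minimal sketch of the Monte Carlo protocol described above: 50 random train/test resplits, no validation set, no hyperparameter search. The 80/20 split ratio is an assumption, `fit_model` and `risk_scores` are the same hypothetical helpers as in the earlier sketch, and Patho-Bench's actual implementation may differ.

```python
import numpy as np
from sklearn.model_selection import ShuffleSplit
from sksurv.metrics import concordance_index_censored

# X, times, events: cohort features, survival times, event indicators.
# 50 Monte Carlo folds: unlike disjoint k-fold splits, the random test
# sets may overlap across folds, which stabilizes the estimate on small
# cohorts. Report mean +/- std across folds.
splitter = ShuffleSplit(n_splits=50, test_size=0.2, random_state=0)
fold_scores = []
for train_idx, test_idx in splitter.split(X):
    model = fit_model(X[train_idx], times[train_idx], events[train_idx])  # hypothetical
    risk = risk_scores(model, X[test_idx])                                # hypothetical
    fold_scores.append(concordance_index_censored(
        events[test_idx].astype(bool), times[test_idx], risk)[0])
print(f"c-index: {np.mean(fold_scores):.3f} +/- {np.std(fold_scores):.3f}")
```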