Investigate combined honest trees + isotonic calibration #530

Open
rflperry opened this issue Dec 31, 2021 · 8 comments
Labels: ndd (Neuro Data Design), sklearn (will try to merge into sklearn)

Comments

@rflperry
Member

Background

Honest decision trees extend conventional decision trees by splitting the training samples into two disjoint sets: one for learning the tree structure and the other for estimating the class posterior probabilities. In practice, this yields better calibration (i.e., the estimated probabilities are closer to the true probabilities). See this paper for details.
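
To make the two-set idea concrete, here is a minimal sketch built on a regular sklearn decision tree. It is not the code from the paper or from ProgLearn; the helper names `fit_honest_tree` and `honest_predict_proba`, the 50/50 split, the uniform fallback for empty leaves, and the toy data are all assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier


def fit_honest_tree(X, y, n_classes, seed=0):
    """Learn splits on one half of the data, count labels per leaf on the other half."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    half = len(X) // 2
    struct_idx, est_idx = idx[:half], idx[half:]

    # Structure set: used only to learn the tree's splits.
    tree = DecisionTreeClassifier(random_state=seed).fit(X[struct_idx], y[struct_idx])

    # Estimation set: route held-out samples to leaves and count their labels.
    counts = np.zeros((tree.tree_.node_count, n_classes))
    np.add.at(counts, (tree.apply(X[est_idx]), y[est_idx]), 1)
    return tree, counts


def honest_predict_proba(tree, counts, X):
    """Posterior = class frequencies of estimation-set samples in the matching leaf."""
    leaf_counts = counts[tree.apply(X)]
    totals = leaf_counts.sum(axis=1, keepdims=True)
    # Assumption: leaves that received no estimation samples get a uniform posterior.
    uniform = 1.0 / counts.shape[1]
    return np.where(totals > 0, leaf_counts / np.maximum(totals, 1), uniform)


# Toy data with integer labels 0 and 1 (illustrative only).
X, y = make_classification(n_samples=400, n_features=5, random_state=0)
tree, counts = fit_honest_tree(X, y, n_classes=2)
proba = honest_predict_proba(tree, counts, X)
```

Because the labels used for the posteriors never influenced the splits, the leaf frequencies are not biased toward the purity that the splitting criterion optimizes for.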

The code and experiments for the above paper are located in a fork of ProgLearn. A minimal working example and tutorial are in this notebook. This code is separate from the honest tree code used in ProgLearn, as it has no need for transfer/lifelong learning. As an upside, the code has been optimized for efficiency and benchmarked.

Request

An issue was opened in sklearn, and the simulations and paper attracted developer interest. The paper compared honest decision forests against the traditional forest as well as two other calibration methods, sigmoid and isotonic. A developer expressed interest in the results of combining honest trees with isotonic calibration, given that isotonic calibration seems to do better than honest posteriors alone. The request is thus to rerun the simulations and cc18 experiments from the paper with an added honest + isotonic forest method, to see whether this combined approach gives better calibration than either approach alone.
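
As a rough sketch of what such a comparison could look like, the snippet below scores a traditional forest against the same forest wrapped in sklearn's isotonic calibration. The two-Gaussian toy data, the sample sizes, and the choice of the Brier score as the metric are assumptions for illustration, not the paper's simulation setup; an honest + isotonic method would be added as one more entry in `methods`.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000
# Two overlapping 2D Gaussians with unit variance (toy stand-in data).
X = np.vstack([rng.normal(-1.0, 1.0, size=(n, 2)), rng.normal(1.0, 1.0, size=(n, 2))])
y = np.repeat([0, 1], n)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0, stratify=y
)

methods = {
    "forest": RandomForestClassifier(n_estimators=100, random_state=0),
    # The estimator is passed positionally so the call works across sklearn versions.
    "forest + isotonic": CalibratedClassifierCV(
        RandomForestClassifier(n_estimators=100, random_state=0),
        method="isotonic",
        cv=5,
    ),
}

scores = {}
for name, clf in methods.items():
    clf.fit(X_train, y_train)
    proba = clf.predict_proba(X_test)[:, 1]
    scores[name] = brier_score_loss(y_test, proba)  # lower is better
    print(f"{name}: Brier score = {scores[name]:.4f}")
```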

Proposed Workflow

As the current honest forest code and experiments lie on a fork, it may be worthwhile to first create a new repository for just the optimized honest forest code and experiments as a separate entity from proglearn. Either way, the rough workflow would be:

  1. Write an HonestTreeClassifier class, then rewrite the UncertaintyForest class to use the new honest trees. Consider renaming UncertaintyForest to HonestForestClassifier for consistency. Currently there is no standalone honest decision tree code; rather, UncertaintyForest builds an ensemble of honest decision trees out of regular decision trees.
  2. Verify that this honest decision tree can be used as the base estimator for sklearn's isotonic calibration, just as the regular sklearn decision tree can. This may require editing the honest tree class to conform to sklearn-specific requirements. This is probably the hardest step.
  3. Rerun the overlapping Gaussian simulation with this method included and compare the results against the existing methods.
  4. If the method seems promising, run it on the real cc18 data experiments.
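
For steps 1 and 2, a minimal sketch of an sklearn-compatible honest tree might look like the following. This is a hypothetical illustration, not the existing UncertaintyForest code: the constructor parameters (`honest_fraction`, `max_depth`, `random_state`), the uniform fallback for empty leaves, and the toy data are assumptions. The parts sklearn's calibration wrapper relies on are the `classes_` attribute, `predict_proba`, and cloneable constructor parameters inherited via `BaseEstimator`.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier


class HonestTreeClassifier(ClassifierMixin, BaseEstimator):
    """Hypothetical minimal honest tree (illustrative sketch only)."""

    def __init__(self, honest_fraction=0.5, max_depth=None, random_state=None):
        # Store parameters unchanged so sklearn.base.clone can copy the estimator.
        self.honest_fraction = honest_fraction
        self.max_depth = max_depth
        self.random_state = random_state

    def fit(self, X, y):
        X = np.asarray(X)
        # Map labels to 0..K-1 and expose classes_ for sklearn meta-estimators.
        self.classes_, y_idx = np.unique(y, return_inverse=True)
        rng = np.random.default_rng(self.random_state)
        idx = rng.permutation(len(X))
        n_est = int(self.honest_fraction * len(X))
        est_idx, struct_idx = idx[:n_est], idx[n_est:]

        # Structure set: learn the splits.
        self.tree_ = DecisionTreeClassifier(
            max_depth=self.max_depth, random_state=self.random_state
        )
        self.tree_.fit(X[struct_idx], y_idx[struct_idx])

        # Estimation set: count held-out labels in each leaf.
        self.leaf_counts_ = np.zeros((self.tree_.tree_.node_count, len(self.classes_)))
        np.add.at(self.leaf_counts_, (self.tree_.apply(X[est_idx]), y_idx[est_idx]), 1)
        return self

    def predict_proba(self, X):
        counts = self.leaf_counts_[self.tree_.apply(np.asarray(X))]
        totals = counts.sum(axis=1, keepdims=True)
        # Assumption: leaves with no estimation samples get a uniform posterior.
        uniform = 1.0 / len(self.classes_)
        return np.where(totals > 0, counts / np.maximum(totals, 1), uniform)

    def predict(self, X):
        return self.classes_[np.argmax(self.predict_proba(X), axis=1)]


# Toy check that the class can serve as the base estimator for isotonic calibration.
X, y = make_classification(n_samples=600, n_features=5, random_state=0)
calibrated = CalibratedClassifierCV(
    HonestTreeClassifier(max_depth=5, random_state=0), method="isotonic", cv=3
)
calibrated.fit(X, y)
proba = calibrated.predict_proba(X)
```

Listing `ClassifierMixin` before `BaseEstimator` follows sklearn's recommended inheritance order, which lets the library recognize the class as a classifier. A full implementation would also want input validation and sklearn's estimator checks, which this sketch omits.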
@rflperry added the ndd (Neuro Data Design) and sklearn (will try to merge into sklearn) labels on Dec 31, 2021

@jzheng17

I'll take this issue

@PSSF23
Member

PSSF23 commented Feb 17, 2022

@jzheng17 great! You can ask @rflperry if you have questions about the intended implementations.

@rflperry
Member Author

@jzheng17 Per the first point, I've made a new repository in the group to which the existing methods will be moved and where new features will be added. Another student will also be working out of that repo. I'm also available on Slack.

@jzheng17

Thanks for letting me know. I'll take a look at the repo.

@rflperry
Member Author

@jzheng17 As you can see from this issue being mentioned in the other repository, I've updated the honest-forests package. It now includes all the code for honest decision trees and honest forests, as well as a couple of small simulations from the proglearn package to verify results. Instructions to install and contribute are on the GitHub page: https://github.com/neurodata/honest-forests

4 participants