Ensure that min.node.size is strictly enforced. #143

Closed
jtibshirani opened this issue Nov 27, 2017 · 4 comments

Comments

@jtibshirani
Member

jtibshirani commented Nov 27, 2017

Currently, we don't prevent splits from occurring that could result in nodes with size less than min.node.size. Instead, the algorithm simply stops splitting if a node's size is less than or equal to min.node.size.

Although this is also the current behavior in both ranger and the original randomForest package, it's not intuitive and can be quite misleading. We should look into updating the splitting algorithm so that min.node.size is enforced as a true minimum.
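To make the distinction concrete, here is a minimal toy sketch. It is not the grf, ranger, or randomForest implementation. The function name `leaf_sizes`, the `strict` flag, and the "always pick the most unbalanced allowed split" rule are all assumptions made for illustration; a real forest would choose among candidate splits with a splitting criterion.

```python
def leaf_sizes(n, min_node_size, strict):
    """Return the leaf sizes of a toy tree grown on n samples.

    Hypothetical illustration only: the split choice is deliberately
    adversarial (it always takes the smallest allowed left child).
    """
    # Stopping rule described in the issue: a node whose size is
    # less than or equal to min_node_size is not split further.
    if n <= min_node_size:
        return [n]

    # Smallest child size a split may produce. In strict mode,
    # min_node_size is treated as a true minimum for both children.
    smallest_child = min_node_size if strict else 1
    candidates = range(smallest_child, n - smallest_child + 1)
    if not candidates:
        # No admissible split exists, so this node becomes a leaf.
        return [n]

    left = candidates[0]  # most unbalanced admissible split
    right = n - left
    return (leaf_sizes(left, min_node_size, strict)
            + leaf_sizes(right, min_node_size, strict))


print(leaf_sizes(12, 5, strict=False))
print(leaf_sizes(12, 5, strict=True))
```

With only the stopping rule (`strict=False`), the toy tree can end up with leaves far smaller than `min_node_size`, whereas restricting the candidate splits (`strict=True`) keeps every leaf at or above it.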

@jtibshirani jtibshirani changed the title from "Ensure the min.node.size is strictly respected." to "Ensure the min.node.size is strictly enforced." Nov 27, 2017
@jtibshirani jtibshirani changed the title from "Ensure the min.node.size is strictly enforced." to "Ensure that min.node.size is strictly enforced." Nov 27, 2017
@swager
Member

swager commented Feb 6, 2018

Just to add a little more context, the behavior of the randomForest package is discussed here.

We still intend to update the behavior of our implementation to properly enforce min.node.size as a true minimum node size.

@jtibshirani jtibshirani added the bug and help wanted labels May 28, 2018
@ras44
Contributor

ras44 commented Aug 26, 2019

Hi @jtibshirani I'd be happy to continue contributing via PRs to grf. I was thinking either this one or any of the below:

  • #420 cobalt style balance plots
  • #291 classification forests

... or any other suggestions you or the team may have. Thanks!

@swager
Member

swager commented Aug 26, 2019

Thanks for reaching out, @ras44! Let's start with #420 (and move the conversation there). If you're interested in doing this, could you please prepare a short document to start the conversation about how you envision the feature (in terms of scope, what kinds of functionality we should have, what types of forests the function can be applied to, if / how we should set up a cobalt dependency, etc.)?

@jtibshirani jtibshirani removed the help wanted label Aug 21, 2020
@halflearned halflearned removed the bug label Jun 25, 2021
@halflearned
Member

The implementation of min.node.size in grf is now documented in the REFERENCE. See min.node.size and Selecting Balanced Splits.
