Should I expect that if I set min.node.size = 10, every leaf of every tree in a causal_forest will have at least 10 samples? Or am I misunderstanding what min.node.size is supposed to mean? I have found that setting positive values for min.node.size does not result in leaves with at least that many samples.
Here is an MWE inspired by the examples in the documentation:
library(grf)
library(purrr)
leafsamples <- function(t) {
  # Return the sample indices stored in each leaf node of a tree.
  leaves <-
    t %>%
    pluck("nodes") %>%
    keep("is_leaf") %>%
    map("samples")
  return(leaves)
}
n = 2000; p = 10
X = matrix(rnorm(n*p), n, p)
W = rbinom(n, 1, 0.5)
Y = pmax(X[,1], 0) * W + X[,2] + pmin(X[,3], 0) + rnorm(n)
tau.forest = causal_forest(X, Y, W, min.node.size = 4, seed = 1)
leaves <- leafsamples(get_tree(tau.forest, 1))
On my machine, only 53 of the 159 leaves in the first tree of this forest have 4 or more samples.
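The count above is not computed in the MWE itself. A minimal sketch of how it could be obtained follows, assuming (as leafsamples() does) that each node is a list with a logical `is_leaf` field and, for leaves, a `samples` vector. The helper name `count_leaf_sizes` and the mock tree are hypothetical and only illustrate the structure; they are not grf output.

```r
# Count leaves and how many of them meet a size threshold.
# Assumes each node is a list with `is_leaf` and (for leaves) `samples`.
count_leaf_sizes <- function(tree, threshold) {
  is_leaf <- vapply(tree$nodes, function(node) isTRUE(node$is_leaf), logical(1))
  sizes <- vapply(tree$nodes[is_leaf], function(node) length(node$samples), integer(1))
  c(leaves = length(sizes), at_or_above = sum(sizes >= threshold))
}

# Hand-built mock tree (hypothetical data): one internal node, two leaves.
mock_tree <- list(nodes = list(
  list(is_leaf = FALSE),
  list(is_leaf = TRUE, samples = c(1, 5, 9, 12)),
  list(is_leaf = TRUE, samples = c(3))
))

res <- count_leaf_sizes(mock_tree, threshold = 4)
print(res)
```

With the forest from the MWE, the same helper could presumably be called as count_leaf_sizes(get_tree(tau.forest, 1), 4), provided the tree object has the structure assumed here.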
My understanding was that min.node.size should constrain the minimum number of treated (resp. control) units in the terminal nodes of a given tree. But indeed, as tcovert reports, causal_forest with min.node.size = 10L returns trees whose terminal nodes sometimes contain only 1 sample!
Currently, we don't prevent splits from occurring that could result in nodes with size less than min.node.size. Instead, the algorithm simply stops splitting if a node's size is less than or equal to min.node.size. Our core splitting implementation is based on the ranger package (which is in turn based on Breiman + Cutler's randomForest package). Both packages make this approximation around min.node.size.
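To make the distinction concrete, here is a toy sketch of a stopping rule of this kind. It is not grf's or ranger's code: the function `grow_leaves` is hypothetical, works on a single numeric feature, and picks cut points at random instead of using a real split criterion. The only thing it shares with the description above is that a node stops splitting once its size is less than or equal to min.node.size, while nothing constrains the size of the children a split produces.

```r
# Toy illustration only: NOT the grf or ranger implementation.
grow_leaves <- function(x, min.node.size) {
  # Stopping rule: a node at or below the threshold becomes a leaf.
  values <- sort(unique(x))
  if (length(x) <= min.node.size || length(values) < 2) {
    return(list(x))
  }
  # Random cut point (stand-in for a real split criterion). Excluding the
  # largest value guarantees both children are non-empty.
  candidates <- values[-length(values)]
  cut <- candidates[sample.int(length(candidates), 1)]
  # No check on child sizes: either side may be far below min.node.size.
  c(grow_leaves(x[x <= cut], min.node.size),
    grow_leaves(x[x > cut], min.node.size))
}

set.seed(1)
leaves <- grow_leaves(rnorm(200), min.node.size = 10)
leaf_sizes <- lengths(leaves)
print(table(leaf_sizes))
```

Under this rule min.node.size only decides when splitting stops, so it cannot serve as a lower bound on leaf size. Enforcing a true minimum would require rejecting candidate splits whose children fall below the threshold.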
I agree that this behavior is quite misleading, and I've filed #143 to track the issue. For now, I'll add documentation to explain why there is a discrepancy in node sizes.