Commit

ensuring num bins is always greater than max number of categories
manishamde committed Mar 12, 2014

1 parent 62c2562 commit 6068356
Showing 1 changed file with 9 additions and 1 deletion.
@@ -816,7 +816,15 @@ object DecisionTree extends Serializable with Logging {

  val maxBins = strategy.maxBins
  val numBins = if (maxBins <= count) maxBins else count.toInt
- logDebug("maxBins = " + numBins)
+ logDebug("numBins = " + numBins)
+
+ // I will also add a require statement ensuring #bins is always greater than the categories
+ // It's a limitation of the current implementation but a reasonable tradeoff since features
+ // with large number of categories get favored over continuous features.
+ if (strategy.categoricalFeaturesInfo.size > 0){
+   val maxCategoriesForFeatures = strategy.categoricalFeaturesInfo.maxBy(_._2)._2
+   require(numBins >= maxCategoriesForFeatures)
+ }

  // Calculate the number of sample for approximate quantile calculation
  val requiredSamples = numBins*numBins
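
The check added in this hunk can be tried in isolation with a minimal sketch like the one below. The object `NumBinsCheck` and the helper `numBinsFor` are hypothetical names introduced only for illustration and are not part of `DecisionTree`. The sketch assumes `count` is a `Long` sample count and that `categoricalFeaturesInfo` maps a feature index to its number of categories, as the diff suggests. The error message passed to `require` is also an addition; the commit itself calls `require` without one.

```scala
object NumBinsCheck {
  // Hypothetical helper mirroring the logic in the diff; not part of Spark's API.
  // categoricalFeaturesInfo is assumed to map feature index -> number of categories.
  def numBinsFor(maxBins: Int, count: Long, categoricalFeaturesInfo: Map[Int, Int]): Int = {
    // Same rule as the diff: cap the bin count at the number of samples.
    val numBins = if (maxBins <= count) maxBins else count.toInt

    if (categoricalFeaturesInfo.nonEmpty) {
      // Largest category count across all categorical features.
      val maxCategories = categoricalFeaturesInfo.values.max
      // Throws IllegalArgumentException when there are fewer bins than categories.
      require(numBins >= maxCategories,
        s"numBins ($numBins) must be at least the largest category count ($maxCategories)")
    }
    numBins
  }
}
```

Because `numBins` is capped at `count`, a small input can fail the check even when `maxBins` itself is large enough, for example `numBinsFor(100, 10L, Map(0 -> 32))`.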
