djgarcia/RD2R

RD2R Ensemble

This package implements the RD2R ensemble algorithm, a distributed upgrade of the method presented in [1]. For each tree in the ensemble, the algorithm applies Random Discretization and Principal Component Analysis to the input data, joins the two results, and trains a decision tree on the combined features.
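The discretize-and-join step can be illustrated with a small plain-Scala sketch. It is not the RD2R source: the object and function names are hypothetical, thresholds are assumed to be drawn at random from the observed values of each feature, and the PCA features are assumed to be already computed and passed in as `extra`.

```scala
import scala.util.Random

// Hypothetical helper names for illustration; not part of the RD2R API.
object RandomDiscretizationSketch {

  // Pick nBins thresholds for one feature by sampling observed values at random
  // (assumption: thresholds come from the data itself).
  def randomThresholds(column: Array[Double], nBins: Int, rng: Random): Array[Double] =
    Array.fill(nBins)(column(rng.nextInt(column.length)))

  // Bin index = number of thresholds strictly below the value (0 to nBins).
  def discretize(value: Double, thresholds: Array[Double]): Double =
    thresholds.count(t => value > t).toDouble

  // Discretize every feature of every row, then append the extra
  // (for example PCA-projected) features of the same row.
  def discretizeAndJoin(rows: Array[Array[Double]],
                        extra: Array[Array[Double]],
                        nBins: Int,
                        seed: Long): Array[Array[Double]] = {
    val rng = new Random(seed)
    val nFeatures = rows.head.length
    val thresholds = Array.tabulate(nFeatures) { j =>
      randomThresholds(rows.map(_(j)), nBins, rng)
    }
    rows.zip(extra).map { case (row, ex) =>
      row.zipWithIndex.map { case (v, j) => discretize(v, thresholds(j)) } ++ ex
    }
  }
}
```

Each output row holds one bin index per original feature followed by the extra features, which is the kind of joined representation a decision tree would then be trained on.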

This software has been tested on five large real-world datasets.

Brief benchmark results:

  • RD2R outperforms both the original proposal and the Random Forest implementation in MLlib on all datasets.
  • On the epsilon dataset, RD2R with just 10 trees achieves 5% less error than a Random Forest with up to 500 trees.

Example

import org.apache.spark.mllib.tree._

val nTrees = 10
val nBins = 5

// Cache the training data to improve performance
trainingData.cache()

val rd2rModel = RD2R.train(trainingData, // RDD[LabeledPoint]
                            nTrees, // size of the ensemble
                            nBins) // number of thresholds by feature

val predicted = rd2rModel.test(testData) // RDD[LabeledPoint]
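The benchmark results are stated in terms of classification error. A minimal sketch of that metric follows, assuming predicted and true labels have already been paired and collected locally. How `test` exposes its predictions is not documented here, so `ErrorRateSketch` and `errorRate` are hypothetical names rather than part of the RD2R API.

```scala
// Hypothetical helper for illustration; not part of the RD2R API.
object ErrorRateSketch {

  // Fraction of (predicted, actual) label pairs that disagree.
  def errorRate(pairs: Seq[(Double, Double)]): Double = {
    require(pairs.nonEmpty, "need at least one prediction")
    pairs.count { case (predicted, actual) => predicted != actual }.toDouble / pairs.size
  }
}
```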

References

[1] A. Ahmad and G. Brown, "Random Projection Random Discretization Ensembles - Ensembles of Linear Multivariate Decision Trees", IEEE Transactions on Knowledge and Data Engineering, vol. 26, pp. 1225–1239, May 2014.
