# Algorithms
JSAT has numerous algorithms available, some of which overlap, are related, or serve as building blocks for others. I'll attempt to provide a somewhat ordered list of the algorithms available in JSAT, along with the main reference for each method, but please see the JavaDocs for more details.
Some of these algorithms are more practical than others, and many are implemented mostly for comparison purposes. This makes it easy to compare against many standard algorithms from the literature within one framework, and no implementation gains an unfair advantage over the others simply by being implemented in a different language.
Other algorithms simply make good benchmarks.
## Data Transforms

Algorithm | Relation / INFO | References |
---|---|---|
AutoDeskew | Attempts to remove any skewness from each column independently | |
Linear Transform | A simple linear rescaling of the features to a specific range | |
ZeroMeanTransform | A simple transform that removes the mean from all elements | |
PNormNormalization | Re-scales the numeric features in data points by their p-norm | |
PolynomialTransform | Generates new numeric features for polynomial interaction terms | |
MutualInfoFS | Feature selection using mutual information | |
Nominal To Numeric | Converts nominal features to numeric ones using a one-hot encoding | |
NumericalToHistogram | Converts numeric features to nominal ones using a simple histogram | |
Bidirectional Search (BDS) | A greedy method of feature selection for prediction | |
plus-L minus-R Selection (LRS) | A greedy method of feature selection for prediction | |
Sequential Backward Selection (SBS) | A greedy method of feature selection for prediction | |
Sequential Forward Selection (SFS) | A greedy method of feature selection for prediction | |
Principal Component Analysis (PCA) | A common dimensionality reduction technique | |
FastICA | An implementation of Independent Component Analysis | Hyvärinen, A., & Oja, E. (2000). Independent component analysis: algorithms and applications. Neural Networks, 13(4-5), 411–430. doi:10.1016/S0893-6080(00)00026-5 |
Johnson-Lindenstrauss (JL) | A dimensionality reduction technique | Achlioptas, D. (2003). Database-friendly random projections: Johnson-Lindenstrauss with binary coins. Journal of Computer and System Sciences, 66(4), 671–687. doi:10.1016/S0022-0000(03)00025-4 |
WhitenedPCA | An extension of PCA that performs whitening, which decorrelates all columns | |
WhitenedZCA | An extension of Whitened PCA that rotates the feature space after whitening | |
Kernel PCA | A kernelized version of PCA | Schölkopf, B., Smola, A., & Müller, K.-R. (1998). Nonlinear Component Analysis as a Kernel Eigenvalue Problem. Neural Computation, 10(5), 1299–1319. doi:10.1162/089976698300017467 |
Nystrom | A generic method of creating an approximate feature space for any kernel | Williams, C., & Seeger, M. (2001). Using the Nyström Method to Speed Up Kernel Machines. Advances in Neural Information Processing Systems 13 (pp. 682–688). MIT Press. |
RFF RBF | Random Kitchen Sinks for the RBF Kernel | Rahimi, A., & Recht, B. (2007). Random Features for Large-Scale Kernel Machines. Neural Information Processing Systems. |
ReliefF | A method of determining feature importance and selection for numeric features | Kononenko, I., Simec, E., & Robnik-Sikonja, M. (1997). Overcoming the myopia of inductive learning algorithms with RELIEFF. Applied Intelligence, 7, 39–55. |
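As a rough illustration of how these transforms are used, below is a minimal sketch that rescales a data set and then reduces its dimensionality. It assumes a numeric ARFF file named `myData.arff` (a placeholder), that `DataSet` exposes `applyTransform`, and that `PCA` takes a target dimension in its constructor; check the JavaDocs for the exact signatures in your JSAT version.

```java
import java.io.File;
import jsat.ARFFLoader;
import jsat.DataSet;
import jsat.datatransform.LinearTransform;
import jsat.datatransform.PCA;

public class TransformExample
{
    public static void main(String[] args)
    {
        // Load any numeric data set (the file name is just a placeholder)
        DataSet data = ARFFLoader.loadArffFile(new File("myData.arff"));

        // Rescale every numeric feature into a fixed range
        data.applyTransform(new LinearTransform(data));

        // Then reduce dimensionality; keeping 5 principal components
        // is an assumed example value, not a recommendation
        data.applyTransform(new PCA(data, 5));
    }
}
```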
## Linear Methods

Algorithm | Purpose | Relation / INFO | References |
---|---|---|---|
Logistic Regression | Classification | A simple implementation of Logistic Regression | |
Logistic Regression DCD | Classification | A fast implementation of Logistic Regression based on the DCD solver | Yu, H.-F., Huang, F.-L., & Lin, C.-J. (2010). Dual Coordinate Descent Methods for Logistic Regression and Maximum Entropy Models. Machine Learning, 85(1-2), 41–75. doi:10.1007/s10994-010-5221-8 |
StochasticMultinomialLogisticRegression | Classification | A direct implementation of Multinomial LR allowing for various priors | Carpenter, B. (2008). Lazy Sparse Stochastic Gradient Descent for Regularized Multinomial Logistic Regression. |
DCDs | Classification, Regression | A linear SVM trained by DCD | Hsieh, C.-J., Chang, K.-W., Lin, C.-J., Keerthi, S. S., & Sundararajan, S. (2008). A Dual Coordinate Descent Method for Large-scale Linear SVM. Proceedings of the 25th international conference on Machine learning - ICML ’08 (pp. 408–415). New York, New York, USA: ACM Press. doi:10.1145/1390156.1390208 |
DCD | Classification, Regression | A linear SVM trained by DCD, without the shrinkage optimization | |
Linear Batch | Classification, Regression | A generic loss based solver that uses batch optimization methods, supports L2 regularization | |
Linear SGD | Classification, Regression | A generic loss based solver that uses stochastic online methods, supporting elastic net regularization | |
NewGLMNET | Classification | An implementation of Elastic Net regularized Logistic Regression | Yuan, G., Ho, C.-H., & Lin, C. (2012). An improved GLMNET for L1-regularized logistic regression. Journal of Machine Learning Research, 13, 1999–2030. doi:10.1145/2020408.2020421 |
Adaptive Regularization of Weight Vectors (AROW) | Classification, Online Learning | | Crammer, K., Kulesza, A., & Dredze, M. (2013). Adaptive regularization of weight vectors. Machine Learning, 91(2), 155–187. doi:10.1007/s10994-013-5327-x |
ALMA2 | Classification, Online Learning | | Gentile, C. (2002). A New Approximate Maximal Margin Classification Algorithm. The Journal of Machine Learning Research, 2, 213–242. |
ROMMA | Classification, Online Learning | | Li, Y., & Long, P. M. (2002). The Relaxed Online Maximum Margin Algorithm. Machine Learning, 46(1-3), 361–387. doi:10.1023/A:1012435301888 |
Soft Confidence-Weighted (SCW) | Classification, Online Learning | | Crammer, K., Dredze, M., & Pereira, F. (2008). Exact Convex Confidence-Weighted Learning. In Advances in Neural Information Processing Systems 22 (pp. 345–352). |
Passive Aggressive | Classification, Regression, Online Learning | | Crammer, K., Dekel, O., Keshet, J., Shalev-Shwartz, S., & Singer, Y. (2006). Online passive-aggressive algorithms. Journal of Machine Learning Research, 7, 551–585. |
Support Passive Aggressive | Classification, Online Learning | This is a multi-class generalization of Passive Aggressive | Matsushima, S., Shimizu, N., Yoshida, K., Ninomiya, T., & Nakagawa, H. (2010). Exact Passive-Aggressive Algorithm for Multiclass Classification Using Support Class. SIAM International Conference on Data Mining - SDM (pp. 303–314). |
Pegasos | Classification, Online Learning | An online linear SVM | Shalev-Shwartz, S., Singer, Y., & Srebro, N. (2007). Pegasos: Primal Estimated sub-GrAdient SOlver for SVM. 24th international conference on Machine learning (pp. 807–814). New York, NY: ACM. doi:10.1145/1273496.1273598 |
Normal Herd (NHERD) | Classification, Online Learning | It is related to AROW and Passive Aggressive | Crammer, K., & Lee, D. D. (2010). Learning via Gaussian Herding. Pre-proceedings of NIPS (pp. 451–459). |
Sparse Truncated Gradient Descent (STGD) | Classification, Regression, Online Learning | An online algorithm for L1 regularized models | Langford, J., Li, L., & Zhang, T. (2009). Sparse online learning via truncated gradient. The Journal of Machine Learning Research, 10, 777–801. |
RidgeRegression | Regression | A simple batch implementation of Ridge Regression | |
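For context, here is a minimal sketch of training and evaluating one of the linear models above with cross validation. The `iris.arff` file is a placeholder, and it is assumed the class label is the 0'th categorical attribute and that `LogisticRegressionDCD` has a no-argument constructor; double-check the JavaDocs for your version.

```java
import java.io.File;
import jsat.ARFFLoader;
import jsat.DataSet;
import jsat.classifiers.ClassificationDataSet;
import jsat.classifiers.ClassificationModelEvaluation;
import jsat.classifiers.Classifier;
import jsat.classifiers.linear.LogisticRegressionDCD;

public class LinearExample
{
    public static void main(String[] args)
    {
        DataSet raw = ARFFLoader.loadArffFile(new File("iris.arff"));
        // Use the 0'th categorical attribute as the target class
        ClassificationDataSet data = new ClassificationDataSet(raw, 0);

        Classifier model = new LogisticRegressionDCD();
        ClassificationModelEvaluation eval =
                new ClassificationModelEvaluation(model, data);
        eval.evaluateCrossValidation(10); // 10-fold cross validation
        System.out.println("Error rate: " + eval.getErrorRate());
    }
}
```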
## Kernel Methods

Algorithm | Purpose | Relation / INFO | References |
---|---|---|---|
Platt's SMO | Classification, Regression | | Platt, J. C. (1998). Sequential Minimal Optimization: A Fast Algorithm for Training Support Vector Machines. Advances in kernel methods (pp. 185–208). |
Kernel SGD | Classification, Regression, Online Learning | See the KernelPoint building block. This provides a way to directly train a kernel-based model via SGD | |
Double Update Online Learning (DUOL) | Classification, Online Learning | This is a kernelized extension of the Passive Aggressive algorithm | Zhao, P., Hoi, S. C. H., & Jin, R. (2011). Double Updating Online Learning. Journal of Machine Learning Research, 12, 1587–1615. |
Online Sparse Kernel Learning by Sampling and Smooth Losses (OSKL) | Classification, Online Learning | This is a bounded method for kernel learning meant for smooth loss functions like the Logistic Loss. | Zhang, L., Yi, J., Jin, R., Lin, M., & He, X. (2013). Online Kernel Learning with a Near Optimal Sparsity Bound. In S. Dasgupta & D. Mcallester (Eds.), Proceedings of the 30th International Conference on Machine Learning (ICML-13) (Vol. 28, pp. 621–629). JMLR Workshop and Conference Proceedings. |
PegasosK | Classification, Online Learning | A kernelized version of Pegasos | Shalev-Shwartz, S., Singer, Y., & Srebro, N. (2007). Pegasos: Primal Estimated sub-GrAdient SOlver for SVM. 24th international conference on Machine learning (pp. 807–814). New York, NY: ACM. doi:10.1145/1273496.1273598 |
Projectron | Classification, Online Learning | This is a kernelized version of the Perceptron that uses projection as its method of bounding the number of support vectors (SVs) | Orabona, F., Keshet, J., & Caputo, B. (2009). Bounded Kernel-Based Online Learning. The Journal of Machine Learning Research, 10, 2643–2666. |
Bounded Online Gradient Descent (BOGD) | Classification, Online Learning | | Zhao, P., Wang, J., Wu, P., Jin, R., & Hoi, S. C. H. (2012). Fast Bounded Online Gradient Descent Algorithms for Scalable Kernel-Based Online Learning. In Proceedings of the 29th International Conference on Machine Learning (pp. 169–176). |
Least Squares Support Vector Machine (LS-SVM) | Classification, Regression | This is related to the kernel SVM, but simpler to implement. It is equivalent to Kernel Ridge Regression, but trained differently, allowing for potentially larger data sets | Suykens, J., & Vandewalle, J. (1999). Least Squares Support Vector Machine Classifiers. Neural processing letters, 9(3), 293–298. doi:10.1023/A:1018628609742 |
Kernel Ridge Regression | Regression | This is a direct kernelization of the Ridge Regression algorithm | |
Kernel Recursive Least Squares (KernelRLS) | Regression, Online Learning | This is a kernelization of the RLS algorithm, and the first paper to use projection for bounded learning | Engel, Y., Mannor, S., & Meir, R. (2004). The Kernel Recursive Least-Squares Algorithm. IEEE Transactions on Signal Processing, 52(8), 2275–2285. doi:10.1109/TSP.2004.830985 |
Conservative Stochastic Kernel Logistic Regression (CSKLR) | Classification, Online Learning | | Zhang, L., Jin, R., Chen, C., Bu, J., & He, X. (2012). Efficient Online Learning for Large-Scale Sparse Kernel Logistic Regression. Twenty-Sixth AAAI Conference on Artificial Intelligence (pp. 1219–1225). |
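Kernel methods are trained the same way as the linear models, just constructed with a kernel. A small sketch, assuming `PlattSMO` accepts a `KernelTrick` in its constructor; the RBF width and C values below are arbitrary placeholders you would tune.

```java
import java.io.File;
import jsat.ARFFLoader;
import jsat.classifiers.ClassificationDataSet;
import jsat.classifiers.svm.PlattSMO;
import jsat.distributions.kernels.RBFKernel;

public class KernelExample
{
    public static void main(String[] args)
    {
        ClassificationDataSet data = new ClassificationDataSet(
                ARFFLoader.loadArffFile(new File("iris.arff")), 0);

        // RBF width 0.5 and C = 10 are placeholder values, not defaults
        PlattSMO svm = new PlattSMO(new RBFKernel(0.5));
        svm.setC(10);
        svm.trainC(data);

        // Predict the class of the first training point
        System.out.println(svm.classify(data.getDataPoint(0)).mostLikely());
    }
}
```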
## Trees

Algorithm | Purpose | Relation / INFO | References |
---|---|---|---|
ID3 | Classification | A simple decision tree for only nominal features | |
Decision Tree | Classification, Regression | This is a generic implementation, able to mimic the behavior of many tree algorithms such as C4.5 and CART | |
Random Tree | Classification, Regression | This is a decision tree that chooses a random subset of features to consider at each iteration | |
Random Forest | Classification, Regression | This is a collection of random trees | Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32. |
Extra Tree | Classification, Regression | This algorithm is an extremely randomized tree builder | Geurts, P., Ernst, D., & Wehenkel, L. (2006). Extremely randomized trees. Machine learning, 63(1), 3–42. doi:10.1007/s10994-006-6226-1 |
Extra Random Trees (ERTrees) | Classification, Regression | A collection of Extra Trees | |
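A sketch of using one of the tree ensembles, under the assumption that `RandomForest` exposes a constructor taking the number of trees (the 100 below is an arbitrary example; see the JavaDocs for the actual parameters and defaults):

```java
import java.io.File;
import jsat.ARFFLoader;
import jsat.classifiers.ClassificationDataSet;
import jsat.classifiers.ClassificationModelEvaluation;
import jsat.classifiers.trees.RandomForest;

public class TreeExample
{
    public static void main(String[] args)
    {
        ClassificationDataSet data = new ClassificationDataSet(
                ARFFLoader.loadArffFile(new File("iris.arff")), 0);

        RandomForest rf = new RandomForest(100); // 100 trees, an assumed value
        ClassificationModelEvaluation eval =
                new ClassificationModelEvaluation(rf, data);
        eval.evaluateCrossValidation(10);
        System.out.println("Error rate: " + eval.getErrorRate());
    }
}
```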
## Instance-Based Methods

Algorithm | Purpose | Relation / INFO | References |
---|---|---|---|
Nearest Neighbor | Classification, Regression | ||
Discriminant Adaptive Nearest Neighbor (DANN) | Classification, Regression | An extension of Nearest Neighbor that uses a region adaptive distance metric | Hastie, T., & Tibshirani, R. (1996). Discriminant adaptive nearest neighbor classification. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(6), 607–616. doi:10.1109/34.506411 |
Learning Vector Quantization (LVQ) | Classification | ||
LVQ with Locally Learned Classifier (LVQ-LLC) | Classification | This is an original extension of LVQ | |
Self Organizing Map (SOM) | Classification, Clustering | ||
Locally Weighted Learner (LWL) | Classification, Regression | This algorithm builds a local model for every query, and uses that local model to make predictions | Atkeson, C., Moore, A., & Schaal, S. (1997). Locally Weighted Learning. Artificial intelligence review, 11–73 |
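These methods are used the same way as the other classifiers; e.g., a k-nearest-neighbor sketch (note that JSAT spells the class `NearestNeighbour`, and k = 3 here is an arbitrary choice):

```java
import java.io.File;
import jsat.ARFFLoader;
import jsat.classifiers.ClassificationDataSet;
import jsat.classifiers.knn.NearestNeighbour;

public class KnnExample
{
    public static void main(String[] args)
    {
        ClassificationDataSet data = new ClassificationDataSet(
                ARFFLoader.loadArffFile(new File("iris.arff")), 0);

        NearestNeighbour knn = new NearestNeighbour(3); // k = 3 neighbors
        knn.trainC(data);
        System.out.println(knn.classify(data.getDataPoint(0)).mostLikely());
    }
}
```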
## Boosting

Algorithm | Purpose | Relation / INFO | References |
---|---|---|---|
AdaBoostM1 | Classification | The original boosting algorithm | |
AdaBoostM1PL | Classification | A parallel extension of AdaBoost | Scalable and Parallel Boosting with MapReduce, Indranil Palit and Chandan K. Reddy, IEEE Transactions on Knowledge and Data Engineering |
SAMME | Classification | An extension of AdaBoostM1 that works better on multi-class problems | Multi-class AdaBoost by Ji Zhu, Saharon Rosset, Hui Zou, & Trevor Hastie |
Logit Boost | Classification | A boosting algorithm for classification that uses regressors as the weak learners | Additive Logistic Regression: A Statistical View of Boosting (Special Invited Paper), by Jerome Friedman, Trevor Hastie, and Robert Tibshirani. The Annals of Statistics 2000, Vol. 28, No. 2, 337–407 |
LogitBoostPL | Classification | A parallel extension of Logit Boost | Scalable and Parallel Boosting with MapReduce, Indranil Palit and Chandan K. Reddy, IEEE Transactions on Knowledge and Data Engineering |
Modest AdaBoost | Classification | An extension of AdaBoost that reduces overfitting | Vezhnevets, A., & Vezhnevets, V. (2005). "Modest AdaBoost" – Teaching AdaBoost to Generalize Better. GraphiCon. Novosibirsk Akademgorodok, Russia. |
EmphasisBoost | Classification | A generalization of AdaBoost, with RealAdaBoost as a special case | Gómez-Verdejo, V., Ortega-Moral, M., Arenas-García, J., & Figueiras-Vidal, A. R. (2006). Boosting by weighting critical and erroneous samples. Neurocomputing, 69(7-9), 679–685. doi:10.1016/j.neucom.2005.12.011 |
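The boosters wrap a weak learner. The sketch below pairs AdaBoostM1 with a decision stump, assuming a `(Classifier, maxIterations)` constructor; both the weak learner and the round count are illustrative choices.

```java
import java.io.File;
import jsat.ARFFLoader;
import jsat.classifiers.ClassificationDataSet;
import jsat.classifiers.boosting.AdaBoostM1;
import jsat.classifiers.trees.DecisionStump;

public class BoostingExample
{
    public static void main(String[] args)
    {
        ClassificationDataSet data = new ClassificationDataSet(
                ARFFLoader.loadArffFile(new File("iris.arff")), 0);

        // Boost a single-split tree for up to 50 rounds
        AdaBoostM1 boost = new AdaBoostM1(new DecisionStump(), 50);
        boost.trainC(data);
    }
}
```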
## Ensembles and Meta-Algorithms

Algorithm | Purpose | Relation / INFO | References |
---|---|---|---|
Bagging | Classification, Regression | An ensembling technique for reducing variance | Breiman, L. (1996). Bagging predictors. Machine Learning, 24(2), 123–140. doi:10.1007/BF00058655 |
Wagging | Classification, Regression | An ensembling technique for models that support weighted data instances | Bauer, E., & Kohavi, R. (1999). An empirical comparison of voting classification algorithms: Bagging, boosting, and variants. Machine Learning, 36(1-2), 105–139. |
Arc-X4 | Classification | A simple ensemble technique for models that support weighted data instances | Breiman, L. (1998). Arcing Classifiers. The Annals of Statistics, 26(3), 801–824. |
Stacking | Classification, Regression | | Wolpert, D. H. (1992). Stacked generalization. Neural Networks, 5, 241–259. |
UpdatableStacking | Classification, Regression, Online Learning | A generalization of Stacking for online models | Wolpert, D. H. (1992). Stacked generalization. Neural Networks, 5, 241–259. |
OneVsAll | Classification | Extends binary classifiers to multi-class ones | |
OneVsOne | Classification | Extends binary classifiers to multi-class ones | |
Decision Directed Acyclic Graph (DDAG) | Classification | Extends binary classifiers to multi-class ones, and is related to OneVsOne | Platt, J. C., Cristianini, N., & Shawe-Taylor, J. (2000). Large margin DAGs for multiclass classification. In Advances in Neural Information Processing Systems (pp. 547–553). MIT Press. Retrieved from http://www.weizmann.ac.il/mathusers/bagon/CVspring07/files/DAGSVM.pdf |
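The meta-algorithms similarly wrap a base model. For example, OneVsAll can extend a strictly binary learner, like the DCD-trained linear SVM above, to multi-class problems. A sketch, assuming the class is spelled `OneVSAll` and takes the base classifier in its constructor:

```java
import java.io.File;
import jsat.ARFFLoader;
import jsat.classifiers.ClassificationDataSet;
import jsat.classifiers.OneVSAll;
import jsat.classifiers.svm.DCDs;

public class MetaExample
{
    public static void main(String[] args)
    {
        ClassificationDataSet data = new ClassificationDataSet(
                ARFFLoader.loadArffFile(new File("iris.arff")), 0);

        // Wrap a binary linear SVM so it can handle more than 2 classes;
        // DCDs with default settings is an arbitrary choice of base model
        OneVSAll ova = new OneVSAll(new DCDs());
        ova.trainC(data);
    }
}
```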