Caret2SQL is a SQL-based deployment system for R caret models. It relies on the existing framework in sklearn2sql.
Caret is a modular R system used to train and predict machine learning models in a standard way. It supports almost all known machine learning model types (its coverage is comparable to python scikit-learn)
Sklearn2sql provides a framework for translating scikit-learn predictive models into a SQL code for deployment purposes. Using this framework, for example, it is possible for a C, perl or java developper to deploy such a model simply by executing the generated SQL code. The system supports the major market databases.
Caret2SQL applies the same framework to R caret models, eventually by using a common JSON format with python version models (mapping caret models with equivalent scikit-learn models).
Some machine learning libraries (xgboost , lightGBM) already support exporting and loading models in a specific JSON format, which makes implementing these cases straightforward (python xgboost models are already supported in sklearn2sql).
library(caret, quiet = TRUE)
library(base64enc)
library(httr, quiet = TRUE)
## multiclass classification on iris dataset:
# build/train a caret model
model = train(Species ~ ., data = iris, method = "ctree2")
WS_URL = "https://sklearn2sql.herokuapp.com/model"
model_serialized <- serialize(model, NULL)
b64_data = base64encode(model_serialized)
data = list(Name = "ctree2_test_model", SerializedModel = b64_data , SQLDialect = "postgresql" , Mode="caret")
r = POST(WS_URL, body = data, encode = "json")
content = content(r)
lSQL = content$model$SQLGenrationResult[[1]]$SQL
cat(lSQL);
The most used models are now implemented in the web-service
- Classification models (GLMxx , naive bayes, decision trees (rpart + ctree + ctree2), SVMs , Neural Nets, Earth/MARS )
- Regressions (almost he same as above , except naive bayes)
- Preprocessings : "center", "scale", "pca"
- Ensembles : Boosting, Bagging, Random Forests, XGBoost.
Some R jupyter notebooks are provided as a demo.
A more complete list of supported models and generated SQL codes for various databases is available.
A good introduction for Caret is given by the author ( @topepo ) :
Kuhn, M. (2008), “Building predictive models in R using the caret package, ” Journal of Statistical Software, (http://www.jstatsoft.org/article/view/v028i05/v28i05.pdf).
Your feedback is welcome.