Commit
resubmission v0.1.2 small documentation fixes
chengjunhou committed Mar 7, 2019
1 parent 58b2aec commit 5e85a61
Showing 8 changed files with 23 additions and 19 deletions.
15 changes: 8 additions & 7 deletions DESCRIPTION
@@ -1,18 +1,19 @@
Package: xgb2sql
Type: Package
-Title: Convert trained XGBoost model to SQL query
-Version: 0.1.1
-Description: This tool enables in-database scoring of XGBoost models built in R by translating trained model objects into SQL query.
-'XGBoost' <http://xgboost.readthedocs.io/en/latest/index.html> provides parallel tree boosting (also known as GBDT, GBM) algorithms
-in a highly efficient, flexible and portable way.
+Title: Convert Trained XGBoost Model to SQL Query
+Version: 0.1.2
+Description: This tool enables in-database scoring of XGBoost models built in R, by translating trained model objects into SQL query.
+XGBoost <https://xgboost.readthedocs.io/en/latest/index.html> provides parallel tree boosting (also known as gradient boosting machine, or GBM) algorithms
+in a highly efficient, flexible and portable way. GBM algorithm is introduced by Friedman (2001) <doi:10.1214/aos/1013203451>,
+and more details on XGBoost can be found in Chen & Guestrin (2016) <doi:10.1145/2939672.2939785>.
Authors@R: c(
person("Chengjun", "Hou", role = c("aut", "cre"), email = "chengjun.hou@gmail.com"),
person("Abhishek", "Bishoyi", role = c("aut"), email = "abhishek.bishoyi@gmail.com")
)
Author: Chengjun Hou [aut, cre], Abhishek Bishoyi [aut]
Maintainer: Chengjun Hou <chengjun.hou@gmail.com>
-URL: http://github.com/chengjunhou/tree2sql
-BugReports: http://github.com/chengjunhou/tree2sql/issues
+URL: https://github.com/chengjunhou/xgb2sql
+BugReports: https://github.com/chengjunhou/xgb2sql/issues
Depends: R (>= 3.1.0)
License: MIT + file LICENSE
Encoding: UTF-8
4 changes: 2 additions & 2 deletions R/booster2sql.R
@@ -1,11 +1,11 @@
-#' Transform XGBoost model object to SQL query
+#' Transform XGBoost model object to SQL query.
#'
#' This function generates SQL query for in-database scoring of XGBoost models,
#' providing a robust and efficient way of model deployment. It takes in the trained XGBoost model \code{xgbModel},
#' name of the input database table \code{input_table_name},
#' and name of a unique identifier within that table \code{unique_id} as input,
#' writes the SQL query to a file specified by \code{output_file_name}.
-#' Note that the input database table should be generated from the raw table using the one-hot encoding query output by \code{onehot2sql},
+#' Note that the input database table should be generated from the raw table using the one-hot encoding query output by \code{onehot2sql()},
#' or to provide the one-hot encoding query as input \code{input_onehot_query} to this function, working as sub-query inside the final model scoring query.
#'
#' @param xgbModel The trained model object of class \code{xgb.Booster}.
2 changes: 1 addition & 1 deletion R/onehot2sql.R
@@ -12,7 +12,7 @@
#'
#' @param data Data object of class \code{data.frame} or \code{data.table}.
#' @param meta Optional, a list keeps track of all the transformation that has been taken on the categorical features.
-#' @param sep Separation symbol between the categorical features and their levels, which will be the column names inside \code{data.mat}, default to "_".
+#' @param sep Separation symbol between the categorical features and their levels, which will be the column names inside the output \code{model.matrix}, default to "_".
#' @param ws_replace Boolean indicator controls whether white-space and punctuation inside categorical feature levels should be replaced, default to TRUE.
#' @param ws_replace_with Replacing symbol, default to '' which means all white-space and punctuation should be removed.
#' @param unique_id A row unique identifier is crucial for in-database scoring of XGBoost model. If not given, SQL query will be generated with id name "ROW_KEY".
2 changes: 1 addition & 1 deletion README.md
@@ -197,7 +197,7 @@ cat(readChar('xgb.txt', file.info('xgb.txt')$size))
Items under development are:
- Support for `booster="gblinear`.
- Support for other `objective`.
-_ Support for customized loss function.
+- Support for customized loss function.



9 changes: 6 additions & 3 deletions cran-comments.md
@@ -1,15 +1,18 @@
-# New Submission - Package xgb2sql
+# New Package Submission - xgb2sql

+## Resubmission
+This is a resubmission. In this version I have:
+* Revised URL of the CRAN page for a task view in canonical form
+* Changed title case
+* Added reference citation to Description for the related methods

## Test environments
* local Windows 8 x64, R 3.5.1, R devel
* Ubuntu 14.04.5 LTS (on travis-ci), R 3.5.2


## R CMD check results
There were no ERRORs, WARNINGs or NOTEs.


## Downstream dependencies
There are currently no downstream dependencies for this package.

4 changes: 2 additions & 2 deletions man/booster2sql.Rd

Some generated files are not rendered by default.

2 changes: 1 addition & 1 deletion man/onehot2sql.Rd

Some generated files are not rendered by default.

4 changes: 2 additions & 2 deletions vignettes/xgb2sql.Rmd
@@ -25,12 +25,12 @@ but sometimes the model needs to be integrated into some other systems that the
Plus moving large amount of data between database and R could be time and memory consuming.
So we propose R package `xgb2sql` enabling in-database scoring of XGBoost models built in R by translating trained model objects into SQL query.

-[CRAN Task View: Model Deployment with R](http://cran.r-project.org/web/views/ModelDeployment.html)
+[CRAN Task View: Model Deployment with R](https://CRAN.R-project.org/view=ModelDeployment)
categorizes the process of deploying models to various environments for scoring or inferencing on new data into two categories.
The first category is **Deployment through Different Types of Artifacts**, which basically means exporting the model as an object,
then using supported software/platform to consume this object scoring out the model predictions.
The other category is **Deployment through Cloud/Server**, which includes
-a). providing an R interface to third-party managed services such as [Google Cloud Machine Learning Engine](http://cloud.google.com/ml-engine/);
+a). providing an R interface to third-party managed services such as [Google Cloud Machine Learning Engine](https://cloud.google.com/ml-engine/);
b). turning R code into web API and opening service on the server.
Our approach provides SQL query producing model predictions, which can be taken as a combination of the model itself plus the scoring process.
The output SQL query can be treated as an artifact, but we can easily set up service for it on the database server.
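
To illustrate what "a SQL query that carries both the model and the scoring process" can look like, here is a minimal base-R sketch. It is not taken from this commit or from the package's implementation: the nested-list tree layout, the helper `tree_to_sql()`, the feature names, and the table name `input_table` are all hypothetical, and the sketch ignores the missing-value handling that a real XGBoost tree needs. Only the id name `ROW_KEY` echoes the default mentioned in the `onehot2sql` documentation.

```r
# Hypothetical toy tree (assumption, not an xgb.Booster object):
# a split node has feature/threshold/yes/no, a leaf has only value.
toy_tree <- list(
  feature = "age", threshold = 30,
  yes = list(value = -0.2),
  no = list(
    feature = "income", threshold = 50000,
    yes = list(value = 0.1),
    no = list(value = 0.4)
  )
)

# Recursively turn the toy tree into a nested SQL CASE expression.
tree_to_sql <- function(node) {
  if (!is.null(node$value)) {
    # Leaf: emit the leaf weight as a literal.
    return(as.character(node$value))
  }
  # Split: rows satisfying the condition follow the "yes" branch.
  paste0(
    "(CASE WHEN ", node$feature, " < ", node$threshold,
    " THEN ", tree_to_sql(node$yes),
    " ELSE ", tree_to_sql(node$no), " END)"
  )
}

# Wrap the expression in a scoring query keyed by a row identifier.
query <- paste0(
  "SELECT ROW_KEY, ", tree_to_sql(toy_tree), " AS score FROM input_table"
)
cat(query, "\n")
```

A boosted model would sum one such expression per tree and then apply the objective's link function; the sketch stops at a single tree to keep the idea visible.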
