resubmission v0.1.2 small documentation fixes

chengjunhou · Mar 7, 2019 · 5e85a61
1 parent 58b2aec

Showing 8 changed files with 23 additions and 19 deletions.
diff --git a/DESCRIPTION b/DESCRIPTION
@@ -1,18 +1,19 @@
 Package: xgb2sql
 Type: Package
-Title: Convert trained XGBoost model to SQL query
-Version: 0.1.1
-Description: This tool enables in-database scoring of XGBoost models built in R by translating trained model objects into SQL query. 
-  'XGBoost' <http://xgboost.readthedocs.io/en/latest/index.html> provides parallel tree boosting (also known as GBDT, GBM) algorithms
-  in a highly efficient, flexible and portable way.
+Title: Convert Trained XGBoost Model to SQL Query
+Version: 0.1.2
+Description: This tool enables in-database scoring of XGBoost models built in R, by translating trained model objects into SQL query. 
+  XGBoost <https://xgboost.readthedocs.io/en/latest/index.html> provides parallel tree boosting (also known as gradient boosting machine, or GBM) algorithms
+  in a highly efficient, flexible and portable way. GBM algorithm is introduced by Friedman (2001) <doi:10.1214/aos/1013203451>, 
+  and more details on XGBoost can be found in Chen & Guestrin (2016) <doi:10.1145/2939672.2939785>.
 Authors@R: c(
   person("Chengjun", "Hou", role = c("aut", "cre"), email = "chengjun.hou@gmail.com"),
   person("Abhishek", "Bishoyi", role = c("aut"), email = "abhishek.bishoyi@gmail.com")
   )
 Author: Chengjun Hou [aut, cre], Abhishek Bishoyi [aut]
 Maintainer: Chengjun Hou <chengjun.hou@gmail.com>
-URL: http://github.com/chengjunhou/tree2sql
-BugReports: http://github.com/chengjunhou/tree2sql/issues
+URL: https://github.com/chengjunhou/xgb2sql
+BugReports: https://github.com/chengjunhou/xgb2sql/issues
 Depends: R (>= 3.1.0)
 License: MIT + file LICENSE
 Encoding: UTF-8

diff --git a/R/booster2sql.R b/R/booster2sql.R
@@ -1,11 +1,11 @@
-#' Transform XGBoost model object to SQL query
+#' Transform XGBoost model object to SQL query.
 #'
 #' This function generates SQL query for in-database scoring of XGBoost models,
 #' providing a robust and efficient way of model deployment. It takes in the trained XGBoost model \code{xgbModel},
 #' name of the input database table \code{input_table_name},
 #' and name of a unique identifier within that table \code{unique_id} as input,
 #' writes the SQL query to a file specified by \code{output_file_name}.
-#' Note that the input database table should be generated from the raw table using the one-hot encoding query output by \code{onehot2sql},
+#' Note that the input database table should be generated from the raw table using the one-hot encoding query output by \code{onehot2sql()},
 #' or to provide the one-hot encoding query as input \code{input_onehot_query} to this function, working as sub-query inside the final model scoring query.
 #'
 #' @param xgbModel The trained model object of class \code{xgb.Booster}.

diff --git a/R/onehot2sql.R b/R/onehot2sql.R
@@ -12,7 +12,7 @@
 #'
 #' @param data Data object of class \code{data.frame} or \code{data.table}.
 #' @param meta Optional, a list keeps track of all the transformation that has been taken on the categorical features.
-#' @param sep Separation symbol between the categorical features and their levels, which will be the column names inside \code{data.mat}, default to "_".
+#' @param sep Separation symbol between the categorical features and their levels, which will be the column names inside the output \code{model.matrix}, default to "_".
 #' @param ws_replace Boolean indicator controls whether white-space and punctuation inside categorical feature levels should be replaced, default to TRUE.
 #' @param ws_replace_with Replacing symbol, default to '' which means all white-space and punctuation should be removed.
 #' @param unique_id  A row unique identifier is crucial for in-database scoring of XGBoost model. If not given, SQL query will be generated with id name "ROW_KEY".

diff --git a/README.md b/README.md
@@ -197,7 +197,7 @@ cat(readChar('xgb.txt', file.info('xgb.txt')$size))
 Items under development are:
 - Support for `booster="gblinear`.
 - Support for other `objective`.
-_ Support for customized loss function.
+- Support for customized loss function.
 
 
 

diff --git a/cran-comments.md b/cran-comments.md
@@ -1,15 +1,18 @@
-# New Submission - Package xgb2sql
+# New Package Submission - xgb2sql
 
+## Resubmission
+This is a resubmission. In this version I have:
+* Revised URL of the CRAN page for a task view in canonical form
+* Changed title case
+* Added reference citation to Description for the related methods
 
 ## Test environments
 * local Windows 8 x64, R 3.5.1, R devel
 * Ubuntu 14.04.5 LTS (on travis-ci), R 3.5.2
 
-
 ## R CMD check results
 There were no ERRORs, WARNINGs or NOTEs.
 
-
 ## Downstream dependencies
 There are currently no downstream dependencies for this package.
 
diff --git a/man/booster2sql.Rd b/man/booster2sql.Rd
diff --git a/man/onehot2sql.Rd b/man/onehot2sql.Rd
diff --git a/vignettes/xgb2sql.Rmd b/vignettes/xgb2sql.Rmd
@@ -25,12 +25,12 @@ but sometimes the model needs to be integrated into some other systems that the
 Plus moving large amount of data between database and R could be time and memory consuming. 
 So we propose R package `xgb2sql` enabling in-database scoring of XGBoost models built in R by translating trained model objects into SQL query.
 
-[CRAN Task View: Model Deployment with R](http://cran.r-project.org/web/views/ModelDeployment.html) 
+[CRAN Task View: Model Deployment with R](https://CRAN.R-project.org/view=ModelDeployment) 
 categorizes the process of deploying models to various environments for scoring or inferencing on new data into two categories.
 The first category is **Deployment through Different Types of Artifacts**, which basically means exporting the model as an object, 
 then using supported software/platform to consume this object scoring out the model predictions.
 The other category is **Deployment through Cloud/Server**, which includes 
-a). providing an R interface to third-party managed services such as [Google Cloud Machine Learning Engine](http://cloud.google.com/ml-engine/);
+a). providing an R interface to third-party managed services such as [Google Cloud Machine Learning Engine](https://cloud.google.com/ml-engine/);
 b). turning R code into web API and opening service on the server.
 Our approach provides SQL query producing model predictions, which can be taken as a combination of the model itself plus the scoring process.
 The output SQL query can be treated as an artifact, but we can easily set up service for it on the database server.