[R-package] deprecate lgb.prepare() and lgb.prepare2() #3095

Merged (16 commits), Aug 1, 2020
2 changes: 2 additions & 0 deletions R-package/NAMESPACE
@@ -14,6 +14,8 @@ export(lgb.Dataset.create.valid)
 export(lgb.Dataset.save)
 export(lgb.Dataset.set.categorical)
 export(lgb.Dataset.set.reference)
+export(lgb.convert)
+export(lgb.convert_with_rules)
 export(lgb.cv)
 export(lgb.dump)
 export(lgb.get.eval.result)
19 changes: 10 additions & 9 deletions R-package/R/lgb.prepare2.R → R-package/R/lgb.convert.R
@@ -1,11 +1,12 @@
-#' @name lgb.prepare2
+#' @name lgb.convert
 #' @title Data preparator for LightGBM datasets (integer)
 #' @description Attempts to prepare a clean dataset to prepare to put in a \code{lgb.Dataset}.
-#' Factors and characters are converted to numeric (specifically: integer).
-#' Please use \code{\link{lgb.prepare_rules2}} if you want to apply this transformation to
+#' Factors and characters are converted to integer.
+#' Please use \code{\link{lgb.convert_with_rules}} if you want to apply this transformation to
 #' other datasets. This is useful if you have a specific need for integer dataset instead
-#' of numeric dataset. Note that there are programs which do not support integer-only
-#' input. Consider this as a half memory technique which is dangerous, especially for LightGBM.
+#' of numeric dataset.
+#'
+#' NOTE: In previous releases of LightGBM, this function was called \code{lgb.prepare}.
 #' @param data A data.frame or data.table to prepare.
 #' @return The cleaned dataset. It must be converted to a matrix format (\code{as.matrix})
 #' for input in \code{lgb.Dataset}.
@@ -16,13 +17,13 @@
 #' str(iris)
 #'
 #' # Convert all factors/chars to integer
-#' str(lgb.prepare2(data = iris))
+#' str(lgb.convert(data = iris))
 #'
 #' \dontrun{
 #' # When lightgbm package is installed, and you do not want to load it
 #' # You can still use the function!
 #' lgb.unloader()
-#' str(lightgbm::lgb.prepare2(data = iris))
+#' str(lightgbm::lgb.convert(data = iris))
 #' # 'data.frame': 150 obs. of 5 variables:
 #' # $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 #' # $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
@@ -32,7 +33,7 @@
 #' }
 #'
 #' @export
-lgb.prepare2 <- function(data) {
+lgb.convert <- function(data) {

 # data.table not behaving like data.frame
 if (inherits(data, "data.table")) {
@@ -75,7 +76,7 @@ lgb.prepare2 <- function(data) {
 } else {

 stop(
-"lgb.prepare2: you provided "
+"lgb.convert: you provided "
 , paste(class(data), collapse = " & ")
 , " but data should have class data.frame or data.table"
 )
@@ -1,11 +1,11 @@
-#' @name lgb.prepare_rules2
+#' @name lgb.convert_with_rules
 #' @title Data preparator for LightGBM datasets with rules (integer)
 #' @description Attempts to prepare a clean dataset to prepare to put in a \code{lgb.Dataset}.
-#' Factors and characters are converted to numeric (specifically: integer).
+#' Factors and characters are converted to integer.
 #' In addition, keeps rules created so you can convert other datasets using this converter.
 #' This is useful if you have a specific need for integer dataset instead of numeric dataset.
-#' Note that there are programs which do not support integer-only input.
-#' Consider this as a half memory technique which is dangerous, especially for LightGBM.
+#'
+#' NOTE: In previous releases of LightGBM, this function was called \code{lgb.prepare_rules2}.
 #' @param data A data.frame or data.table to prepare.
 #' @param rules A set of rules from the data preparator, if already used.
 #' @return A list with the cleaned dataset (\code{data}) and the rules (\code{rules}).
@@ -17,15 +17,15 @@
 #'
 #' str(iris)
 #'
-#' new_iris <- lgb.prepare_rules2(data = iris) # Autoconverter
+#' new_iris <- lgb.convert_with_rules(data = iris) # Autoconverter
 #' str(new_iris$data)
 #'
 #' data(iris) # Erase iris dataset
 #' iris$Species[1L] <- "NEW FACTOR" # Introduce junk factor (NA)
 #'
 #' # Use conversion using known rules
 #' # Unknown factors become 0, excellent for sparse datasets
-#' newer_iris <- lgb.prepare_rules2(data = iris, rules = new_iris$rules)
+#' newer_iris <- lgb.convert_with_rules(data = iris, rules = new_iris$rules)
 #'
 #' # Unknown factor is now zero, perfect for sparse datasets
 #' newer_iris$data[1L, ] # Species became 0 as it is an unknown factor
@@ -46,12 +46,12 @@
 #' , "virginica" = 1L
 #' )
 #' )
-#' newest_iris <- lgb.prepare_rules2(data = iris, rules = personal_rules)
+#' newest_iris <- lgb.convert_with_rules(data = iris, rules = personal_rules)
 #' str(newest_iris$data) # SUCCESS!
 #'
 #' @importFrom data.table set
 #' @export
-lgb.prepare_rules2 <- function(data, rules = NULL) {
+lgb.convert_with_rules <- function(data, rules = NULL) {

 # data.table not behaving like data.frame
 if (inherits(data, "data.table")) {
@@ -166,7 +166,7 @@ lgb.prepare_rules2 <- function(data, rules = NULL) {
 } else {

 stop(
-"lgb.prepare_rules2: you provided "
+"lgb.convert_with_rules: you provided "
 , paste(class(data), collapse = " & ")
 , " but data should have class data.frame"
 )
85 changes: 0 additions & 85 deletions R-package/R/lgb.prepare.R

This file was deleted.

181 changes: 0 additions & 181 deletions R-package/R/lgb.prepare_rules.R

This file was deleted.
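
Note on the behaviour documented in the roxygen examples above (factors and characters converted to integer, with rules kept so other datasets can be converted the same way, and unknown values becoming 0): the idea can be sketched in base R roughly as below. This is only an illustrative sketch, not the package's implementation. `convert_with_rules_sketch` is a hypothetical helper name, it handles plain data.frames only, and treating NA like an unknown level (code 0) is an assumption of this sketch.

```r
# Hypothetical helper, NOT part of lightgbm: a base-R sketch of rule-based
# conversion of factor/character columns to integer codes.
convert_with_rules_sketch <- function(data, rules = NULL) {
  stopifnot(is.data.frame(data))

  # No rules supplied: build one named integer vector per factor/character column
  if (is.null(rules)) {
    rules <- list()
    for (col in names(data)) {
      x <- data[[col]]
      if (is.factor(x) || is.character(x)) {
        lv <- if (is.factor(x)) levels(x) else sort(unique(x))
        rules[[col]] <- stats::setNames(seq_along(lv), lv)
      }
    }
  }

  # Apply the rules; only columns that exist in `data` are touched
  for (col in intersect(names(rules), names(data))) {
    codes <- unname(rules[[col]][as.character(data[[col]])])
    # Assumption of this sketch: unknown levels and NA both map to 0
    codes[is.na(codes)] <- 0L
    data[[col]] <- as.integer(codes)
  }

  list(data = data, rules = rules)
}

# Usage: learn rules on one data.frame, reuse them on another
train <- data.frame(x = c(1.5, 2.5, 3.5), g = c("a", "b", "a"), stringsAsFactors = FALSE)
fit <- convert_with_rules_sketch(train)

new_rows <- data.frame(x = 4.5, g = "z", stringsAsFactors = FALSE)
out <- convert_with_rules_sketch(new_rows, rules = fit$rules)
```

Here `fit$rules` plays the role of the `rules` element returned by `lgb.convert_with_rules()`, and the unseen value `"z"` is coded as 0 in `out$data$g`, matching the "Unknown factors become 0" comment in the examples.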