Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[R-package] [docs] Simplified examples to cut example run time (fixes #2988) #2989

Merged
merged 3 commits into from
Apr 15, 2020

Conversation

jameslamb
Copy link
Collaborator

See #2988 . The R package is very close to hitting this R CMD CHECK NOTE:

* checking examples ... NOTE
Examples with CPU or elapsed time > 5s
     user system elapsed
dim 5.426  0.264    5.79

In this PR, I cut the runtime for our examples by:

  • removing all library(lightgbm) (these are not necessary and it is conventional to omit them)
  • cutting nrounds from 10L to smaller numbers in all examples with training
  • wrapping examples for code that relies on the filesystem in \donttest
  • wrapping plotting code in \donttest

These tests run in about 2.9 seconds on my laptop. To get the timings for yourself

pushd R-package
    Rscript -e "roxygen2::roxygenize(load = 'installed')"
popd

Rscript build_r.R --skip-install

R CMD CHECK lightgbm*.tar.gz \
    --no-tests \
    --no-manual \
    --timings

Rscript -e "
    timing_df <- read.delim('lightgbm.Rcheck/lightgbm-Ex.timings');
    print(timing_df);
    print(paste0('total time (s) ', sum(timing_df[['name']])))
    "

Results:

                             name  user system elapsed
dim                         0.705 0.038  0.746      NA
dimnames.lgb.Dataset        0.049 0.006  0.046      NA
getinfo                     0.061 0.004  0.057      NA
lgb.Dataset                 0.050 0.008  0.050      NA
lgb.Dataset.construct       0.044 0.002  0.037      NA
lgb.Dataset.create.valid    0.035 0.003  0.037      NA
lgb.Dataset.save            0.042 0.003  0.036      NA
lgb.Dataset.set.categorical 0.043 0.002  0.036      NA
lgb.Dataset.set.reference   0.034 0.003  0.038      NA
lgb.cv                      0.287 0.078  0.203      NA
lgb.dump                    0.001 0.000  0.000      NA
lgb.get.eval.result         0.118 0.017  0.063      NA
lgb.importance              0.167 0.018  0.137      NA
lgb.interprete              0.409 0.024  0.216      NA
lgb.load                    0.000 0.000  0.000      NA
lgb.model.dt.tree           0.251 0.040  0.145      NA
lgb.plot.importance         0.146 0.021  0.108      NA
lgb.plot.interpretation     0.000 0.000  0.000      NA
lgb.prepare                 0.017 0.001  0.009      NA
lgb.prepare2                0.011 0.001  0.006      NA
lgb.prepare_rules           0.028 0.002  0.015      NA
lgb.prepare_rules2          0.023 0.002  0.013      NA
lgb.save                    0.000 0.000  0.000      NA
lgb.train                   0.119 0.017  0.066      NA
lgb.unloader                0.117 0.017  0.062      NA
predict.lgb.Booster         0.081 0.015  0.063      NA
readRDS.lgb.Booster         0.000 0.000  0.000      NA
saveRDS.lgb.Booster         0.000 0.000  0.000      NA
setinfo                     0.052 0.002  0.046      NA
slice                       0.054 0.003  0.048      NA
[1] "total time (s) 2.944"

To answer some questions I anticipate getting:

Should we really be using \donttest? That means CI won't catch those examples being broken

I think it's worth it to get rid of the R CMD CHECK note. We test all those functions in unit tests, so if we break the functions themselves we'll know about it.

Won't using nrounds = 5L and other parameter changes result in terrible models?

Yes probably. But we don't have strong guarantees about how well the models from our example code work anyway. I think the role of examples here is just to show minimal valid sets of arguments to each function. See #2988 (comment) for my full opinion on this.

@jameslamb
Copy link
Collaborator Author

@Laurae2 or @guolinke could you please take a look whenever you have time? This NOTE is blocking #2985 and even if we work around it there it's going to block other PRs (since the runtime right now is so close to 5 seconds)

@StrikerRUS
Copy link
Collaborator

StrikerRUS commented Apr 10, 2020

Should we really be using \donttest? That means CI won't catch those examples being broken

Sorry for maybe stupid question, but I'm still far away from R and learning it only by our conversations here 😃

As we do not wrap them into \dontrun (but in \donttest), it means that these examples are will be running at our RTD site and failing docs will indirectly indicate that our examples may be broken, right?

LightGBM/build_r_site.R

Lines 12 to 19 in 44a9120

build_reference(
lazy = FALSE
, document = FALSE
, examples = TRUE
, run_dont_run = FALSE
, seed = 42L
, preview = FALSE
)

@jameslamb jameslamb changed the title [R-package] [docs] Simplified examles to cut example run time (fixes #2988) [R-package] [docs] Simplified examples to cut example run time (fixes #2988) Apr 13, 2020
Copy link
Contributor

@Laurae2 Laurae2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change the learning_rate to adapt to the hyperparameter changes

R-package/R/lgb.cv.R Outdated Show resolved Hide resolved
R-package/R/lgb.plot.interpretation.R Outdated Show resolved Hide resolved
R-package/R/lgb.plot.importance.R Outdated Show resolved Hide resolved
R-package/R/lgb.interprete.R Outdated Show resolved Hide resolved
R-package/R/lgb.importance.R Outdated Show resolved Hide resolved
@Laurae2
Copy link
Contributor

Laurae2 commented Apr 13, 2020

As we do not wrap them into \dontrun (but in \donttest), it means that these examples are will be running at our RTD site and failing docs will indirectly indicate that our examples may be broken, right?

@StrikerRUS The examples would fail silently (but still show the error in doc).

@jameslamb
Copy link
Collaborator Author

jameslamb commented Apr 13, 2020

As we do not wrap them into \dontrun (but in \donttest), it means that these examples are will be running at our RTD site and failing docs will indirectly indicate that our examples may be broken, right?

@StrikerRUS The examples would fail silently (but still show the error in doc).

I thought that if you added \dontrun, it will literally add the comment "## don't run" in the examples which I think would be confusing. I swear it used to work like that. But I just tested and I guess not?

image

@Laurae2 is right (#2989 (comment)). I tried adding stop("I am an error") to the examples in lgb.cv() and then re-running pkgdown::build_site().

Results when that example is wrapped in \donttest

image

Results with \dontrun

image

In which case I think the \donttest is still an ok choice. I'd rather that we have an error message that someone sees and alerts us to than just having the code be silently broken until someone runs it. I'd hate to have people run that code, get an error, and get discouraged and think they're doing something wrong.

@jameslamb
Copy link
Collaborator Author

Change the learning_rate to adapt to the hyperparameter changes

Thanks! Just made these changes

@StrikerRUS StrikerRUS merged commit 1816167 into microsoft:master Apr 15, 2020
@StrikerRUS
Copy link
Collaborator

Not sure that the root cause is this PR, but we have the following right now at our docs:

image

https://lightgbm.readthedocs.io/en/latest/R/reference/lgb.cv.html

@StrikerRUS
Copy link
Collaborator

Also, it seems that two other examples are broken:

https://lightgbm.readthedocs.io/en/latest/R/reference/lgb.Dataset.set.categorical.html
https://lightgbm.readthedocs.io/en/latest/R/reference/lgb.plot.interpretation.html

@jameslamb
Copy link
Collaborator Author

Thanks @StrikerRUS ! I don't think they are related to this PR. I just created #2999 to address them though, so appreciate you pointing them out!

@jameslamb jameslamb deleted the misc/r-example-times branch April 25, 2020 17:07
@lock lock bot locked as resolved and limited conversation to collaborators Jun 24, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants