[R-package] [docs] Simplified examples to cut example run time (fixes #2988) #2989

jameslamb · 2020-04-10T02:28:18Z

See #2988 . The R package is very close to hitting this R CMD CHECK NOTE:

* checking examples ... NOTE
Examples with CPU or elapsed time > 5s
     user system elapsed
dim 5.426  0.264    5.79

In this PR, I cut the runtime for our examples by:

removing all library(lightgbm) (these are not necessary and it is conventional to omit them)
cutting nrounds from 10L to smaller numbers in all examples with training
wrapping examples for code that relies on the filesystem in \donttest
wrapping plotting code in \donttest

These tests run in about 2.9 seconds on my laptop. To get the timings for yourself

pushd R-package
    Rscript -e "roxygen2::roxygenize(load = 'installed')"
popd

Rscript build_r.R --skip-install

R CMD CHECK lightgbm*.tar.gz \
    --no-tests \
    --no-manual \
    --timings

Rscript -e "
    timing_df <- read.delim('lightgbm.Rcheck/lightgbm-Ex.timings');
    print(timing_df);
    print(paste0('total time (s) ', sum(timing_df[['name']])))
    "

Results:

                             name  user system elapsed
dim                         0.705 0.038  0.746      NA
dimnames.lgb.Dataset        0.049 0.006  0.046      NA
getinfo                     0.061 0.004  0.057      NA
lgb.Dataset                 0.050 0.008  0.050      NA
lgb.Dataset.construct       0.044 0.002  0.037      NA
lgb.Dataset.create.valid    0.035 0.003  0.037      NA
lgb.Dataset.save            0.042 0.003  0.036      NA
lgb.Dataset.set.categorical 0.043 0.002  0.036      NA
lgb.Dataset.set.reference   0.034 0.003  0.038      NA
lgb.cv                      0.287 0.078  0.203      NA
lgb.dump                    0.001 0.000  0.000      NA
lgb.get.eval.result         0.118 0.017  0.063      NA
lgb.importance              0.167 0.018  0.137      NA
lgb.interprete              0.409 0.024  0.216      NA
lgb.load                    0.000 0.000  0.000      NA
lgb.model.dt.tree           0.251 0.040  0.145      NA
lgb.plot.importance         0.146 0.021  0.108      NA
lgb.plot.interpretation     0.000 0.000  0.000      NA
lgb.prepare                 0.017 0.001  0.009      NA
lgb.prepare2                0.011 0.001  0.006      NA
lgb.prepare_rules           0.028 0.002  0.015      NA
lgb.prepare_rules2          0.023 0.002  0.013      NA
lgb.save                    0.000 0.000  0.000      NA
lgb.train                   0.119 0.017  0.066      NA
lgb.unloader                0.117 0.017  0.062      NA
predict.lgb.Booster         0.081 0.015  0.063      NA
readRDS.lgb.Booster         0.000 0.000  0.000      NA
saveRDS.lgb.Booster         0.000 0.000  0.000      NA
setinfo                     0.052 0.002  0.046      NA
slice                       0.054 0.003  0.048      NA
[1] "total time (s) 2.944"

To answer some questions I anticipate getting:

Should we really be using \donttest? That means CI won't catch those examples being broken

I think it's worth it to get rid of the R CMD CHECK note. We test all those functions in unit tests, so if we break the functions themselves we'll know about it.

Won't using nrounds = 5L and other parameter changes result in terrible models?

Yes probably. But we don't have strong guarantees about how well the models from our example code work anyway. I think the role of examples here is just to show minimal valid sets of arguments to each function. See #2988 (comment) for my full opinion on this.

…icrosoft#2988)

jameslamb · 2020-04-10T02:29:23Z

@Laurae2 or @guolinke could you please take a look whenever you have time? This NOTE is blocking #2985 and even if we work around it there it's going to block other PRs (since the runtime right now is so close to 5 seconds)

StrikerRUS · 2020-04-10T22:45:28Z

Should we really be using \donttest? That means CI won't catch those examples being broken

Sorry for maybe stupid question, but I'm still far away from R and learning it only by our conversations here 😃

As we do not wrap them into \dontrun (but in \donttest), it means that these examples are will be running at our RTD site and failing docs will indirectly indicate that our examples may be broken, right?

LightGBM/build_r_site.R

Lines 12 to 19 in 44a9120

    
           build_reference( 
        
               lazy = FALSE 
        
               , document = FALSE 
        
               , examples = TRUE 
        
               , run_dont_run = FALSE 
        
               , seed = 42L 
        
               , preview = FALSE 
        
           )

Laurae2

Change the learning_rate to adapt to the hyperparameter changes

R-package/R/lgb.cv.R

R-package/R/lgb.plot.interpretation.R

R-package/R/lgb.plot.importance.R

R-package/R/lgb.interprete.R

R-package/R/lgb.importance.R

Laurae2 · 2020-04-13T18:23:12Z

As we do not wrap them into \dontrun (but in \donttest), it means that these examples are will be running at our RTD site and failing docs will indirectly indicate that our examples may be broken, right?

@StrikerRUS The examples would fail silently (but still show the error in doc).

jameslamb · 2020-04-13T21:11:38Z

As we do not wrap them into \dontrun (but in \donttest), it means that these examples are will be running at our RTD site and failing docs will indirectly indicate that our examples may be broken, right?

@StrikerRUS The examples would fail silently (but still show the error in doc).

I thought that if you added \dontrun, it will literally add the comment "## don't run" in the examples which I think would be confusing. I swear it used to work like that. But I just tested and I guess not?

@Laurae2 is right (#2989 (comment)). I tried adding stop("I am an error") to the examples in lgb.cv() and then re-running pkgdown::build_site().

Results when that example is wrapped in \donttest

Results with \dontrun

In which case I think the \donttest is still an ok choice. I'd rather that we have an error message that someone sees and alerts us to than just having the code be silently broken until someone runs it. I'd hate to have people run that code, get an error, and get discouraged and think they're doing something wrong.

jameslamb · 2020-04-13T23:18:32Z

Change the learning_rate to adapt to the hyperparameter changes

Thanks! Just made these changes

StrikerRUS · 2020-04-16T02:14:02Z

Not sure that the root cause is this PR, but we have the following right now at our docs:

https://lightgbm.readthedocs.io/en/latest/R/reference/lgb.cv.html

StrikerRUS · 2020-04-16T02:20:45Z

Also, it seems that two other examples are broken:

https://lightgbm.readthedocs.io/en/latest/R/reference/lgb.Dataset.set.categorical.html
https://lightgbm.readthedocs.io/en/latest/R/reference/lgb.plot.interpretation.html

jameslamb · 2020-04-16T04:05:03Z

Thanks @StrikerRUS ! I don't think they are related to this PR. I just created #2999 to address them though, so appreciate you pointing them out!

[R-package] [docs] Simplified examles to cut example run time (fixes m…

c7cb1c9

…icrosoft#2988)

jameslamb requested review from Laurae2, guolinke and StrikerRUS April 10, 2020 02:28

jameslamb mentioned this pull request Apr 10, 2020

more cpp lints #2985

Merged

jameslamb added the maintenance label Apr 10, 2020

jameslamb changed the title ~~[R-package] [docs] Simplified examles to cut example run time (fixes #2988)~~ [R-package] [docs] Simplified examples to cut example run time (fixes #2988) Apr 13, 2020

Laurae2 suggested changes Apr 13, 2020

View reviewed changes

jameslamb added 2 commits April 13, 2020 18:14

Merge branch 'master' into misc/r-example-times

22782e5

updated learning rates

1024d9f

Laurae2 approved these changes Apr 14, 2020

View reviewed changes

StrikerRUS merged commit 1816167 into microsoft:master Apr 15, 2020

StrikerRUS added doc and removed maintenance labels Apr 15, 2020

jameslamb mentioned this pull request Apr 16, 2020

[R-package] fix R examples and lgb.plot.interpretation() #2999

Closed

jameslamb mentioned this pull request Apr 17, 2020

[R-package] fix R examples and lgb.plot.interpretation() #3002

Merged

jameslamb deleted the misc/r-example-times branch April 25, 2020 17:07

lock bot locked as resolved and limited conversation to collaborators Jun 24, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[R-package] [docs] Simplified examples to cut example run time (fixes #2988) #2989

[R-package] [docs] Simplified examples to cut example run time (fixes #2988) #2989

jameslamb commented Apr 10, 2020

jameslamb commented Apr 10, 2020

StrikerRUS commented Apr 10, 2020 •

edited

Loading

Laurae2 left a comment

Laurae2 commented Apr 13, 2020

jameslamb commented Apr 13, 2020 •

edited

Loading

jameslamb commented Apr 13, 2020

StrikerRUS commented Apr 16, 2020

StrikerRUS commented Apr 16, 2020

jameslamb commented Apr 16, 2020

[R-package] [docs] Simplified examples to cut example run time (fixes #2988) #2989

[R-package] [docs] Simplified examples to cut example run time (fixes #2988) #2989

Conversation

jameslamb commented Apr 10, 2020

jameslamb commented Apr 10, 2020

StrikerRUS commented Apr 10, 2020 • edited Loading

Laurae2 left a comment

Choose a reason for hiding this comment

Laurae2 commented Apr 13, 2020

jameslamb commented Apr 13, 2020 • edited Loading

jameslamb commented Apr 13, 2020

StrikerRUS commented Apr 16, 2020

StrikerRUS commented Apr 16, 2020

jameslamb commented Apr 16, 2020

StrikerRUS commented Apr 10, 2020 •

edited

Loading

jameslamb commented Apr 13, 2020 •

edited

Loading