[R-package] Quick question about num of thread #4192

Closed
issactoast opened this issue Apr 17, 2021 · 6 comments

@issactoast
Contributor

Hi, thank you for making LightGBM available in R!

I am using LightGBM in R and have a quick question about num_thread.

According to the manual, num_thread should be set to the number of physical CPU cores. However, in the R code I have usually seen, the number of workers is set to one less than the number of cores, such as

cores <- parallel::detectCores() - 1
cores

So if we have 4 cores, we use 3 for the parallel workers and 1 for the controller. Does this apply to LightGBM too? If I have 4 physical CPU cores, should I set the number of threads to 3?
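For context, the pattern I am referring to looks roughly like this (just a sketch of the usual cluster setup with the parallel package, not LightGBM code):

library(parallel)

# leave one core free for the R session that coordinates the workers
cores <- parallel::detectCores() - 1
cl <- parallel::makeCluster(cores)

# the controller hands the work out to the `cores` worker processes
results <- parallel::parLapply(cl, 1:100, function(i) sqrt(i))

parallel::stopCluster(cl)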

@jameslamb
Collaborator

Thanks for using LightGBM!

Assuming that there are no other processes making heavy use of the available CPUs, you will get the best performance by setting num_thread equal to the total number of real CPU cores. There is not really a concept of a "controller" in parallel training with LightGBM.

You could use something like this to test the relative speedup from different settings of num_threads:

library(lightgbm)
library(microbenchmark)
library(nycflights13)

data(flights, package = "nycflights13")
flights <- as.data.frame(flights)

dtrain <- lgb.Dataset(
    as.matrix(
        flights[, c("year", "sched_dep_time", "distance", "hour", "minute")]
    )
    , label = flights[, "dep_delay"]
    , free_raw_data = FALSE
    , max_bin = 350
)

num_cores <- parallel::detectCores()

for (num_thread in c(num_cores - 1, num_cores)) {
    print(paste0("num_thread: ", num_thread))
    print(
        microbenchmark::microbenchmark({
            lgb.train(
                params = list(
                    num_thread = num_thread
                    , objective = "regression_l2"
                    , num_leaves = 31L
                    , max_depth = 8L
                    , learning_rate = 0.01
                    , min_data_in_leaf = 1
                )
                , data = dtrain
                , nrounds = 1000L
                , verbose = -1L
            )
        }, times = 5, unit = "s")
    )
}

I installed {lightgbm} 3.2.1 on my Mac tonight with install.packages("lightgbm", type = "source"), and with that version I got the following results from the code above. I have two 2-core CPUs on this machine.

# num_threads = 3
     min       lq     mean   median       uq    max neval
 5.23764 5.273885 5.497161 5.331118 5.625063 6.0181     5

# num_threads = 4
      min       lq     mean   median       uq      max neval
 4.582549 4.762857 4.907298 4.837003 4.925407 5.428671     5

Your results will vary based on your specific dataset and the other learning parameters you set.

@jameslamb jameslamb changed the title Quick question about num of thread [R-package] Quick question about num of thread Apr 18, 2021
@issactoast
Contributor Author

Thank you for the clarification! Just to make it clear: we need to use num_cores <- parallel::detectCores(logical = FALSE) so that num_cores equals the number of physical cores. I ran the same code you shared, and it confirms that setting num_thread to the number of physical cores is faster. Thank you!
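For reference, a quick way to see the difference between the two counts (the logical count includes hyper-threaded processors, so it can be up to twice the physical count):

parallel::detectCores(logical = FALSE)  # physical cores only; what num_thread should match
parallel::detectCores(logical = TRUE)   # logical processors, counting hyper-threads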

library(lightgbm)
#> Loading required package: R6
library(microbenchmark)
library(nycflights13)

data(flights, package = "nycflights13")
flights <- as.data.frame(flights)

dtrain <- lgb.Dataset(
    as.matrix(
        flights[, c("year", "sched_dep_time", "distance", "hour", "minute")]
    )
    , label = flights[, "dep_delay"]
    , free_raw_data = FALSE
    , max_bin = 350
)

num_cores <- parallel::detectCores(logical = FALSE)

for (num_thread in c(num_cores, num_cores * 2)) {
    print(paste0("num_thread: ", num_thread))
    print(
        microbenchmark::microbenchmark({
            lgb.train(
                params = list(
                    num_thread = num_thread
                    , objective = "regression_l2"
                    , num_leaves = 31L
                    , max_depth = 8L
                    , learning_rate = 0.01
                    , min_data_in_leaf = 1
                )
                , data = dtrain
                , nrounds = 1000L
                , verbose = -1L
            )
        }, times = 5, unit = "s")
    )
}
#> [1] "num_thread: 10"
#> Unit: seconds
#>                                                                                                                                                                                                                                            expr
#>  {     lgb.train(params = list(num_thread = num_thread, objective = "regression_l2",          num_leaves = 31L, max_depth = 8L, learning_rate = 0.01,          min_data_in_leaf = 1), data = dtrain, nrounds = 1000L,          verbose = -1L) }
#>       min       lq     mean   median       uq      max neval
#>  2.522868 2.546463 2.593207 2.563418 2.646491 2.686795     5
#> [1] "num_thread: 20"
#> Unit: seconds
#>                                                                                                                                                                                                                                            expr
#>  {     lgb.train(params = list(num_thread = num_thread, objective = "regression_l2",          num_leaves = 31L, max_depth = 8L, learning_rate = 0.01,          min_data_in_leaf = 1), data = dtrain, nrounds = 1000L,          verbose = -1L) }
#>       min       lq    mean   median       uq      max neval
#>  4.536631 4.673258 4.67483 4.682869 4.687078 4.794311     5

Created on 2021-04-18 by the reprex package (v1.0.0)

@jameslamb
Collaborator

Ah yes, you are absolutely right! Are you interested in contributing a change to the documentation? I think others would benefit from that note.

It would just be updating

#' \item{\code{num_threads}: Number of threads for LightGBM. For the best speed, set this to
#' the number of real CPU cores, not the number of threads (most
#' CPU using hyper-threading to generate 2 threads per CPU core).}
to say something like

the number of real CPU cores (\code{parallel::detectCores(logical = FALSE)})
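Applied to the existing comment, the updated block might look roughly like this (exact wording is up to you):

#' \item{\code{num_threads}: Number of threads for LightGBM. For the best speed, set this to
#' the number of real CPU cores (\code{parallel::detectCores(logical = FALSE)}), not the
#' number of threads (most CPU using hyper-threading to generate 2 threads per CPU core).}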

And then re-generating the documentation files with commands like this:

sh build-cran-package.sh
R CMD INSTALL --with-keep.source lightgbm_*.tar.gz
cd R-package
Rscript -e "roxygen2::roxygenize(load = 'installed')"

@issactoast
Contributor Author

@jameslamb Sure! I will do that, thanks!

@jameslamb
Collaborator

Great, thanks so much! Let me know if you run into any issues.

@github-actions

This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 23, 2023