lgb.cv data.table error - R package #2715

Closed
abowma opened this issue Jan 28, 2020 · 13 comments

@abowma

abowma commented Jan 28, 2020

Hi,

I have been running LightGBM 2.3.1 in R (version 3.5) for the past few months in a linux environment with no issues. My colleague has installed the package in the past few days, however when running the example code provided on the git page, they experience an error (whereas it works on my install):

Error message

Error in data.table::data.table(indices = test_indices, weight = getinfo(data, : column or argument 2 is NULL

Reproducible Code

library(lightgbm)
data(agaricus.train, package = "lightgbm")
train <- agaricus.train
dtrain <- lgb.Dataset(train$data, label = train$label)
params <- list(objective = "regression", metric = "l2")
model <- lgb.cv(params, dtrain, 10, nfold = 5, min_data = 1, learning_rate = 1,
                early_stopping_rounds = 10)

Appreciate any help with this!

@jameslamb
Collaborator

Thanks for the report @abowma ! I will look into it tonight and get back to you, just want you to know we are on it.

Can you ask your colleague to try again from the current version on master? We recently merged a change to lgb.cv() (#2573), wondering if that fixed it....or caused it 😬

@abowma
Author

abowma commented Jan 29, 2020

Hi @jameslamb, thanks for the update and for getting back to me!

We had another colleague try yesterday with the same issue before I reported this, and it looks like the merge done for 2573 was 14 days ago so this would all be post-merge.

@jameslamb
Collaborator

Hi @abowma , I just built the latest version on master on my Mac:

export CXX=/usr/local/bin/g++-8
export  CC=/usr/local/bin/gcc-8
Rscript build_r.R

and then ran the code you provided (#2715 (comment)). For me, the code ran successfully and did not throw any error.

So I think I need more information about the environment where you are seeing the issue. Could you tell me the following?

  • how you are installing the lightgbm R package
  • the output of running git log -n 5
  • R version
  • operating system + version
  • LightGBM version

@abowma
Author

abowma commented Jan 30, 2020

Hi @jameslamb

Here are the details of the environment:

  • Installed using
lgb.dl(commit = "master",
       compiler = "gcc",
       repo = "https://github.com/microsoft/LightGBM")
  • Due to the install method the cloned repo is removed and so am unable to run git log -n 5
  • Using RStudio Server, with R version 3.5.0 on Red Hat Enterprise Linux Server 7.5
  • The LightGBM version where the issue is occurring is 2.3.2

@abowma
Author

abowma commented Jan 31, 2020

@jameslamb - was testing on another server we have and it seems like the issue may be due to the version of R. The issue didn't occur with R 3.6.0, but seems to on R 3.5.0. Though again unsure of the reason.

@jameslamb
Collaborator

> @jameslamb - was testing on another server we have and it seems like the issue may be due to the version of R. The issue didn't occur with R 3.6.0, but seems to on R 3.5.0. Though again unsure of the reason.

Ok this is good information, thank you. I'll test on R3.5 and see if I can reproduce the issue. We only use R 3.6.x in CI so it is possible there's a 3.5.x-specific issue that hasn't been caught.

We're also working through #2714 , another issue where the user is installing with lgb.dl(), so I'll investigate whether we've done something here that has broken the installation process using that function from that project.

@jameslamb
Collaborator

@abowma sorry for the delay in response! I tested your sample code on R 3.5 tonight and couldn't reproduce the issue. (see #2787 for how to do this yourself).

Next, I'm going to try installing lightgbm with lgb.dl().

Another theory I have is that the difference is due to data.table versions...could you share the output of running sessionInfo()?

Thanks!
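
For anyone following along, here is a minimal sketch of collecting just the versions relevant to this thread, as a shorter alternative to the full sessionInfo() output. It uses only base R. The helper name report_versions is made up for illustration and is not part of lightgbm or data.table.

```r
# Hypothetical helper (not part of any package): report the R version and the
# installed versions of the packages discussed in this thread.
report_versions <- function(pkgs = c("lightgbm", "data.table")) {
  installed <- vapply(pkgs, function(p) {
    if (requireNamespace(p, quietly = TRUE)) {
      as.character(utils::packageVersion(p))
    } else {
      NA_character_  # package is not installed
    }
  }, character(1))
  c(R = as.character(getRversion()), installed)
}

print(report_versions())
```

A package that is not installed shows up as NA rather than raising an error, so the output can be pasted into an issue as-is.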

@jameslamb
Collaborator

I just tried this on R 3.5.3, with data.table 1.12.2 and data.table 1.12.8 (the latest).

I also tried removing lightgbm installed from source with remove.packages('lightgbm') and then installing it the way you mentioned:

lgbdl::lgb.dl(commit = "master",
       compiler = "gcc",
       repo = "https://github.com/microsoft/LightGBM")

In all these different configurations, the code from #2715 (comment) ran successfully and I could not reproduce the issue.

The next thing I'm going to try is literally 3.5.0 instead of 3.5.3, since you mentioned that you had that exact version (I should have just done that from the beginning).

@jameslamb
Collaborator

Hey guess what! I was able to reproduce the issue!

Full steps to reproduce:

1. build the R docker container with R 3.5.0

docker build \
    -t lightgbm-r-35 \
    -f dockerfile-r \
    --build-arg R_VERSION=3.5 \
    .

docker run -it lightgbm-r-35 /bin/bash

R

2. remove lightgbm installed in there and replace with the one created by lgb.dl()

remove.packages('lightgbm')

devtools::install_github("Laurae2/lgbdl")
lgbdl::lgb.dl(commit = "master"
    , compiler = "vs"
    , repo = "https://github.com/microsoft/LightGBM"
)

# exit the session so step 3 is in a clean session
q()

3. Run the example code

R
library(lightgbm)
data(agaricus.train, package = "lightgbm")
train <- agaricus.train
dtrain <- lgb.Dataset(train$data, label = train$label)
params <- list(objective = "regression", metric = "l2")
model <- lgb.cv(params, dtrain, 10, nfold = 5, min_data = 1, learning_rate = 1,
                early_stopping_rounds = 10)

This yields the error you reported:

Error in data.table::data.table(indices = test_indices, weight = getinfo(data, :
column or argument 2 is NULL

This is with data.table 1.11.4 (the version that is installed in rocker/verse:3.5.0). I updated to 1.12.8 (the latest) by running install.packages('data.table', repos = 'http://cran.rstudio.com').

Once I did that, the code above worked!! Could you please ask your colleague to update their data.table version and confirm that that fixed it?

@jameslamb
Collaborator

To add more context to this...I just tried with R 3.6.0, building from source with Rscript build_r.R, and data.table 1.11.4 and didn't have any issues. So it's not like lightgbm is incompatible with data.table 1.11.4.

But I get that same error you reported with the combination of R 3.6.0, building with lgb.dl(), and data.table 1.11.4.

So while I don't understand the root cause yet, the take-away here is:

If you use lgb.dl() to build from source, you need to upgrade data.table to at least 1.12.x.
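
A minimal sketch of checking for that before calling lgb.cv(), using only base R. The helper name meets_minimum is made up for illustration, and the 1.12.0 cutoff is an assumption read off the "at least 1.12.x" take-away; the only versions actually tested in this thread are 1.11.4 (fails with lgb.dl()) and 1.12.2 / 1.12.8 (work).

```r
# Hypothetical helper (not part of lightgbm or data.table):
# TRUE if `installed` is at least `minimum`.
# The 1.12.0 default is an assumption based on the take-away above.
meets_minimum <- function(installed, minimum = "1.12.0") {
  package_version(installed) >= package_version(minimum)
}

meets_minimum("1.11.4")  # the version that failed in this thread
meets_minimum("1.12.8")  # the version that worked in this thread

# Check the data.table that is actually installed, if any.
if (requireNamespace("data.table", quietly = TRUE)) {
  current <- as.character(utils::packageVersion("data.table"))
  if (!meets_minimum(current)) {
    message("data.table ", current, " is older than 1.12.0; ",
            "consider running install.packages('data.table')")
  }
}
```

Comparing with package_version() rather than plain strings matters here, since a string comparison can order versions such as "1.9.0" and "1.12.0" incorrectly.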

@abowma
Author

abowma commented Feb 24, 2020

@jameslamb Thanks for the update! We updated data.table, installed lightgbm the same way as before, and that seems to have solved the issue. It turns out our R 3.6.0 server already had the newer version of data.table installed, which is why the error did not occur there and why we thought it had to do with the R version. I will keep this in mind for any future installations! Appreciate all of your help with this!

@StrikerRUS
Collaborator

@jameslamb Can you please update the R README with a warning about the data.table version incompatibility? Then I think we can close the issue.

@jameslamb
Collaborator

@abowma great! Glad it is working for you.

@StrikerRUS yep I'll do that right now.
