Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update versions of LightGBM dependencies #4935

Merged
merged 34 commits into from
Feb 13, 2022
Merged

Update versions of LightGBM dependencies #4935

merged 34 commits into from
Feb 13, 2022

Conversation

StrikerRUS
Copy link
Collaborator

@StrikerRUS StrikerRUS commented Jan 7, 2022

@StrikerRUS
Copy link
Collaborator Author

StrikerRUS commented Jan 7, 2022

All failing CI jobs are R-package jobs and the error is about hidden files:

* checking for hidden files and directories ... NOTE
Found the following hidden files and directories:
  src/external_libs/fmt/support/bazel/.bazelrc
  src/external_libs/fmt/support/bazel/.bazelversion
These were most likely included in error. See section ‘Package
structure’ in the ‘Writing R Extensions’ manual.

CRAN-pack does not know about
  src/external_libs/fmt/support/bazel/.bazelrc
  src/external_libs/fmt/support/bazel/.bazelversion

@jameslamb
Copy link
Collaborator

error is about hidden files

Adding additional rules in https://github.com/microsoft/LightGBM/blob/master/R-package/.Rbuildignore should fix that.

@StrikerRUS
Copy link
Collaborator Author

StrikerRUS commented Jan 7, 2022

Here is a description for them in the original PR where they were introduced:

The .bazelrc is kind of optional. It could simply be deleted. In its current form, it just avoids that some temporary Bazel-related folders are created.

The .bazelversion file can also be deleted. It is just a file for Bazelisk and a nice way to document which Bazel version was aimed for/tested with (I am pretty sure that it works with almost every Bazel version).

fmtlib/fmt#2505 (comment)

According to this, I agree that we can safely ignore/remove them.

@StrikerRUS
Copy link
Collaborator Author

Linux_latest regular job is failing (probably segfault):

../tests/python_package_test/test_dual.py s                              [ 64%]
../tests/python_package_test/test_engine.py .....................s...s.. [ 68%]
..................................../__w/1/s/.ci/test.sh: line 212:  7869 Killed                  pytest $BUILD_DIRECTORY/tests
##[error]Bash exited with code '255'.

https://dev.azure.com/lightgbm-ci/lightgbm-ci/_build/results?buildId=11943&view=logs&j=275189f9-c769-596a-7ef9-49fb48a9ab70&t=3a9e7a4a-04e6-52a0-67ea-6e8f6cfda74f
Need to identify the failing testcase and try to reproduce the error.

@StrikerRUS
Copy link
Collaborator Author

StrikerRUS commented Jan 11, 2022

The latest commit passes all CI checks. I'd like to run CI multiple times to identify flaky test and affected environments.

  1. ✔️
  2. ✔️
  3. ✔️
  4. Linux_latest sdist: ../tests/python_package_test/test_engine.py::test_model_size /__w/1/s/.ci/test.sh: line 155: 7882 Killed
  5. ✔️
  6. ✔️
  7. Linux_latest regular: ../tests/python_package_test/test_engine.py::test_model_size /__w/1/s/.ci/test.sh: line 212: 7874 Killed
  8. ✔️
  9. ✔️
  10. ✔️

@StrikerRUS StrikerRUS closed this Jan 11, 2022
@StrikerRUS StrikerRUS reopened this Jan 11, 2022
@StrikerRUS StrikerRUS closed this Jan 11, 2022
@StrikerRUS StrikerRUS reopened this Jan 11, 2022
@StrikerRUS
Copy link
Collaborator Author

StrikerRUS commented Jan 11, 2022

Seems that failing test is test_model_size

def test_model_size():

and I guess that segfault happens here
bst.model_from_string(new_model_str)

// Use fast_double_parse and strtod (if parse failed) to parse double.
inline static const char* AtofPrecise(const char* p, double* out) {

@StrikerRUS
Copy link
Collaborator Author

StrikerRUS commented Jan 12, 2022

Reverting fast_double_parser update for now.

  1. ✔️
  2. ✔️
  3. ✔️
  4. ✔️
  5. ✔️
  6. ✔️
  7. ✔️
  8. ✔️
  9. ✔️
  10. ✔️
  11. ✔️
  12. ✔️
  13. ✔️ (one job times out due to Linux CI jobs hang forever after completing all Python tests successfully #4948 and re-runs don't help)
  14. ✔️
  15. ✔️

Swapped compilers

  1. ❌ (only Linux GPU job with Clang compiler failed, expected according to SEGFAULT on GPU version compiled with Clang #3475)
  2. ❌ (only Linux GPU job with Clang compiler failed, expected according to SEGFAULT on GPU version compiled with Clang #3475)
  3. ❌ (only Linux GPU job with Clang compiler failed, expected according to SEGFAULT on GPU version compiled with Clang #3475)
  4. ❌ (only Linux GPU job with Clang compiler failed, expected according to SEGFAULT on GPU version compiled with Clang #3475)
  5. ❌ (only Linux GPU job with Clang compiler failed, expected according to SEGFAULT on GPU version compiled with Clang #3475)

@StrikerRUS
Copy link
Collaborator Author

I think this PR is ready for review.

Based on 15 successful CI runs we can hope that this dependencies version upgrade is quite stable. I triggered all 15 runs via with empty commits because I noticed that sometimes if you re-run CI via closing-reopening a PR or by manually clicking rerun UI button (for those CIs where it's possible) new re-triggered CI job may reuse a previous runner, which breaks independency of tests.

@StrikerRUS StrikerRUS marked this pull request as ready for review January 17, 2022 18:39
Copy link
Collaborator

@jameslamb jameslamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Thanks for the thorough testing and explanation about fast_double_parser.

@StrikerRUS StrikerRUS changed the title Update versions of LightGBM dependencies [WIP] Update versions of LightGBM dependencies Jan 18, 2022
@StrikerRUS
Copy link
Collaborator Author

Just marked this PR as WIP because I'd like to merge #4953 first. Also, I forgot to run tests with swapped compilers at CI. Sorry for the inconvenience!

@StrikerRUS StrikerRUS changed the title [WIP] Update versions of LightGBM dependencies Update versions of LightGBM dependencies Feb 13, 2022
@StrikerRUS
Copy link
Collaborator Author

Well, I think this PR is ready for review.
#4935 (comment)

@jameslamb
Copy link
Collaborator

Thanks for the very thorough testing! I support merging this.

Pretty exciting, looks like there have been a lot of improvements in Eigen from the version currently shipped with LightGBM to now. For example, https://eigen.tuxfamily.org/index.php?title=3.4.

@StrikerRUS StrikerRUS merged commit 15266f1 into master Feb 13, 2022
@StrikerRUS StrikerRUS deleted the deps_update branch February 13, 2022 21:09
@jameslamb jameslamb mentioned this pull request Oct 7, 2022
40 tasks
@github-actions
Copy link

This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 23, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants