Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Change locking strategy of Booster, allow for share and unique locks #2760

Merged
merged 55 commits into from
Jul 19, 2020

Conversation

JoanFM
Copy link
Contributor

@JoanFM JoanFM commented Feb 14, 2020

Changes introduced
I propose to have a different locking strategy that would allow to do singlePredictions in a parallel way.
I propose having shared lock for const methods and unique locks for non const that are potentially writing values.
I added const to the methods that do Prediction since they should not alter the Booster and should be able to access the mutex in shared mode.

I hope this can help.

JoanFM and others added 28 commits February 3, 2020 14:51
… to upper and lower bound, move implementation to gdbt.cpp
Co-Authored-By: Nikita Titov <nekit94-08@mail.ru>
@imatiach-msft
Copy link
Contributor

@JoanFM agreed, I think it's ok to not use git submodules for now, especially if they are not updating their code frequently. However, long term I always try to reduce code duplication/forking since it increases maintenance costs. However, it sounds like this is not something we want to do right now.

@JoanFM
Copy link
Contributor Author

JoanFM commented Jul 16, 2020

Hello @StrikerRUS @guolinke, after solving a couple of conflicts I have issues again with the r-package.

@JoanFM JoanFM closed this Jul 16, 2020
@JoanFM JoanFM reopened this Jul 16, 2020
@guolinke
Copy link
Collaborator

ping @jameslamb for R's CI.

@StrikerRUS
Copy link
Collaborator

Actually, all tests are failing, not just R ones.

I believe the problem is in conflicts after #2992 was merged. For example, see compilation errors:

[ 96%] Building CXX object CMakeFiles/_lightgbm.dir/src/c_api.cpp.o
/home/travis/build/microsoft/LightGBM/src/c_api.cpp:1982:86: error: too many
      arguments to function call, expected 6, have 7
  ...get_row_fun, fastConfig->config, out_result, out_len);
                                                  ^~~~~~~
/home/travis/build/microsoft/LightGBM/src/c_api.cpp:381:3: note: 
      'PredictSingleRow' declared here
  void PredictSingleRow(int predict_type, int ncol,
  ^
/home/travis/build/microsoft/LightGBM/src/c_api.cpp:2116:53: error: too many
      arguments to function call, expected 6, have 7
                                        out_result, out_len);
                                                    ^~~~~~~
/home/travis/build/microsoft/LightGBM/src/c_api.cpp:381:3: note: 
      'PredictSingleRow' declared here
  void PredictSingleRow(int predict_type, int ncol,
  ^
2 errors generated.
CMakeFiles/_lightgbm.dir/build.make:439: recipe for target 'CMakeFiles/_lightgbm.dir/src/c_api.cpp.o' failed

@JoanFM
Copy link
Contributor Author

JoanFM commented Jul 16, 2020

Actually, all tests are failing, not just R ones.

I believe the problem is in conflicts after #2992 was merged. For example, see compilation errors:

[ 96%] Building CXX object CMakeFiles/_lightgbm.dir/src/c_api.cpp.o
/home/travis/build/microsoft/LightGBM/src/c_api.cpp:1982:86: error: too many
      arguments to function call, expected 6, have 7
  ...get_row_fun, fastConfig->config, out_result, out_len);
                                                  ^~~~~~~
/home/travis/build/microsoft/LightGBM/src/c_api.cpp:381:3: note: 
      'PredictSingleRow' declared here
  void PredictSingleRow(int predict_type, int ncol,
  ^
/home/travis/build/microsoft/LightGBM/src/c_api.cpp:2116:53: error: too many
      arguments to function call, expected 6, have 7
                                        out_result, out_len);
                                                    ^~~~~~~
/home/travis/build/microsoft/LightGBM/src/c_api.cpp:381:3: note: 
      'PredictSingleRow' declared here
  void PredictSingleRow(int predict_type, int ncol,
  ^
2 errors generated.
CMakeFiles/_lightgbm.dir/build.make:439: recipe for target 'CMakeFiles/_lightgbm.dir/src/c_api.cpp.o' failed

Thanks @StrikerRUS ,

Sorry, my bad, I did the conflict resolution on the browser and did not double check. I solved the issues now, but now there are some unused parameters, but I guess removing them would break a lot of the interfaces.

@JoanFM
Copy link
Contributor Author

JoanFM commented Jul 17, 2020

Hi @StrikerRUS @guolinke @imatiach-msft, is there any issue blocking the merge of this PR? Thanks

@guolinke
Copy link
Collaborator

Thanks @JoanFM so much, looks good to me. ping @StrikerRUS for the final review.

@StrikerRUS
Copy link
Collaborator

StrikerRUS commented Jul 17, 2020

Everything looks good to me either! Actually, I gave my approval a few days ago: #2760 (review). Sorry for the confusion!

I just wonder, should new functions in C API from the recently merged #2992 be enhanced with new locks?

@JoanFM
Copy link
Contributor Author

JoanFM commented Jul 18, 2020

Hey @StrikerRUS, I think he did not add any new method inside the Booster object, am I right? If it is so, that PR should directly benefit from the new locking strategy. Buy yes it is important to keep up with these locks when new methods arise, and if possible ensure const-correctness for new methods.

@StrikerRUS
Copy link
Collaborator

@JoanFM Yeah, seems you are right!

Thanks a lot for your contribution and a great patience!

@Ten0
Copy link
Contributor

Ten0 commented Aug 17, 2023

For reference: yamc seems to be a pretty bad implementation with regards to the way it manages shared locks: it relies on an exclusive mutex on a state variable to implement the shared locking.
This seems to be far from being on par with regular rwlock implementations. (e.g. gcc's libstdc++17 using glibc using pthread...)

@jameslamb
Copy link
Collaborator

yamc seems to be a pretty bad implementation with regards to the way it manages shared locks

@Ten0 are you interested in submitting a pull request changing the behavior you're talking about? Or writing up in more detail a proposal for some change that someone else could implement?

We'd welcome your help in improving LightGBM if you see some opportunity for improvement.

@Ten0
Copy link
Contributor

Ten0 commented Aug 17, 2023

Are you interested in submitting a pull request changing the behavior you're talking about?

Hello! Thank you for your interest!
Indeed I might be! I'm currently working towards a benchmark for #6024 (which should be feature-complete otherwise), and I'll make sure to specifically benchmark the impact of the sub-optimal locking. If it is large, I may work on this next. However if it turns out that running the decision tree is expensive enough that the sub-optimal locking becomes negligible and allowing parallelism from #6024 is enough, I probably won't prioritize it.

@Ten0
Copy link
Contributor

Ten0 commented Sep 18, 2023

I've finished the benchmarks and as it turns out, running our decision tree is expensive enough that the sub-optimal locking becomes negligible and allowing parallelism from #6024 is totally enough for us ATM, so I won't dedicate time to this.
image

Copy link

This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 20, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants