Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bound the size of the histogram cache. #9440

Merged
merged 6 commits into from
Aug 7, 2023

Conversation

trivialfis
Copy link
Member

@trivialfis trivialfis commented Aug 4, 2023

Second attempt for #9432 . Less intrusive and smaller overhead when the size doesn't overflow.

  • Bound the size of the histogram cache. Currently, the value is 1 << 16 nodes.
  • Unify the code path between hist, multi-hist, and approx.

Close #9342
Close #9372

Aside from the raised issue, this also helps multi-target tree as memory usage can be significant when the number of targets is large.

todos:

  • Test distributed
  • unittest
  • Benchmark for both exceeding the bound and within the bound.
  • Test mixed batch for QDM and external memory.

notes

  • When I try to run tests with max_depth=20, the CPU utilization is low and consistently under 60 percent. Further profiling is needed in the future.
  • There's no performance change when the size is within bound.
  • Memory usage is bounded, and OOM can be avoided when the tree is deep. Tested with max_depth=20, without the PR, the process is killed on my 128GB memory desktop.
  • The parameter for limiting the size is internal at the moment, as we can't reuse the parameter for GPU yet.

- A new histogram collection with limit in size.
- Unify histogram building logic between hist, multi-hist, and approx.
@trivialfis trivialfis marked this pull request as ready for review August 7, 2023 07:49
@trivialfis trivialfis changed the title [WIP] Bound the size of the histogram cache. Bound the size of the histogram cache. Aug 7, 2023
@trivialfis trivialfis requested a review from hcho3 August 7, 2023 16:58
Comment on lines +90 to +91
BlockedSpace2d(std::size_t dim1, std::function<std::size_t(std::size_t)> getter_size_dim2,
std::size_t grain_size) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why use std::function<> instead of a template arg? Doesn't it add an extra pointer indirection?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wanted the compiler to do checks for me. ;-)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And more self-explanatory API.

@trivialfis trivialfis merged commit 54029a5 into dmlc:master Aug 7, 2023
24 checks passed
@trivialfis trivialfis deleted the new-hist-collection branch August 7, 2023 19:21
Comment on lines +140 to +141
inline void ParallelFor2d(BlockedSpace2d const& space, std::int32_t n_threads,
std::function<void(std::size_t, Range1d)> func) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using std::function in a tight loop may have a performance implication, since func is no longer inlined.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not used in a tight loop, it's used to create parallel loops.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My fault, it's used in a tight loop.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But some benchmarks with higgs dataset show no difference before and after the PR. I can revert the change if necessary.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can just revert this line. The constructor can remain the same since it's not in a tight loop.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants