Free hash table after grouping set/row number spill to release memory plus a hash table fix #11180
✅ Deploy Preview for meta-velox canceled.
This pull request was exported from Phabricator. Differential Revision: D63964822
Thanks for the catch, left some nits
@@ -742,7 +742,7 @@ bool GroupingSet::getOutput(
       : 0;
   if (numGroups == 0) {
     if (table_ != nullptr) {
-      table_->clear();
+      table_->clear(/*freeTable=*/true);
Can we also check whether HashBuild needs this change (passing true to the clear() method)?
I think it is more intuitive to have HashTable::clear() take true as default instead of false.
The non-freeing default is used by partial aggregation. Hash build doesn't need this change: it doesn't build the table until the final stage, and the probe side always clears the entire table.
… plus a hash table fix (facebookincubator#11180) Summary: Shadow testing showed that hash aggregation can still hold a non-trivial amount of memory, on the order of a couple hundred MB, after reclaim because of the hash table held by the grouping set. Currently we only clear the hash table in the grouping set (freeing the groups) but do not free the table itself. The same applies to the row number operator. This PR (1) frees the table after spill for both row number and grouping set so that memory reclamation and arbitration are more efficient, which gave a significant improvement in global arbitration shadow testing; (2) frees the row number result vector in row number spill to allow a stricter test check, on the assumption that a single vector is small, so this frees only about 1MB per operator in real workloads; (3) fixes freeing the table in the hash table, which did not reset the capacity, and adds a unit test to cover it. Reviewed By: oerling Differential Revision: D63964822
-  void resetTable();
+  /// all the inputs. If 'freeTable' is false, then hash table itself is not
+  /// freed but only table content.
+  void resetTable(bool freeTable = false);
nit: since there are very few instances of its use, would it make sense to get rid of the default value? That way the caller makes an explicit decision, and future uses do not inadvertently skip freeing the table when it is required.
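To illustrate the nit above, here is a minimal sketch of what dropping the default value changes for callers. `WithDefault`, `WithoutDefault`, and `canResetWithoutArg` are hypothetical names made up for this example and are not part of Velox; only the `resetTable(bool freeTable)` shape comes from the diff.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical stand-in with the signature as proposed in the diff:
// callers may omit the flag and silently get freeTable == false.
struct WithDefault {
  bool tableFreed = false;
  void resetTable(bool freeTable = false) { tableFreed = freeTable; }
};

// Hypothetical stand-in with the default removed, as the nit suggests:
// every call site has to state whether the table is freed.
struct WithoutDefault {
  bool tableFreed = false;
  void resetTable(bool freeTable) { tableFreed = freeTable; }
};

// Compile-time probe: selected only if 'T::resetTable()' with no argument
// is a valid expression.
template <typename T>
auto canResetWithoutArg(int)
    -> decltype(std::declval<T&>().resetTable(), std::true_type{});

// Fallback when the no-argument call does not compile.
template <typename T>
std::false_type canResetWithoutArg(...);

static_assert(
    decltype(canResetWithoutArg<WithDefault>(0))::value,
    "a defaulted flag lets callers skip the decision");
static_assert(
    !decltype(canResetWithoutArg<WithoutDefault>(0))::value,
    "without a default, resetTable() with no argument does not compile");
```

With the second form, a call such as `groupingSet.resetTable(/*freeTable=*/true)` is the only way to invoke the method, so a new call site cannot skip the free by omission.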
This pull request has been merged in e2231c5.
Conbench analyzed the 1 benchmark run on the commit. There were no benchmark performance regressions. 🎉 The full Conbench report has more details.
Summary:
Shadow testing showed that hash aggregation can still hold a non-trivial amount of memory, on the order of
a couple hundred MB, after reclaim because of the hash table held by the grouping set. Currently we only clear
the hash table in the grouping set (freeing the groups) but do not free the table itself. The same applies to
the row number operator.
This PR includes the following changes:
(1) Free the table after spill for both row number and grouping set so that memory reclamation and arbitration
are more efficient. This gave a significant improvement in global arbitration shadow testing.
(2) Free the row number result vector in row number spill to allow a stricter test check. We assume a single
vector is small, so this frees only about 1MB per operator in real workloads.
(3) Fix freeing the table in the hash table, which did not reset the capacity, and add a unit test to cover it.
Differential Revision: D63964822
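
The difference between clearing a table and freeing it, and the capacity reset described in item (3), can be sketched with a toy open-addressing table. This is an illustration under stated assumptions, not Velox's `HashTable`: `ToyHashTable`, `kInitialCapacity`, `kEmptySlot`, and every member below are hypothetical names, and the toy table never grows.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <memory>

constexpr std::size_t kInitialCapacity = 16;  // Hypothetical starting size.
constexpr std::int64_t kEmptySlot = 0;        // Keys must be nonzero.

// Hypothetical toy table, used only to show clear(false) vs. clear(true).
class ToyHashTable {
 public:
  // Inserts a nonzero key with linear probing. Allocates the slot array
  // lazily, so the table is usable again after clear(/*freeTable=*/true).
  bool insert(std::int64_t key) {
    if (capacity_ == 0) {
      slots_ = std::make_unique<std::int64_t[]>(kInitialCapacity);
      capacity_ = kInitialCapacity;
    }
    if (numEntries_ == capacity_) {
      return false;  // The toy table never grows.
    }
    std::size_t slot = static_cast<std::size_t>(key) % capacity_;
    while (slots_[slot] != kEmptySlot) {
      slot = (slot + 1) % capacity_;
    }
    slots_[slot] = key;
    ++numEntries_;
    return true;
  }

  // With freeTable == false only the content is dropped and the slot array
  // stays allocated. With freeTable == true the array is released and the
  // capacity is reset, so no later code assumes the slots still exist.
  void clear(bool freeTable) {
    numEntries_ = 0;
    if (freeTable) {
      slots_.reset();
      capacity_ = 0;
    } else if (slots_ != nullptr) {
      std::fill_n(slots_.get(), capacity_, kEmptySlot);
    }
  }

  std::size_t numEntries() const { return numEntries_; }
  std::size_t capacity() const { return capacity_; }
  std::size_t allocatedBytes() const {
    return capacity_ * sizeof(std::int64_t);
  }

 private:
  std::unique_ptr<std::int64_t[]> slots_;
  std::size_t capacity_ = 0;
  std::size_t numEntries_ = 0;
};
```

In this sketch, `clear(/*freeTable=*/false)` leaves `allocatedBytes()` unchanged, which corresponds to the memory that remained held after reclaim, while `clear(/*freeTable=*/true)` brings it to zero. If `capacity_` were left stale after the array is released, the next `insert` would index into freed storage, which is the kind of inconsistency the capacity reset avoids.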