
Enable duckdb option partitioned_write_flush_threshold=1024 to reduce memory #591

Merged · 1 commit into main on Sep 20, 2024

Conversation

@Taepper (Collaborator) commented Sep 20, 2024

Resolves spurious OOM crashes when preprocessing large datasets.

Summary

We now set the flag partitioned_write_flush_threshold to 1024, down from its default value of 524288. Initial tests did not indicate a significant slowdown in preprocessing times.

We found this flag in a discussion of a similar issue here.
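
For context, here is a minimal sketch of what such a setting change looks like, assuming the Python duckdb client; the input file, output directory, and partition column names are hypothetical:

```python
import duckdb

con = duckdb.connect()

# Lower the per-partition write buffer from the default of 524288 rows
# to 1024 rows, so partitioned writes flush to disk much more often and
# keep far less data buffered in memory.
con.execute("SET partitioned_write_flush_threshold = 1024")

# Hypothetical partitioned write: with the lower threshold, each
# partition's buffer is flushed after 1024 rows instead of 524288.
con.execute("""
    COPY (SELECT * FROM read_csv_auto('input.csv'))
    TO 'out_dir' (FORMAT PARQUET, PARTITION_BY (partition_col))
""")
```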

PR Checklist

- [ ] All necessary documentation has been adapted or there is an issue to do so.
- [ ] The implemented feature is covered by an appropriate test.

github-actions bot (Contributor) commented Sep 20, 2024

This is a preview of the changelog of the next release. If this branch is not up-to-date with the current main branch, the changelog may not be accurate. Rebase your branch on the main branch to get the most accurate changelog.

Note that this might contain changes that are on main, but not yet released.

Changelog:

0.2.20 (2024-09-20)

Features

  • make duckdb memory limit configurable via preprocessing config (c112eb1)

Bug Fixes

  • preprocessing: resolves spurious OOM crashes when handling large datasets (79658d5)
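
For the memory-limit feature above, the underlying DuckDB mechanism is the memory_limit setting; a minimal sketch, again assuming the Python duckdb client (the 4GB value is an arbitrary example, and how the preprocessing config maps onto it is an assumption):

```python
import duckdb

con = duckdb.connect()

# Cap DuckDB's memory usage; the preprocessing config presumably
# forwards its configured value to this setting. The 4GB figure
# here is an arbitrary example, not the project's default.
con.execute("SET memory_limit = '4GB'")
```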

@fengelniederhammer (Contributor) left a comment


Please adapt the commit message (with an eye on the changelog).

And maybe also put the link to that GitHub comment in the commit description? That would probably be valuable for someone trying to figure out from the commit history, a year from now, why we set that value.

… datasets

The partitioned_write_flush_threshold flag is now set to 1024 (from its default of 524288), mitigating out-of-memory errors during preprocessing. Initial performance tests showed no significant impact on preprocessing times.
@Taepper Taepper merged commit 6e4eae2 into main Sep 20, 2024
10 checks passed
@Taepper Taepper deleted the duckdbExperiment branch September 20, 2024 09:34