This repository has been archived by the owner on Feb 16, 2020. It is now read-only.
Prevent dataset scanning from depleting memory #1970
Merged
Bugfix
With large datasets and many imported markets, the dataset scanner forks a child process for every single exchange and market combination all at once. This depletes memory very quickly and provides no significant gain in processing performance (it can even make performance worse).
AWS and Docker machines often crash because the scanning requires several gigabytes of memory.
The dataset scanner now queues the work and runs only as many forks as the system has CPU cores, reducing the memory footprint to a few tens of megabytes.