Make sure to only use provenance copies when using product.wasderivedfrom()
#1984
Codecov Report
@@           Coverage Diff           @@
##             main    #1984   +/-   ##
=======================================
  Coverage   92.80%   92.80%
=======================================
  Files         236      236
  Lines       12445    12446    +1
=======================================
+ Hits        11549    11550    +1
  Misses        896      896
This could be implemented more efficiently by making fewer copies. For example, this implementation seems to copy the provenance of every input file for every output file, leading to n x (n-1) copies for the multi-dataset mask (assuming all datasets are masked), where n is the number of datasets. However, the provenance of each input file only needs to be copied once, which would be just n copies.
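The copy counts above can be checked with a small toy model. This is only a sketch: `ToyProduct`, `copies_per_output`, and `copies_per_input` are hypothetical names invented for illustration and are not the ESMValCore classes; only the method names `copy_provenance()` and `wasderivedfrom()` are borrowed from this PR.

```python
class ToyProduct:
    """Hypothetical stand-in for a preprocessor product (not the real class)."""

    def __init__(self, name, counter):
        self.name = name
        self.derived_from = []
        self._counter = counter  # shared dict used to count copies

    def copy_provenance(self):
        # Count every copy so the two strategies can be compared.
        self._counter["copies"] += 1
        return ToyProduct(self.name, self._counter)

    def wasderivedfrom(self, other):
        self.derived_from.append(other.name)


def copies_per_output(n):
    """Copy each input again for every output: n * (n - 1) copies."""
    counter = {"copies": 0}
    products = [ToyProduct(f"dataset_{i}", counter) for i in range(n)]
    for product in products:
        for other in products:
            if other is not product:
                product.wasderivedfrom(other.copy_provenance())
    return counter["copies"]


def copies_per_input(n):
    """Copy each input once up front and reuse the copies: n copies."""
    counter = {"copies": 0}
    products = [ToyProduct(f"dataset_{i}", counter) for i in range(n)]
    snapshots = [product.copy_provenance() for product in products]
    for i, product in enumerate(products):
        for j, snapshot in enumerate(snapshots):
            if i != j:
                product.wasderivedfrom(snapshot)
    return counter["copies"]
```

For five datasets, the first strategy makes 20 copies and the second makes 5, while both record the same "derived from" links.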
@bouweandela would you be able to suggest a code snippet for @schlunma to use and expand on, please 🍺 I am not 100% sure of the severity of the memory issue, i.e. how many particular cases are affected, so I'd feel more comfortable if this was plugged in. Even if not totally efficient, it is definitely way more efficient in terms of memory consumption than what we have now 😁
Makes sense, though the performance gain is probably not noticeable for common recipes. Implemented in 341c650.
good work, gents! @bouweandela could you maybe pls check the new commits, review, and merge this then 🍺
@bouweandela frenly yodeling ping
@bouweandela one for you here, bud - quick one, get you back to work after the Easter break 😁
Looking good now, thanks @schlunma!
Description
This PR makes sure to only use provenance copies (`product.copy_provenance()`) as input for `product.wasderivedfrom()`. This prevents recursively adding large numbers of ancestors, which blows up memory usage and run time.

Closes #1981
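The recursive growth described above can be illustrated with a toy model. This is a sketch under loud assumptions: `ToyProduct` is a hypothetical class whose provenance is a plain list of records, and its `wasderivedfrom()` simply appends everything the other object holds. The real provenance handling may merge records differently; the point is only to show why linking against copies taken up front keeps each product's ancestry bounded.

```python
class ToyProduct:
    """Hypothetical product whose provenance is a plain list of records."""

    def __init__(self, name):
        self.name = name
        self.records = [name]

    def copy_provenance(self):
        # Snapshot of the provenance as it is right now.
        clone = ToyProduct(self.name)
        clone.records = list(self.records)
        return clone

    def wasderivedfrom(self, other):
        # Assumption: linking pulls in everything `other` currently holds.
        self.records.extend(other.records)


def link_live(names):
    """Link against live objects: later products inherit the
    already-expanded provenance of earlier ones."""
    products = [ToyProduct(name) for name in names]
    for product in products:
        for other in products:
            if other is not product:
                product.wasderivedfrom(other)
    return [len(product.records) for product in products]


def link_snapshots(names):
    """Link against copies taken before any linking happens."""
    products = [ToyProduct(name) for name in names]
    snapshots = [product.copy_provenance() for product in products]
    for i, product in enumerate(products):
        for j, snapshot in enumerate(snapshots):
            if i != j:
                product.wasderivedfrom(snapshot)
    return [len(product.records) for product in products]
```

In this toy model, `link_live(["a", "b", "c"])` yields record counts that grow from product to product, whereas `link_snapshots(["a", "b", "c"])` leaves every product with exactly one record per dataset.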
Before you get started
Checklist
It is the responsibility of the author to make sure the pull request is ready to review. The icons indicate whether the item will be subject to the 🛠 Technical or 🧪 Scientific review.
To help with the number of pull requests: