Make sure to only use provenance copies when using `product.wasderivedfrom()` #1984

schlunma · 2023-03-22T16:03:23Z

Description

This PR makes sure that only provenance copies (`product.copy_provenance()`) are used as input for `product.wasderivedfrom()`. This prevents recursively adding large numbers of ancestors, which blows up memory usage and run time.
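The recursion can be illustrated with a simplified toy model. This is a sketch under stated assumptions, not ESMValCore code: `ToyProduct` and `mask_all` are hypothetical names, and provenance is reduced to a plain list of record names that `wasderivedfrom()` appends to. When live products are passed as inputs, records added earlier in the same loop leak into every later product. Snapshots taken once, before the loop, keep every product at n records.

```python
"""Toy model: why wasderivedfrom() should receive provenance snapshots.

ToyProduct is hypothetical and is NOT esmvalcore's product class. Its
"provenance" is just a list of record names, and wasderivedfrom() appends
all of the other product's current records to it.
"""
from __future__ import annotations

import copy


class ToyProduct:
    def __init__(self, name: str) -> None:
        self.name = name
        self.records: list[str] = [name]

    def copy_provenance(self) -> ToyProduct:
        # Snapshot: later changes to self no longer affect the copy.
        new = copy.copy(self)
        new.records = list(self.records)
        return new

    def wasderivedfrom(self, other: ToyProduct) -> None:
        # Pull in *all* of the other product's current records.
        self.records.extend(other.records)


def mask_all(n: int, use_copies: bool) -> list[int]:
    """Derive every product from all others; return record counts."""
    products = [ToyProduct(f"dataset{i}") for i in range(n)]
    if use_copies:
        # One snapshot per product, taken before anything is modified.
        inputs = [p.copy_provenance() for p in products]
    else:
        # Live objects: records added in this loop leak into later products.
        inputs = products
    for product in products:
        for input_product in inputs:
            if input_product.name != product.name:
                product.wasderivedfrom(input_product)
    return [len(p.records) for p in products]
```

With four datasets, `mask_all(4, use_copies=False)` gives record counts of 4, 7, 13 and 25, while `mask_all(4, use_copies=True)` gives 4 for every product.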

Closes #1981

Before you get started

☝ Create an issue to discuss what you are going to do

Checklist

It is the responsibility of the author to make sure the pull request is ready to review. The icons indicate whether the item will be subject to the 🛠 Technical or 🧪 Scientific review.

🧪 The new functionality is relevant and scientifically sound
🛠 This pull request has a descriptive title and labels
🛠 Code is written according to the code quality guidelines
🧪 and 🛠 Documentation is available
🛠 Unit tests have been added
🛠 Changes are backward compatible
🛠 Any changed dependencies have been added or removed correctly
🛠 The list of authors is up to date
🛠 All checks below this pull request were successful

To help with the number of pull requests:

🙏 We kindly ask you to review two other open pull requests in this repository

codecov · 2023-03-22T16:26:01Z

Codecov Report

Merging #1984 (491f7cb) into main (9406b7e) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##             main    #1984   +/-   ##
=======================================
  Coverage   92.80%   92.80%           
=======================================
  Files         236      236           
  Lines       12445    12446    +1     
=======================================
+ Hits        11549    11550    +1     
  Misses        896      896

Impacted Files	Coverage Δ
esmvalcore/preprocessor/_mask.py	`86.40% <100.00%> (+0.05%)`	⬆️

valeriupredoi · 2023-03-27T13:23:13Z

@schlunma cheers, Manu! I have just tested this with your example recipe in #1981: varying the number of models now leads to only a very small increase in total max memory, this time round barely linear with the data, if at all, rather than to the power of 4 😁, i.e. 0.9G for 14 models that previously needed 16G 🍺

bouweandela · 2023-03-30T10:02:55Z

This could be implemented more efficiently by making fewer copies. For example, this implementation seems to copy all input file provenance for all output files, leading to n x (n-1) copies for the multi dataset mask (assuming all datasets are masked), where n is the number of datasets. However, only the input file provenance needs to be copied, which would be just n copies.
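The difference in copy counts can be checked with a small counting sketch. `CountingProduct`, `copy_per_pair`, `copy_once` and `count` are hypothetical helpers, not the ESMValCore implementation; they only count `copy_provenance()` calls and ignore what the copies contain.

```python
"""Count provenance copies made by two hypothetical strategies.

CountingProduct is a hypothetical stand-in that only counts how often
copy_provenance() is called; it is not the esmvalcore API.
"""
from __future__ import annotations


class CountingProduct:
    copies_made = 0  # class-wide counter shared by all instances

    def __init__(self, name: str) -> None:
        self.name = name

    def copy_provenance(self) -> CountingProduct:
        CountingProduct.copies_made += 1
        return CountingProduct(self.name)

    def wasderivedfrom(self, other: CountingProduct) -> None:
        pass  # provenance bookkeeping omitted; only copies are counted


def copy_per_pair(products: list[CountingProduct]) -> None:
    """Copy every input once for every output that uses it: n * (n - 1)."""
    for product in products:
        for other in products:
            if other is not product:
                product.wasderivedfrom(other.copy_provenance())


def copy_once(products: list[CountingProduct]) -> None:
    """Copy every input once, up front, and reuse the copies: n."""
    copies = [p.copy_provenance() for p in products]
    for product in products:
        for other in copies:
            if other.name != product.name:
                product.wasderivedfrom(other)


def count(strategy, n: int) -> int:
    """Run a strategy on n fresh products and return the number of copies."""
    CountingProduct.copies_made = 0
    strategy([CountingProduct(f"dataset{i}") for i in range(n)])
    return CountingProduct.copies_made
```

For five datasets, `count(copy_per_pair, 5)` returns 20 (5 × 4) and `count(copy_once, 5)` returns 5.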

valeriupredoi · 2023-03-30T10:26:00Z

@bouweandela would you be able to suggest a code snippet for @schlunma to use and expand on, pls 🍺 I am not 100% sure of the severity of the memory issue, i.e. how many cases are affected, so I'd feel more comfortable if this was plugged in, even if not totally efficient; it is definitely way more efficient in terms of memory consumption than what we have now 😁

schlunma · 2023-03-30T10:43:45Z

> This could be implemented more efficiently by making fewer copies. For example, this implementation seems to copy all input file provenance for all output files, leading to n x (n-1) copies for the multi dataset mask (assuming all datasets are masked), where n is the number of datasets. However, only the input file provenance needs to be copied, which would be just n copies.

Makes sense, though the performance gain is probably not noticeable for common recipes. Implemented in 341c650.

valeriupredoi · 2023-03-30T11:02:17Z

good work, gents! @bouweandela could you maybe pls check the new commits, review, and merge this then 🍺

valeriupredoi · 2023-03-31T12:46:59Z

@bouweandela frenly yodeling ping

valeriupredoi · 2023-04-17T12:15:55Z

@bouweandela one for you here, bud - quick one, get you back to work after the Easter break 😁

esmvalcore/preprocessor/_bias.py

esmvalcore/preprocessor/_multimodel.py

bouweandela

Looking good now, thanks @schlunma!

Make sure to use copy_provenance before wasderivedfrom

ee6779a

schlunma added the enhancement (New feature or request) label Mar 22, 2023

schlunma added this to the v2.9.0 milestone Mar 22, 2023

schlunma self-assigned this Mar 22, 2023

valeriupredoi approved these changes Mar 27, 2023


schlunma added 2 commits March 30, 2023 12:31

Avoid calling copy_provenance n(n-1) times

341c650

Avoid creating too many provenance copies in MM stats preproc

2ecbb5c

bouweandela reviewed Apr 18, 2023


esmvalcore/preprocessor/_bias.py (outdated, resolved)

bouweandela reviewed Apr 18, 2023


esmvalcore/preprocessor/_multimodel.py (outdated, resolved)

schlunma added 2 commits May 2, 2023 10:04

Merge remote-tracking branch 'origin/main' into fix_wasderivedfrom

721c451

Undo changes in bias and mm preprocessors

491f7cb

bouweandela approved these changes May 2, 2023


bouweandela merged commit 6be36f7 into main May 2, 2023

bouweandela deleted the fix_wasderivedfrom branch May 2, 2023 09:59

bouweandela added the bug (Something isn't working) and preprocessor (Related to the preprocessor) labels and removed the enhancement (New feature or request) label May 2, 2023
