Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

First draft of new climatology config #1356

Draft
wants to merge 7 commits into
base: main-dev
Choose a base branch
from
Draft

First draft of new climatology config #1356

wants to merge 7 commits into from

Conversation

dulte
Copy link
Collaborator

@dulte dulte commented Sep 30, 2024

Change Summary

Adds new way of defining how the climatology is defined. Instead of just a bool to turn climatology off and on, a climatology config can now be passed into the colocation setup.

This is a reimplementation of @Ovewh 's PR #1135. As pyaercom has changed as much as it has, it was easier to do everything from scratch.

Related issue number

Aims to fix #1125

Checklist

  • Start with a draft-PR
  • The PR title is a good summary of the changes
  • PR is set to AeroTools and a tentative milestone
  • Documentation reflects the changes where applicable
  • Tests for the changes exist where applicable
  • Tests pass locally
  • Tests pass on CI
  • At least 1 reviewer is selected
  • Make PR ready to review

@dulte dulte self-assigned this Sep 30, 2024
@dulte
Copy link
Collaborator Author

dulte commented Sep 30, 2024

Two tests fail due to different mean obs than expected with new climatology.

@dulte dulte added bug 🐛 Something isn't working enhancement ✨ New feature or request labels Sep 30, 2024
@dulte dulte added this to the m2024-11 milestone Sep 30, 2024
@Ovewh
Copy link
Collaborator

Ovewh commented Sep 30, 2024

Two tests fail due to different mean obs than expected with new climatology.

I remember climatology previously was hard coded, in the pyaerocom const to be defined as the mean between 2005- 2015, so if you use different period it might cause the test failing.

The calc_climatology function in helpers.py will set the year of the climatology unless defined (the set_year keyword), to be the middle of the the period which means that unless the year of the model data you want to compare is also that year the collocation will fail. This also causes issues if you like extend the period for which the climatology is defined i.e. from default of 2005-2015 to being 2005 - 2020. Then the time dimension of the gridded data object also would have to update if the collocation is to succeed.


resample_how: str = const.CLIM_RESAMPLE_HOW
freq: str = const.CLIM_FREQ
mincount: dict = const.CLIM_MIN_COUNT
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think set_year should also be an attribute here, or if you have a better solution?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. Done

stop=use_climatology_ref.stop,
clim_mincount=use_climatology_ref.mincount,
resample_how=use_climatology_ref.resample_how,
clim_freq=use_climatology_ref.freq,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here set_year should be set

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines +771 to +774
if isinstance(use_climatology_ref, ClimatologyConfig): # pragma: no cover
col_freq = "monthly" # use_climatology_ref.freq
obs_start = use_climatology_ref.start
obs_stop = use_climatology_ref.stop
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like an old hack or something(?). Default const.CLIM_FREQ is daily, and is thus used everywhere climatology is calculated. But the col freq is set as monthly if climatology is used. Why?

And changing col_freq in the colocate_gridded_ungridded, is that smart? Then the value from colocation config is no longer immutable(?)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like we talked about, we should validate this asap, in the colocation_setup, so that no data is read before this change.

Copy link

codecov bot commented Oct 11, 2024

Codecov Report

Attention: Patch coverage is 75.60976% with 10 lines in your changes missing coverage. Please review.

Project coverage is 78.49%. Comparing base (1396c0a) to head (bbc313d).
Report is 143 commits behind head on main-dev.

Files with missing lines Patch % Lines
pyaerocom/climatology_config.py 72.22% 5 Missing ⚠️
pyaerocom/stationdata.py 50.00% 3 Missing ⚠️
pyaerocom/colocation/colocation_setup.py 83.33% 2 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##           main-dev    #1356      +/-   ##
============================================
- Coverage     78.97%   78.49%   -0.48%     
============================================
  Files           136      137       +1     
  Lines         20818    20891      +73     
============================================
- Hits          16441    16399      -42     
- Misses         4377     4492     +115     
Flag Coverage Δ
unittests 78.49% <75.60%> (-0.48%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

How to resample the climatology. Must be mean or median.
freq : str, optional
Which frequency the climatology should have
mincount : dict, optional
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest min_count. Also maybe worth building a field_validator for this? Or possibly a child class of TypedDict

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this module live in pyaerocom/aeroval? I don't really see this being used in the core API

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As of now, it is passed though the colocator, meaning that it is part of the core. If this is moved from the core to aeroval, then some more work is needed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If that's the case then, no, it should stay here in the core. We don't want to have pyaerocom depend on pyaeroval after all.

Comment on lines +771 to +774
if isinstance(use_climatology_ref, ClimatologyConfig): # pragma: no cover
col_freq = "monthly" # use_climatology_ref.freq
obs_start = use_climatology_ref.start
obs_stop = use_climatology_ref.stop
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like we talked about, we should validate this asap, in the colocation_setup, so that no data is read before this change.

@lewisblake lewisblake modified the milestones: m2024-11, m2024-12 Oct 11, 2024
@lewisblake lewisblake modified the milestones: m2024-12, m2025-01 Dec 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug 🐛 Something isn't working enhancement ✨ New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

use_obs_clim is broken and has always been: Turns all obsdata into nan
3 participants