Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add institute name used on ESGF for CMIP5 CanAM4, CanCM4, and CanESM2 #1937

Merged
merged 2 commits into from
Feb 27, 2023

Conversation

bouweandela
Copy link
Member

@bouweandela bouweandela commented Feb 20, 2023

Description

This adds the institute name CCCMA used on ESGF for the CMIP5 datasets CanAM4, CanCM4, and CanESM2. Without this institute name, the tool is not able to find the files for these datasets on ESGF.


Before you get started

Checklist

It is the responsibility of the author to make sure the pull request is ready to review. The icons indicate whether the item will be subject to the 🛠 Technical or 🧪 Scientific review.


To help with the number pull requests:

@bouweandela bouweandela added this to the v2.8.0 milestone Feb 20, 2023
@schlunma
Copy link
Contributor

Thanks for opening this PR @bouweandela! I am currently testing #1609 again and I am getting lots of errors due to this. Can you make this ready for review? I tested it and it worked fine 👍

Question on this: Why did this only occur now? Could this also happen for other CMIP5 datasets or are we sure that these models are the only case?

@bouweandela
Copy link
Member Author

Why did this only occur now?

Prior to #1920 the value for institute was ignored when searching ESGF. When I originally implemented ESGF search, I ran into issues like this (i.e. files with wrong facet values), so I only selected a rather minimal set of facets to use. However, that doesn't play nice with the automatic finding of datasets.

Could this also happen for other CMIP5 datasets or are we sure that these models are the only case?

Yes, there are quite a few files on ESGF that have wrong facet values. However, this was the most obvious case. A simple way to check this is to use Dataset.from_files() with an * for as many facets as possible (e.g. only define project, mip, and short_name as not wildcard) and then checking if the tool can find the files for each of the datasets that come out.

@bouweandela
Copy link
Member Author

I am currently testing #1609 again and I am getting lots of errors due to this. Can you make this ready for review? I tested it and it worked fine 👍

There are several tests failing because they use this dataset as an example. Those tests may have been changed in #1609, so I was planning to fix them only once #1609 is merged to avoid fixing outdated stuff. Would it work for you to just merge this into your testing branch for #1609 for now?

@schlunma
Copy link
Contributor

I see, that makes sense. How exactly does the tool handle lists as additional facets? Is it simply checking for all possible combinations and is happy when it finds at least one file?

Would it work for you to just merge this into your testing branch for #1609 for now?

Yeah, that's what I already did.

@bouweandela
Copy link
Member Author

bouweandela commented Feb 20, 2023

How exactly does the tool handle lists as additional facets?

It will try to concatenate the data. In this case the institute CCCMA is used on ESGF and the institute CCCma is used on the local file system, so this shouldn't be an issue. For finding files, using a list will match any of the specified options, as described here and here.

@valeriupredoi
Copy link
Contributor

thanks lots @bouweandela - as directly affected via Tool's #2994 and scratching my head interminably before you told me it was the blithering CanESM2, I fully embrace this arms open. Can I fix the tests for you so we get this ready-ready?

@bouweandela
Copy link
Member Author

That's a great offer, thanks! Could you do if after #1609 is merged, to avoid fixing outdated things and creating merge conflicts?

@valeriupredoi
Copy link
Contributor

That's a great offer, thanks! Could you do if after #1609 is merged, to avoid fixing outdated things and creating merge conflicts?

sounds like a plan Batman 🦇

@codecov
Copy link

codecov bot commented Feb 27, 2023

Codecov Report

Merging #1937 (d0b9065) into main (aa4a8c5) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##             main    #1937   +/-   ##
=======================================
  Coverage   92.60%   92.60%           
=======================================
  Files         236      236           
  Lines       12361    12361           
=======================================
  Hits        11447    11447           
  Misses        914      914           

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@bouweandela bouweandela marked this pull request as ready for review February 27, 2023 12:05
@bouweandela
Copy link
Member Author

Can I fix the tests for you so we get this ready-ready?

Just fixed it myself, there was only one test remaining that failed.

@valeriupredoi
Copy link
Contributor

Can I fix the tests for you so we get this ready-ready?

Just fixed it myself, there was only one test remaining that failed.

there goes my chance for fame and glory then 😁

Copy link
Contributor

@valeriupredoi valeriupredoi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cheers @bouweandela - but do we still have tests with CCCma and CCCMA?

@bouweandela
Copy link
Member Author

bouweandela commented Feb 27, 2023

No, and I don't think we need those. We have tests that adding extra facets works correctly. The only reason that CanESM2 featured in many examples is that it's early in the alphabet. Hopefully, we can just get this issue corrected on ESGF (did not receive a reply to my email asking how to go about that so far) and then we no longer need the extra value in all uppercase.

@valeriupredoi
Copy link
Contributor

sounds good to me, man! OK - I can merge this now if you all good 👍 Let me go do make some noise myself towards ESGF after a few more days have passed

Copy link
Contributor

@schlunma schlunma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! Please don't forget to add a lable and tick/remove all boxes in the PR description.

@valeriupredoi valeriupredoi added the bug Something isn't working label Feb 27, 2023
@valeriupredoi valeriupredoi merged commit 6d285d2 into main Feb 27, 2023
@valeriupredoi valeriupredoi deleted the add-cccma-uppercase branch February 27, 2023 16:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants