-
Notifications
You must be signed in to change notification settings - Fork 131
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Streamline observations download #1657
Conversation
Cool idea @jvegasbsc! Maybe we could choose a different name than obs or observations, because not everything we cmorize are observations? See also #1407. I'm not too sure about inclusion as a subcommand in the |
I think the discoverability we will get with a single entry point is worth it. But my idea will be to find a way to allow ESMValTool to add commands to the interface without ESMValCore knowing anything about them, like a plug-in or similar |
@jvegasbsc this is very cool! But I have an important question (maybe @mattiarighi can also shed more light on it too): what are the licensing troubles we may get into if some random user downloads Tier 3 data via ESMValTool and starts redistributing it against the rules? |
In that case, the illegal part is to redistribute, not to download. We are not supporting that part, just the downloading procedure, which is completely fine |
there is a fine line between download and redistribution - we should make sure that the tool doesn't write any freshly downloaded Tier 3 files to any location before CMORizing them but rather downloads and CMORizes them on the fly; there is also the issue of clogging the disk if the tool keeps the originally downloaded files 🍺 |
I found a way to allow adding subcommands in Step 1: define a 'esmvaltool_commands': [
'data = esmvaltool.cmorizers.obs.cmorize_obs:DataCommand'
] In ESMValCore, scan for 'esmvaltool_commands' entry points and add the classes to the Fire interface: for ep in iter_entry_points('esmvaltool_commands'):
if hasattr(self, ep.name):
logger.error('Registered command %s already exists', ep.name)
continue
self.__setattr__(ep.name, ep.load()) so we can decouple completely ESMValCore from this. |
c3d4dde
to
9492d91
Compare
Would it be an idea to add a command like
that would download and reformat all data needed to run recipe_example.yml? That way users would not need to bother with figuring out what data is needed exactly to run a recipe. Maybe that could be simplified even further to
to combine all steps of obtaining the data, formatting it and running into a single command. |
5dec1b8
to
028d098
Compare
Thanks a lot for your suggestions and your help in getting rid of the Codacy issues @zklaus! Please let me know if you are fine with the changes. There is a final issue to fix but it depends on how we want the I have rerun many test cases and everything works fine. Therefore, I would really like to have this PR merged soon and makig the new cmorizer interface available in v2.5. Last point for @zklaus: Shall we revert the changes regarding the pylint&co configuration files? It seems to me that further discussion is needed in #2527 before this part can be finalized. |
Hi @zklaus, I have finished fixing the last (problematic) Codacy issue after deciding on the behaviour of the We would still like to merge this PR this week, say Thursday, so that it lives in the main branch for a few days before the v2.5 release and we can see how the automatic testing engines react to that. |
@remi-kazeroni, nice work! I'll have very few minor comments, but regarding the reverting of the linter configs: Absolutely, do revert them. In this branch, they were only intended to help filter the barrage of codacy problems so that we could see what really needs addressing. |
Co-authored-by: Klaus Zimmermann <klaus.zimmermann@smhi.se>
Stellar work, @remi-kazeroni ! |
Thanks a lot @zklaus! All credits should go to @jvegreg for doing all the coding! 🍻 But I still wanted to address your final comment on the timerange selection. I agree the tool should return an error if the time range selection is wrong (wrong argument in the cli). The current assumption that a wrong timerange leads to the downloading/formatting of the whole datasets is not ideal in my opinion. Please nobody merge now 😆 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Time range issue as described by @remi-kazeroni
I have done some further tests on possible misuse of the time ranges arguments ( In short: there may be room for further enhancements but I would leave that for future PRs since I have not found any problematic cases with the current handling of the time range. |
Not sure how to display the codacy report of the second to last commit, but the number of issue was 7 without any problematic issue left. The number grew to 310 after reverting the changes on To be able to merge this, I guess we just need a final approval from @zklaus and coordinate the merge button hitting with @schlunma (I'd need a bit of time to draft a message to open a Github discussion about the changes related to the new cmorizer interface). Thanks to all who contributed, reviewed, tested!! 🥳 🍻 |
Sounds great, @remi-kazeroni 👍 I will merge this PR shortly after you opened the discussion. Thanks to everyone who contributed to this, this is an awesome addition 🎉 |
I'm not sure it's possible with Codacy. In principle, you can click on the little icon in front of the commit hash in the listing above (a green check mark, a red x, or a yellow dot for successful, failed, and ongoing checks, respectively). This will pop up a little dialog where you can click on details to see the specifics of a particular check. Have a look at the following screenshot. The problem with Codacy is that it seems to show only the details for the last run, no matter which one you select. That's a bug in Codacy, I think. |
Great, thanks to everyone, particularly @remi-kazeroni for pushing this over the finish line! 🎉 There certainly is room for improvement, but as you said, let's take that up later. |
Thanks! @schlunma: could you please merge this now? It would help me to get the right links in the documentation to write the discussion. |
@remi-kazeroni and the Streamliners band, legends! 🍺 |
The aim of this pull request is to streamline the procedure to download and reformat observational datasets for ESMValTool. I do not aim to have automated downloads for every dataset we are currently supporting, but to provide the framework and a good number of examples of implementation so it becomes easy to implement in the near future.
In particular, I have created three basic downloader classes for HTPPS (using wget), FTP (using the python library) and Climate Data Store (using cdsapi) downloads along with some example implementations for some datasets. I also provide some derived classes to simplify getting dataset from specific sources, like the NASA storage or the ESA-CCI CEDA anonymous FTP.
Regarding the interface, I plan to use the reworked CLI interface in ESMValGroup/ESMValCore#605 to refactor it so we get something similar to:
esmvaltool data prepare DATASET1,DATASET2 --start 1990 --end 2015
esmvaltool data download DATASET1,DATASET2 --start 1990 --end 2015
esmvaltool data format DATASET1,DATASET2 --start 1990 --end 2015
where prepare will download and format the data in one go. Hopefully, this will make life easier for those user that do not have access to JASMIN nor DKRZ.
Tasks
yamllint
to check that your YAML files do not contain mistakes