-
Notifications
You must be signed in to change notification settings - Fork 39
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add test for latest version resolver #874
Conversation
There was no test to check that the latest version is being resolved correctly if 'latest' is not in the path
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work. Can you solve the codacy issue so we can merge?
I can, but it will be purely cosmetic, as we are already returning if |
Ah, actually there was a better solution after all 😅 |
please don't merge this - it is extremely inefficient on a cluster with an ESGF node like BADC - see my comment on the code. @stefsmeets I very much appreciate your fresh attitude towards the code and the changes you propose (not only here but in other PR's too) are very nice and out-of-the-box - always liked these kind of approaches myself! But there are cases where things are in an apparently bulky way for a good reason. We need to account for a lot of sub-standard standards, resulting in a lot of nasty corner cases. And if a simple solution is apparent, it may work for 98% of the cases but that 2% that it won't work for is gonna kill us 😬 Apols for the pep talk but this is something I am noting with a few new devs, you guys need to cut your teeth in the world of half-measured standards in the climate data filed, not pleasant I know, but you will eventually make it a better place, don't give up! Also test any possible and impossible scenario 😆 |
esmvalcore/_data_finder.py
Outdated
return dirname | ||
|
||
return dirname_template | ||
versions = glob.glob(dirname_template) # -> list |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is globbing on eg /badc/cmip5/data/cmip5/output1/
at the very beginning - the entire CMIP5 dir on BADC, task that takes hours
Fair enough. I ran into some issues with this code (see new tests) so I tried to understand what was going on. I found the current implementation very difficult to debug and understand what is going on, so I had a go at refactoring it. I couldn't find any clue in the comments/git history as to why this implementation was chosen in the first place, I'm curious to why this is actually slow, and I would like to have a go at it. I know that |
@stefsmeets yeah man, we were not good at explaining all the stuffs in the code at the beginning - we're getting better now 😁 The problem is that very large data root dirs like they are on the ESGF nodes on Jasmin or DKRZ are too big to be globbed if you don't ease the process by passing extra file/dir clues (like you seen I am doing in the recipe filler, same issue there too). As a general remark - the tools we have now really do need an overhaul, and that's something you've noticed too, but the problem is that they do the job now good enough to not warrant immediate attention, and we're tight for time to invest into a major overhaul project just yet. Stuff like the Intake library, or a fully |
We should not be afraid of overhauling stuff when it is necessary. I understand that there is a lot of legacy, but it is the only way forward if we want to avoid creating a minefield of undocumented edge-cases. For now I have reverted the change, but I would still like to get the extra test in 👍 |
I'm getting some odd test failures unrelated to this PR. This should be ready to merge 👍 |
cheers for your attention @stefsmeets - I agree, but we should probably spend time consolidating now that we have a decent framework, I think consodilation is much more needed from a strategic point of view, and by Odin, we need to do a lot of it (ie optimization mostly) 😁
cheers! please do, don't be discouraged by my nagging 😀 |
cheers! I'll see what's the tests are complaining for and have a test too 👍 |
sorry @stefsmeets this one fell off me radar screen, I'll test tomorrow and have a look at the tests 🍺 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all good by me, cheers @stefsmeets
Feel free to merge 👍 |
Hi all, I ran into a few problems related to the latest version resolver when trying out some things. This patch adds a missing test that should avoid this behaviour in the future. There was no test to check that the latest version is being resolved
correctly if 'latest' is not in the path.
Tasks
- [ ] Public functions should have a numpy-style docstring so they appear properly in the API documentation. For all other functions a one line docstring is sufficient.- [ ] If writing a new/modified preprocessor function, please update the documentation- [ ] Please useyamllint
to check that your YAML files do not contain mistakes- [ ] If you make backward incompatible changes to the recipe format, make a new pull request in the ESMValTool repository and add the link belowIf you need help with any of the tasks above, please do not hesitate to ask by commenting in the issue or pull request.