Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add test for latest version resolver #874

Merged
merged 6 commits into from
Dec 11, 2020

Conversation

stefsmeets
Copy link
Contributor

@stefsmeets stefsmeets commented Nov 20, 2020

Hi all, I ran into a few problems related to the latest version resolver when trying out some things. This patch adds a missing test that should avoid this behaviour in the future. There was no test to check that the latest version is being resolved
correctly if 'latest' is not in the path.

Tasks

  • Create an issue to discuss what you are going to do, if you haven't done so already (and add the link at the bottom)
  • This pull request has a descriptive title that can be used in a changelog
  • Add unit tests
    - [ ] Public functions should have a numpy-style docstring so they appear properly in the API documentation. For all other functions a one line docstring is sufficient.
    - [ ] If writing a new/modified preprocessor function, please update the documentation
  • Circle/CI tests pass. Status can be seen below your pull request. If the tests are failing, click the link to find out why.
  • Codacy code quality checks pass. Status can be seen below your pull request. If there is an error, click the link to find out why. If you suspect Codacy may be wrong, please ask by commenting.
    - [ ] Please use yamllint to check that your YAML files do not contain mistakes
    - [ ] If you make backward incompatible changes to the recipe format, make a new pull request in the ESMValTool repository and add the link below

If you need help with any of the tasks above, please do not hesitate to ask by commenting in the issue or pull request.


@stefsmeets stefsmeets marked this pull request as ready for review November 20, 2020 15:41
@stefsmeets stefsmeets requested a review from jvegreg November 20, 2020 15:43
There was no test to check that the latest version is being resolved
correctly if 'latest' is not in the path
Copy link
Contributor

@jvegreg jvegreg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work. Can you solve the codacy issue so we can merge?

@stefsmeets
Copy link
Contributor Author

Nice work. Can you solve the codacy issue so we can merge?

I can, but it will be purely cosmetic, as we are already returning if versions is empty.

@stefsmeets
Copy link
Contributor Author

Ah, actually there was a better solution after all 😅

@valeriupredoi
Copy link
Contributor

please don't merge this - it is extremely inefficient on a cluster with an ESGF node like BADC - see my comment on the code.

@stefsmeets I very much appreciate your fresh attitude towards the code and the changes you propose (not only here but in other PR's too) are very nice and out-of-the-box - always liked these kind of approaches myself! But there are cases where things are in an apparently bulky way for a good reason. We need to account for a lot of sub-standard standards, resulting in a lot of nasty corner cases. And if a simple solution is apparent, it may work for 98% of the cases but that 2% that it won't work for is gonna kill us 😬 Apols for the pep talk but this is something I am noting with a few new devs, you guys need to cut your teeth in the world of half-measured standards in the climate data filed, not pleasant I know, but you will eventually make it a better place, don't give up! Also test any possible and impossible scenario 😆

return dirname

return dirname_template
versions = glob.glob(dirname_template) # -> list
Copy link
Contributor

@valeriupredoi valeriupredoi Nov 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is globbing on eg /badc/cmip5/data/cmip5/output1/ at the very beginning - the entire CMIP5 dir on BADC, task that takes hours

@stefsmeets
Copy link
Contributor Author

Fair enough. I ran into some issues with this code (see new tests) so I tried to understand what was going on. I found the current implementation very difficult to debug and understand what is going on, so I had a go at refactoring it. I couldn't find any clue in the comments/git history as to why this implementation was chosen in the first place,

I'm curious to why this is actually slow, and I would like to have a go at it. I know that pathlib has a better globber built in, which is based on os.scandir which is supposedly a lot faster.

@valeriupredoi
Copy link
Contributor

valeriupredoi commented Nov 27, 2020

@stefsmeets yeah man, we were not good at explaining all the stuffs in the code at the beginning - we're getting better now 😁 The problem is that very large data root dirs like they are on the ESGF nodes on Jasmin or DKRZ are too big to be globbed if you don't ease the process by passing extra file/dir clues (like you seen I am doing in the recipe filler, same issue there too). As a general remark - the tools we have now really do need an overhaul, and that's something you've noticed too, but the problem is that they do the job now good enough to not warrant immediate attention, and we're tight for time to invest into a major overhaul project just yet. Stuff like the Intake library, or a fully glob.glob() approach would be good but there is simply no manpower to do it yet, and then there's all these corner cases and nooks and crannies - that's what I meant in my pep talk above (sorry if that came across rough, I really am very happy we now have fresh troops like you and the other NLESC peeps 🍺 )

@stefsmeets
Copy link
Contributor Author

As a general remark - the tools we have now really do need an overhaul, and that's something you've noticed too, but the problem is that they do the job now good enough to not warrant immediate attention, and we're tight for time to invest into a major overhaul project just yet.

We should not be afraid of overhauling stuff when it is necessary. I understand that there is a lot of legacy, but it is the only way forward if we want to avoid creating a minefield of undocumented edge-cases.

For now I have reverted the change, but I would still like to get the extra test in 👍

@stefsmeets stefsmeets changed the title Simplify latest version resolver Add test for latest version resolver Dec 1, 2020
@stefsmeets
Copy link
Contributor Author

I'm getting some odd test failures unrelated to this PR. This should be ready to merge 👍

@valeriupredoi
Copy link
Contributor

We should not be afraid of overhauling stuff when it is necessary. I understand that there is a lot of legacy, but it is the only way forward if we want to avoid creating a minefield of undocumented edge-cases.

cheers for your attention @stefsmeets - I agree, but we should probably spend time consolidating now that we have a decent framework, I think consodilation is much more needed from a strategic point of view, and by Odin, we need to do a lot of it (ie optimization mostly) 😁

For now I have reverted the change, but I would still like to get the extra test in

cheers! please do, don't be discouraged by my nagging 😀

@valeriupredoi
Copy link
Contributor

I'm getting some odd test failures unrelated to this PR. This should be ready to merge +1

cheers! I'll see what's the tests are complaining for and have a test too 👍

@valeriupredoi
Copy link
Contributor

sorry @stefsmeets this one fell off me radar screen, I'll test tomorrow and have a look at the tests 🍺

Copy link
Contributor

@valeriupredoi valeriupredoi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all good by me, cheers @stefsmeets

@stefsmeets
Copy link
Contributor Author

all good by me, cheers @stefsmeets

Feel free to merge 👍

@valeriupredoi valeriupredoi merged commit 475591c into master Dec 11, 2020
@valeriupredoi valeriupredoi deleted the simplify_latestversion_resolver branch December 11, 2020 12:05
@jvegreg jvegreg added the testing label Feb 3, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants