Investigate scanning CORDEX data with catalog-maker #10
This seems to be working fine, however it is necessary to add Example entry:
is |
Hi @ellesmith88, that's great news. Yes, @cehbrecht: do you have a list of valid datasets for C3S-Cordex? When you do, please pass it to @ellesmith88, who can create a new intake catalog from it. |
@ellesmith88 @agstephens I have had no feedback from @glevava yet, but I have opened a pull request for CORDEX in our manifest repo: It contains two files with the complete file list of CORDEX datasets replicated to all 3 sites:
We used these files to check if our repos are in sync. Note: These are not the official files given to the CDS people. But they should have the same datasets ... just in a different format. |
Sorry, I forgot to answer this. I sent the CDS manifest to @cehbrecht through Slack. |
I have added it to the cordex pull request in the manifest repo :) |
Thanks @glevava @cehbrecht |
@ellesmith88 can you try running the catalog-maker on the datasets listed at: Thanks |
These jobs are running on LOTUS and look to be going well so far. |
@ellesmith88: please can you give an update on the status of the CORDEX Intake catalog. |
@agstephens 194 of 23196 datasets are left to scan, so I'm running it again. 104 have errors so far, some of which I might be able to fix, so I'm having a look today. |
Update on this: out of 23196 total datasets:
For the remaining 86 they failed because of one of these errors:
These datasets were partially scanned (some files failed and some did not):
So any successful runs were removed. |
I have compared the missing CORDEX files with our replicated data archive. We have those files, but with a newer version. Instead of:
We have:
@glevava Do you have an updated version of the manifest? |
I've just looked again, so to give a bit more information: of the partially scanned datasets, some files fail because of errors like:
For the error
Looking at the dataset shows the same is true for lon. Looking at the data, most latitude values are something like 2.96337414e+00, here's a subset of the output of
So it looks as though they are maybe meant to be fill values, but it's strange because there are some that are The other files all show the same thing. |
Thanks @ellesmith88, could you ncdump the sections in the example files to check whether the coordinate variables include a |
@agstephens They don't include a For the main variable in the file I looked at above there is
but none for the coordinate variables |
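For anyone wanting to screen other files for the same symptom, here is a minimal sketch that flags coordinate values which cannot be real latitudes or longitudes. It is an assumption that the bad values are numerically out of range; the function name, the accepted ranges and the sample values are all made up for illustration and are not taken from the files discussed above.

```python
import math

# Assumed physically valid ranges, in degrees (longitude may be 0..360 or -180..180).
VALID_RANGES = {"lat": (-90.0, 90.0), "lon": (-180.0, 360.0)}


def find_suspect_values(name, values):
    """Return (index, value) pairs that are NaN or outside the valid range for `name`."""
    low, high = VALID_RANGES[name]
    return [
        (i, v)
        for i, v in enumerate(values)
        if math.isnan(v) or not (low <= v <= high)
    ]


# Invented sample: plausible latitudes plus one huge value standing in
# for a fill value that was never masked.
sample_lat = [2.96337414e+00, 3.07, 1.0e+30, 3.29]
suspects = find_suspect_values("lat", sample_lat)
print(suspects)
```

In practice the values would come from the coordinate variable of each file rather than a hard-coded list.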
@ellesmith88 thanks. It looks like this is a problem with the data and we should just exclude them. Please generate an output file and attach it here that records the errors, e.g. one per line:
Then we can look it up easily later. Thanks |
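A rough sketch of how such a one-per-line error file could be produced. The `dataset_id|message` layout, the function names and the sample entry are assumptions made for illustration, not necessarily the format requested above or anything the catalog-maker itself writes.

```python
def format_error_lines(errors):
    """Render one 'dataset_id|error message' line per failed dataset, sorted by ID."""
    lines = []
    for dataset_id, message in sorted(errors.items()):
        # Collapse whitespace so multi-line tracebacks stay on a single line.
        one_line = " ".join(str(message).split())
        lines.append(f"{dataset_id}|{one_line}")
    return lines


def write_error_log(errors, path):
    """Write the formatted error lines to `path`, one per line."""
    with open(path, "w", encoding="utf-8") as f:
        for line in format_error_lines(errors):
            f.write(line + "\n")


# Hypothetical input: a mapping of dataset ID to the error raised while scanning it.
example_errors = {"example.dataset.id": "ValueError: bad\n  coordinate values"}
print(format_error_lines(example_errors))
```

Keeping each record on one line means the file can be searched with `grep` by dataset ID or by error text.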
I corrected the manifest; it was an old version. The v20210505 version that we all have locally is the right one. |
Thanks @glevava. @ellesmith88, please can you try a rescan based on the new manifest provided by Guillaume? |
@ellesmith88 I have opened a PR with the fixed version of the cordex manifest:
@glevava The fixed manifest in this PR differs from yours. The version mismatch affected 36 datasets (see above), not just one, and I have updated them manually. I have added a notebook to the PR that compares the manifest with the scanned files on disk (which should be available at all 3 sites). The fixed manifest now contains only available datasets, but it is missing 855 datasets which are on disk. @glevava are these for a new cordex batch to the CDS? |
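The kind of comparison described above can be sketched roughly as follows. This is not the notebook from the PR: the function names and dataset IDs are invented, and it assumes each ID ends in a `.vYYYYMMDD` version and that each dataset appears with only one version per list.

```python
import re

# Assumed ID layout: '<dataset name>.vYYYYMMDD'
VERSION_SUFFIX = re.compile(r"\.v\d{8}$")


def split_version(dataset_id):
    """Split 'a.b.v20210505' into ('a.b', 'v20210505'); version is None if absent."""
    match = VERSION_SUFFIX.search(dataset_id)
    if match is None:
        return dataset_id, None
    return dataset_id[: match.start()], match.group()[1:]


def compare(manifest_ids, disk_ids):
    """Report datasets missing on either side and datasets whose versions differ."""
    manifest = dict(split_version(d) for d in manifest_ids)
    disk = dict(split_version(d) for d in disk_ids)
    return {
        "missing_on_disk": sorted(set(manifest) - set(disk)),
        "missing_in_manifest": sorted(set(disk) - set(manifest)),
        "version_mismatch": sorted(
            (name, manifest[name], disk[name])
            for name in set(manifest) & set(disk)
            if manifest[name] != disk[name]
        ),
    }


# Invented IDs for illustration only.
manifest_ids = ["x.a.v20200101", "x.b.v20200101", "x.c.v20200101"]
disk_ids = ["x.a.v20210505", "x.b.v20200101", "x.d.v20200101"]
report = compare(manifest_ids, disk_ids)
print(report)
```

Comparing on the name without the version is what separates a genuine gap from a dataset that is present but under a newer version.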
The rescan is in progress. I'm attaching the error output from the first scan and will generate a new error output after the rescan is complete. |
Here is the updated error output from the rescan. The missing datasets are now included and the error from issue #11 has been resolved. 23171 out of 23196 datasets have now been scanned successfully. |
@ellesmith88 Please construct and publish the CORDEX catalog to GitHub. |
Find a small amount of CORDEX (C3S) data, and see if the catalog-maker works with it.