Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improved access to AGASC supplement #56

Merged
merged 15 commits into from
Jan 28, 2021
Merged

Improved access to AGASC supplement #56

merged 15 commits into from
Jan 28, 2021

Conversation

taldcroft
Copy link
Member

@taldcroft taldcroft commented Jan 15, 2021

Description

This adds a new function agasc.get_supplement_table() which should be the primary external way to access the AGASC supplement. With this it is possible to get any of the 3 tables as either a Table or a keyed dict.

It also fixes a bug that showed up in functional testing with proseco, namely that the proseco_agasc_1p7.h5 file doesn't have a MAG_CATID so the update code was failing due to a key error.

The function is cached with a one-hour timeout, so a long-lived process will get any updates that occur in the meantime.

Testing

Testing used the current agasc_supplement.h5 file from /proj/sot/ska/jgonzalez/miniconda3/envs/agasc/data/agasc (if I got that path right from memory), except that I added one entry to the obs table.

  • Passes unit tests on MacOS
  • Functional testing

Functional testing

Also used this with proseco via and sparkles via sot/proseco#345 and sparkles via sot/sparkles#149.

@jeanconn
Copy link
Contributor

I know I'm the most pedantic of us, but do we need/want a way to identify which version of the supplement was used? And would that be just a date key or something?

@taldcroft
Copy link
Member Author

I guess the issue is that the AGASC supplement is not versioned and it is not possible to roll back to an arbitrary version even if you have a version / date. So right now we basically have "use the latest/greatest supplement", or "don't use the supplement at all for reproducibility".

I hear you that this might be a nice-to-have, but it doesn't quite fit into the current data model we've established. Unless there is something I'm not considering.

@taldcroft
Copy link
Member Author

It occurred to me in the meantime that we could probably just put the whole file into a git repo and make a commit and tag at the point of each "release" (promotion). The version can be stored in the file in a HDF5 keyword.

I'd be happy enough with that and the use_supplement keyword could even be allowed to take a string indicating the version (tag) to use.

@javierggt
Copy link
Contributor

I think it is good, in principle, to be able to somehow keep track of the supplement version. I would not recommend adding the supplement itself to version control unless we change the format.

Currently, the file is about 1.5 GB, and we intend to modify it once a week. If git keeps a copy of each version, that is 75 GB in a year. And this gets downloaded when you clone the repository. If we still want to keep the format and add it to git, one option could be to use git lfs. I'm not sure that is a good idea though.

@taldcroft
Copy link
Member Author

I'm showing the agasc_supplement.h5 file as 5.9 Mb, and it will probably be close to half that size when you change the dtypes. I was only thinking of putting that one file in version control.

@taldcroft
Copy link
Member Author

Also, that repo would not be part of the agasc project repo on GH, just a local git repo that allows us to look back in time at versions.

@jeanconn
Copy link
Contributor

Right, I don't have a huge justification for the additional work/tracking version, but I was disturbed at the thought of not being able to easily reproduce star selection. I had been thinking of either a "save last week's file" method or just tracking of date so one could regenerate a supplement as it would have existed at a date. The git repo of the file with tags for weekly releases sounds better. I note that I can't think of a reason to distribute the repo (unlike, say, the new thermal models version control with git repo as the sync'd data product). Which I mention not because it was suggested... just talking this out in the comments.

@javierggt
Copy link
Contributor

ok, so it seems we will use a separate repository for versioning the supplement. If so, then we can use the commit SHA or the version returned by ska_helpers, but then the update scripts should include committing to the repository.

agasc/agasc.py Outdated Show resolved Hide resolved
agasc/agasc.py Outdated Show resolved Hide resolved
agasc/agasc.py Outdated Show resolved Hide resolved
@taldcroft
Copy link
Member Author

I think I have addressed all the comments now, and also made the main documentation page a bit better.

agasc/agasc.py Outdated Show resolved Hide resolved
agasc/agasc.py Outdated Show resolved Hide resolved
@taldcroft taldcroft merged commit 4700ead into master Jan 28, 2021
@taldcroft taldcroft deleted the supplement-better branch January 28, 2021 21:55
@javierggt javierggt mentioned this pull request Apr 6, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants