Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Faster get_stars() function #58

Merged
merged 2 commits into from
Jan 29, 2021
Merged

Faster get_stars() function #58

merged 2 commits into from
Jan 29, 2021

Conversation

taldcroft
Copy link
Member

Description

For getting stars by AGASC_ID, at some point it is faster to just read the entire AGASC catalog into memory. This PR implements that method using a threshold of 5000 stars. That value was determined based on testing on my laptop.

Testing

  • Passes unit tests on MacOS.
  • Functional testing

Functional testing

For the AGASC supplement with about 81000 stars, the original (tables.read_where()) method takes about 80 seconds, while the new method takes about 13 seconds.

agasc/agasc.py Outdated
@@ -366,33 +406,32 @@ def get_stars(ids, agasc_file=None, dates=None, fix_color1=True, use_mag_est=Fal
:param dates: Dates for proper motion (scalar or array) (default=Now)
:param fix_color1: set COLOR1=COLOR2 * 0.85 for stars with V-I color (default=True)
:param use_mag_est: Use estimated mag from AGASC supplement where available (default=False)
:param read_entire_agasc: Number of stars above which to read the entire AGASC (default=5000)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love that you set the number of stars as function parameter, though the name sounds more like it should be a flag/boolean.

Copy link
Contributor

@javierggt javierggt Jan 28, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kind of agree with Jean that the name suggests something else to me (it suggests a boolean).

I would call it cache_threshold or something like that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to method_threshold.

Note - there is no caching involved, on purpose. I don't want to leave the whole AGASC and index dict in memory by default. Maybe a subsequent PR could allow that if it turns out to be useful. In my work the large get_stars calls are just one-offs.

Copy link
Contributor

@jeanconn jeanconn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems fine to me. I noted that the option might not have the best name but I don't consider that blocking.

@taldcroft
Copy link
Member Author

Note that the flake8 issues are unrelated to this PR.

@taldcroft taldcroft merged commit 2103840 into master Jan 29, 2021
@taldcroft taldcroft deleted the faster-get-stars branch January 29, 2021 13:48
@javierggt javierggt mentioned this pull request Apr 6, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants