-
Notifications
You must be signed in to change notification settings - Fork 52
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
added a QUASR-downloader #425
base: master
Are you sure you want to change the base?
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #425 +/- ##
==========================================
- Coverage 92.66% 92.55% -0.12%
==========================================
Files 78 78
Lines 14514 14542 +28
==========================================
+ Hits 13450 13459 +9
- Misses 1064 1083 +19
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
src/simsopt/configs/zoo.py
Outdated
return curves, currents, ma | ||
|
||
elif return_style == 'json': | ||
return coils, ma, surfaces |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add else:
case to raise an exception if return_style
is something else
src/simsopt/configs/zoo.py
Outdated
|
||
return_style: 'default' or 'json'. If 'default', the function will return the curves, currents and magnetic axis | ||
like the other configurations in the zoo. | ||
If 'json' the function will return the full set of coils, magnetic axis and a number of surfaces |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By "a number of surfaces", do you specifically mean a list of SurfaceXYZTensorFourier objects? If so it would be preferable to give this specific information.
def test_QUASR_downloader(self): | ||
curves, currents, ma = get_QUASR_data(952) | ||
coils, ma, surfaces = get_QUASR_data(952, return_style='json') | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about also covering the case of a success with return_style="default".
import json | ||
|
||
try: | ||
import requests |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is "requests" already installed in the CI, or does it need to be added so the test will run?
src/simsopt/configs/zoo.py
Outdated
ID: the ID of the configuration to download. The database is navigatable at https://quasr.flatironinstitute.org/ | ||
Alternatively, you can download the latest full set of devices from https://zenodo.org/doi/10.5281/zenodo.10050655 | ||
|
||
return_style: 'default' or 'json'. If 'default', the function will return the curves, currents and magnetic axis |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name "json"
isn't an intuitive description of what the function returns. Maybe "surfaces"
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed--it also looks like "default" is only returning one half-period? Am I reading that right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this PR is ready yet. We still need to rerun code coverage to make sure no lines are missing, and address @landreman's comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi,
I'm the author/maintainer of the QUASR navigator website. @andrewgiuliani asked me to take a look at this PR. Really glad for your interest and to see that the site is useful!
I've made a couple suggestions here. (I'm not a part of this project so I don't consider anything I say to be a blocker for this PR--just suggestions.)
Regarding the test, how often would you expect this to be run? While I don't anticipate a big issue, I would want to confirm with our admins before encouraging anyone to run lots of automated downloads from the site as part of a test suite; I would think that it would be better to do a unit test run on every commit & an integration test run on PR merge or whatever. (But I'm not sure whether that works with the existing test setup.)
In your PR, you asked whether there is a list of which integers correspond to a database entry, as you noted they are not contiguous. Unfortunately there isn't a fixed list of good IDs--it's subject to change with each release of new backing data. We could give you a subset, but I don't think a random draw from the complete set of devices is practical at this time. (For context, I was originally enforcing constraints on the valid ID sets in the website routing, but I realized it wouldn't be worthwhile given the likelihood that they would change.)
You also asked about querying against the database for a range of figures of merit. Unfortunately this is also not possible at present--the site actually works by downloading the full set of metadata for all devices, so there isn't a persistent database server to query against. In practice all filtering of the downloaded devices is done client-side, in the browser.
That said, while I appreciate the rigor of selecting random devices, I think using a fixed set of IDs to download should suffice here--we're hopefully not that adversarial an environment 😄 What I might consider would be testing several example IDs of different lengths & a range of device structures, to make sure the full gamut of known-good IDs can be handled by your code & the correct data is downloaded; but honestly even that seems like it might be overkill...
Again, thanks for your interest and please let me know if you'd like to discuss further!
src/simsopt/configs/zoo.py
Outdated
parametrized in Boozer coordinates. | ||
""" | ||
|
||
assert return_style in ['default', 'json'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assertions can be disabled by caller/system config [e.g. if PYTHONOPTIMIZE
is set], so it might be better not to rely on assert
for a true runtime (not-debugging) check.
id_str = f"{ID:07d}" | ||
# string to 7 digits | ||
url = f'https://quasr.flatironinstitute.org/simsopt_serials/{id_str[0:4]}/serial{id_str}.json' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fine, but note that we reserve the right to change the ID structure in new versions of the database, which would break this code. (Fair warning!)
I might be able to set up an endpoint on QUASR for converting an ID (arbitrary length) to the correct location, but I wouldn't have a chance to do that for at least a few weeks.
src/simsopt/configs/zoo.py
Outdated
print(f"Configuration with ID {ID:07} downloaded successfully") | ||
surfaces, ma, coils = json.loads(r.content, cls=GSONDecoder) | ||
else: | ||
raise ValueError(f"Configuration ID {ID:07d} does not exist. Status code: {r.status_code}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't strictly accurate--the download might've failed for some other reason.
import unittest | ||
from simsopt.configs import get_QUASR_data | ||
|
||
class ZooTests(unittest.TestCase): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just noting that, as you're hitting an external resource (QUASR website), this isn't strictly a unit test but an integration test. Now, an integration test may be what you actually care about, but there's always the risk that it will fail (or start failing) due to factors that don't indicate an error in your code.
I would consider mocking the requests.get
call for regular testing, and have a separate integration test that's run less frequently.
curves, currents, ma = get_QUASR_data(952) | ||
coils, ma, surfaces = get_QUASR_data(952, return_style='json') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might also be useful to have assertions verifying (from the returned data) that these are the same device record.
src/simsopt/configs/zoo.py
Outdated
ID: the ID of the configuration to download. The database is navigatable at https://quasr.flatironinstitute.org/ | ||
Alternatively, you can download the latest full set of devices from https://zenodo.org/doi/10.5281/zenodo.10050655 | ||
|
||
return_style: 'default' or 'json'. If 'default', the function will return the curves, currents and magnetic axis |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed--it also looks like "default" is only returning one half-period? Am I reading that right?
Sorry to leave this hanging so long, but I think this would be a great addition, and am wanting to use this as I am looking at more QUASRs with pyoculus, and it would be great to directly grab them from a script. @jsoules, has the content of the json changed with newer entries to the database? For example https://quasr.flatironinstitute.org/model/1258083 only returns surfaces and coils @andrewgiuliani, lets divide the work, it should be pretty minimal, hard-code a few numbers into the tests and adapt to the new format of QUASR (no magnetic axis anymore :'( ) |
Hi, The data in the individual-record JSON is an extract of the relevant row for the whole-database JSON; this is what populates the "Device Metadata" block on the right-hand side in the web UI. The set of fields included in the JSON (i.e. the quantities of interest) have changed in different releases, but without knowing the time point you're comparing to, I'm hesitant to give a definitive answer. But any changes would affect all devices, as the same format is used for the entire database. |
When we wrote the scripts in June, the database would return a json file containing This would allow a user to more easily swap out a quasr configuration in an existing script. Is it also possible for the user to request the entire row from the database? These quantities could be useful in scripts that further deal with this data. |
Hi Chris, you're right! I have those json files too, we will swap them out so the axis is returned as well. As for the row of the data frame, that should be doable as well. Let me discuss with Jeff |
Hi, I also think this is a great addition and would further suggest to optionally cache the requests to avoid unnecessary server load. I'd suggest such a minimally invasive implementation: #468 Also related to @jsoules comment #425 (review) since the requests cache for the unit test could be stored as an artifact, avoiding any traffic generated by CI. |
It would be amazing to have all of the QUASR configurations accessible with just one command.
I implemented a simple interface using the 'requests' library, that downloads any QUASR field.
I tried to make it so that it picks a random configuration, but apparently not all integers between 0 and 300,000 correspond to an entry in the database.
@andrewgiuliani is there a list of which entries exist, so that a robust random picker can be returned?
Or even better, the user could directly query the database for a list of constraints (NFP, A, iota, coil length, etc) and get a random config that fits their needs.
Something like this must be already implemented on the website, one would only have to adapt this to python. I am happy to do this but would need more info on the database itself. Would this be of interest @andrewgiuliani?