Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

added a QUASR-downloader #425

Draft
wants to merge 4 commits into
base: master
Choose a base branch
from
Draft

added a QUASR-downloader #425

wants to merge 4 commits into from

Conversation

smiet
Copy link
Contributor

@smiet smiet commented Jun 12, 2024

It would be amazing to have all of the QUASR configurations accessible with just one command.

I implemented a simple interface using the 'requests' library, that downloads any QUASR field.

I tried to make it so that it picks a random configuration, but apparently not all integers between 0 and 300,000 correspond to an entry in the database.
@andrewgiuliani is there a list of which entries exist, so that a robust random picker can be returned?
Or even better, the user could directly query the database for a list of constraints (NFP, A, iota, coil length, etc) and get a random config that fits their needs.

Something like this must be already implemented on the website, one would only have to adapt this to python. I am happy to do this but would need more info on the database itself. Would this be of interest @andrewgiuliani?

@smiet smiet self-assigned this Jun 12, 2024
Copy link

codecov bot commented Jun 12, 2024

Codecov Report

Attention: Patch coverage is 34.48276% with 19 lines in your changes missing coverage. Please review.

Project coverage is 92.55%. Comparing base (27a76cf) to head (c52dd99).

Files with missing lines Patch % Lines
src/simsopt/configs/zoo.py 34.48% 19 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #425      +/-   ##
==========================================
- Coverage   92.66%   92.55%   -0.12%     
==========================================
  Files          78       78              
  Lines       14514    14542      +28     
==========================================
+ Hits        13450    13459       +9     
- Misses       1064     1083      +19     
Flag Coverage Δ
unittests 92.55% <34.48%> (-0.12%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

landreman
landreman previously approved these changes Jun 19, 2024
return curves, currents, ma

elif return_style == 'json':
return coils, ma, surfaces
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add else: case to raise an exception if return_style is something else

src/simsopt/configs/zoo.py Outdated Show resolved Hide resolved

return_style: 'default' or 'json'. If 'default', the function will return the curves, currents and magnetic axis
like the other configurations in the zoo.
If 'json' the function will return the full set of coils, magnetic axis and a number of surfaces
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By "a number of surfaces", do you specifically mean a list of SurfaceXYZTensorFourier objects? If so it would be preferable to give this specific information.

def test_QUASR_downloader(self):
curves, currents, ma = get_QUASR_data(952)
coils, ma, surfaces = get_QUASR_data(952, return_style='json')

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about also covering the case of a success with return_style="default".

import json

try:
import requests
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is "requests" already installed in the CI, or does it need to be added so the test will run?

ID: the ID of the configuration to download. The database is navigatable at https://quasr.flatironinstitute.org/
Alternatively, you can download the latest full set of devices from https://zenodo.org/doi/10.5281/zenodo.10050655

return_style: 'default' or 'json'. If 'default', the function will return the curves, currents and magnetic axis
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name "json" isn't an intuitive description of what the function returns. Maybe "surfaces"?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed--it also looks like "default" is only returning one half-period? Am I reading that right?

Copy link
Contributor

@andrewgiuliani andrewgiuliani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this PR is ready yet. We still need to rerun code coverage to make sure no lines are missing, and address @landreman's comments.

Copy link

@jsoules jsoules left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi,
I'm the author/maintainer of the QUASR navigator website. @andrewgiuliani asked me to take a look at this PR. Really glad for your interest and to see that the site is useful!

I've made a couple suggestions here. (I'm not a part of this project so I don't consider anything I say to be a blocker for this PR--just suggestions.)

Regarding the test, how often would you expect this to be run? While I don't anticipate a big issue, I would want to confirm with our admins before encouraging anyone to run lots of automated downloads from the site as part of a test suite; I would think that it would be better to do a unit test run on every commit & an integration test run on PR merge or whatever. (But I'm not sure whether that works with the existing test setup.)

In your PR, you asked whether there is a list of which integers correspond to a database entry, as you noted they are not contiguous. Unfortunately there isn't a fixed list of good IDs--it's subject to change with each release of new backing data. We could give you a subset, but I don't think a random draw from the complete set of devices is practical at this time. (For context, I was originally enforcing constraints on the valid ID sets in the website routing, but I realized it wouldn't be worthwhile given the likelihood that they would change.)

You also asked about querying against the database for a range of figures of merit. Unfortunately this is also not possible at present--the site actually works by downloading the full set of metadata for all devices, so there isn't a persistent database server to query against. In practice all filtering of the downloaded devices is done client-side, in the browser.

That said, while I appreciate the rigor of selecting random devices, I think using a fixed set of IDs to download should suffice here--we're hopefully not that adversarial an environment 😄 What I might consider would be testing several example IDs of different lengths & a range of device structures, to make sure the full gamut of known-good IDs can be handled by your code & the correct data is downloaded; but honestly even that seems like it might be overkill...

Again, thanks for your interest and please let me know if you'd like to discuss further!

parametrized in Boozer coordinates.
"""

assert return_style in ['default', 'json']
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assertions can be disabled by caller/system config [e.g. if PYTHONOPTIMIZE is set], so it might be better not to rely on assert for a true runtime (not-debugging) check.

Comment on lines +217 to +219
id_str = f"{ID:07d}"
# string to 7 digits
url = f'https://quasr.flatironinstitute.org/simsopt_serials/{id_str[0:4]}/serial{id_str}.json'
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine, but note that we reserve the right to change the ID structure in new versions of the database, which would break this code. (Fair warning!)

I might be able to set up an endpoint on QUASR for converting an ID (arbitrary length) to the correct location, but I wouldn't have a chance to do that for at least a few weeks.

print(f"Configuration with ID {ID:07} downloaded successfully")
surfaces, ma, coils = json.loads(r.content, cls=GSONDecoder)
else:
raise ValueError(f"Configuration ID {ID:07d} does not exist. Status code: {r.status_code}")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't strictly accurate--the download might've failed for some other reason.

import unittest
from simsopt.configs import get_QUASR_data

class ZooTests(unittest.TestCase):
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just noting that, as you're hitting an external resource (QUASR website), this isn't strictly a unit test but an integration test. Now, an integration test may be what you actually care about, but there's always the risk that it will fail (or start failing) due to factors that don't indicate an error in your code.
I would consider mocking the requests.get call for regular testing, and have a separate integration test that's run less frequently.

Comment on lines +6 to +7
curves, currents, ma = get_QUASR_data(952)
coils, ma, surfaces = get_QUASR_data(952, return_style='json')
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might also be useful to have assertions verifying (from the returned data) that these are the same device record.

ID: the ID of the configuration to download. The database is navigatable at https://quasr.flatironinstitute.org/
Alternatively, you can download the latest full set of devices from https://zenodo.org/doi/10.5281/zenodo.10050655

return_style: 'default' or 'json'. If 'default', the function will return the curves, currents and magnetic axis
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed--it also looks like "default" is only returning one half-period? Am I reading that right?

@smiet smiet marked this pull request as draft November 12, 2024 09:16
@smiet
Copy link
Contributor Author

smiet commented Nov 12, 2024

Sorry to leave this hanging so long, but I think this would be a great addition, and am wanting to use this as I am looking at more QUASRs with pyoculus, and it would be great to directly grab them from a script.

@jsoules, has the content of the json changed with newer entries to the database? For example https://quasr.flatironinstitute.org/model/1258083 only returns surfaces and coils

@andrewgiuliani, lets divide the work, it should be pretty minimal, hard-code a few numbers into the tests and adapt to the new format of QUASR (no magnetic axis anymore :'( )

@jsoules
Copy link

jsoules commented Nov 12, 2024

@jsoules, has the content of the json changed with newer entries to the database? For example https://quasr.flatironinstitute.org/model/1258083 only returns surfaces and coils

Hi,
Not sure what you mean--I'm looking at that link and I see it populating with surfaces, coils, modB, and currents, as well as populating the Poincare plots. (All those assets are present on the site back-end.)

The data in the individual-record JSON is an extract of the relevant row for the whole-database JSON; this is what populates the "Device Metadata" block on the right-hand side in the web UI.

The set of fields included in the JSON (i.e. the quantities of interest) have changed in different releases, but without knowing the time point you're comparing to, I'm hesitant to give a definitive answer. But any changes would affect all devices, as the same format is used for the entire database.

@smiet
Copy link
Contributor Author

smiet commented Nov 13, 2024

When we wrote the scripts in June, the database would return a json file containing surfaces, magnetic_axis, coils, which allowed us to mirror the return style of other configurations in the simsopt zoo such as simsopt.configs.get_w7x_data().

This would allow a user to more easily swap out a quasr configuration in an existing script.

Is it also possible for the user to request the entire row from the database? These quantities could be useful in scripts that further deal with this data.

@andrewgiuliani
Copy link
Contributor

Hi Chris, you're right! I have those json files too, we will swap them out so the axis is returned as well. As for the row of the data frame, that should be doable as well. Let me discuss with Jeff

@missing-user
Copy link
Contributor

missing-user commented Jan 8, 2025

Hi, I also think this is a great addition and would further suggest to optionally cache the requests to avoid unnecessary server load. I'd suggest such a minimally invasive implementation: #468

Also related to @jsoules comment #425 (review) since the requests cache for the unit test could be stored as an artifact, avoiding any traffic generated by CI.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants