-
-
Notifications
You must be signed in to change notification settings - Fork 54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Overhaul registry interface #289
Conversation
Codecov Report
@@ Coverage Diff @@
## main #289 +/- ##
==========================================
+ Coverage 76.18% 78.00% +1.81%
==========================================
Files 44 45 +1
Lines 5119 5423 +304
==========================================
+ Hits 3900 4230 +330
+ Misses 1219 1193 -26
📣 Codecov can now indicate which changes are the most critical in Pull Requests. Learn more |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I only comment on the pieces that would fix the currently present CI failures, and leave the actual content review for those who know more about the VO internals.
pyvo/registry/regtap.py
Outdated
@@ -314,25 +486,120 @@ def standard_id(self): | |||
""" | |||
return self.get("standard_id", decode=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't the method for standard_id
be similar to access_url
, where it returns the first element of standard_ids
if it has one element, otherwise it fails?
An example code like:
services = pyvo.regsearch(servicetype='tap', keywords=['heasarc'])
services[0].describe()
fails in describe
because self.standard_id
is None
.
If def standard_id
is modified to something like access_url
:
@property
def standard_id(self):
"""
the IVOA standard identifier
"""
#return self.get("standard_id", decode=True)
standard_ids = list(set(self["standard_ids"]))
if len(standard_ids)==1:
return standard_ids[0]
else:
raise dalq.DALQueryError(
"No unique standard_id. Use get_service.")
and the call to standard_id
in describe
is changed to:
# stdid = self.get("standard_id", decode=True).lower()
stdid = self.standard_id
Things seem to work.
On Sat, Jan 15, 2022 at 06:20:51PM -0800, Abdu Zoghbi wrote:
An example code like:
```python
services = pyvo.regsearch(servicetype='tap', keywords=['heasarc'])
print(services[0].describe())
```
fails in `describe` because `self.standard_id` is `None`.
Aw, dang, I had forgotten about .describe(). Actually, that has
been broken for multi-capability resources before, too.
So, I've given it a bit of a facelift in commit 646d5f00. This also
includes an update to standard_id as you suggest, but that wouldn't
have been enough, as describe's attempt to read the access URL would
now fail for multi-capability resources.
I hope describe's output -- human-oriented as it is -- won't count
against the API and hence can be changed within reason without
breaking things. Except that of course it must still work for
humans. Does what there's now still work for you?
|
b32bd95
to
d858987
Compare
That works now. Thanks. |
self.stdids.add(SERVICE_TYPE_MAP[std]) | ||
elif "://" in std: | ||
self.stdids.add(std) | ||
else: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std
is checked only in the keys of SERVICE_TYPE_MAP
, not the values.
Previously, it checks both the keys and values.
So now, a search like:
services = vo.regsearch(servicetype='conesearch')
doesn't work, where it previously used to work. servicetype='scs'
works. I am not sure if this is the intended behavior. I think it will be useful to allow the search with both sets of words.
The function accepts query constraints either as Constraint objects | ||
passed in as positional arguments or as their associated keywords. | ||
For what constraints are available, see TODO. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The list of constraints is a TODO. Opening a case here so we don't forget it.
On Mon, Jan 17, 2022 at 06:23:50PM -0800, Abdu Zoghbi wrote:
@zoghbi-a commented on this pull request.
+ passed in as positional arguments or as their associated keywords.
+ For what constraints are available, see TODO.
+
The list of constraints is a TODO. Opening a case here so we don't forget it.
Ah, right, I had forgot that. What I'd want to reference here is the
local equivalent of
http://docs.g-vo.org/temporary-pyvo/registry/index.html#basic-interface;
is there any sphinx buff here that can tell me what to write in the
docstring to have a robust link that works regardless of where the
rendering ends up in?
|
On Mon, Jan 17, 2022 at 06:11:15PM -0800, Abdu Zoghbi wrote:
@zoghbi-a commented on this pull request.
So now, a search like:
```python
services = vo.regsearch(servicetype='conesearch')
```
doesn't work, where it previously used to work. `servicetype='scs'`
Well, that clearly shouldn't be. I've added the missing keys.
works. I am not sure if this is the intended behavior. I think it
will be useful to allow the search with both keys and values.
That's not quite as simple, as the values, really, are the ivoids.
On the other hand, there's just a handful of such legacy terms, and
they're easy to add to SERVICE_TYPE_MAP. That's in commit 3d0b068.
Thanks for catching that!
|
Thanks. |
[r.short_name for r in self], | ||
[r.res_title for r in self], | ||
[r.res_description for r in self], | ||
[", ".join(sorted(r.access_modes())) for r in self]], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that the rational may be to make the returned table as simple as possible, and as a user, I may also be interested in the ivoid
in the returned table. Is it possible to add it?
On Tue, Jan 18, 2022 at 08:34:40AM -0800, Abdu Zoghbi wrote:
I see that the rationale may be to make the returned table as
simple as possible, and as a user, I may also be interested in the
`ivoid` in the returned table. Is it possible to add it?
Hm -- there is a slight semantic problem here, as the table in
general will not have an ivoid. Tables are attached to registry
records (which, strictly, are what ivoids reference), and there may
be multiple tables on a single record (and multiple records may host
the same table, but that's probably less of a problem here).
So, I think before we rack our heads about how we can attach some
provenance-like info on the table, we ought to be clear on what
workflow (or whatever) we want to enable by attaching it. Do you
perhaps have something like a user story where such an information
might be relevant? Or more simply: What made you miss the ivoid?
|
My understanding is that this function is to meant to make it easy to summarize the content of the registry records, so I was thinking that adding ivoid provides more information in this summary. My use case here is that some of the datasets may be available from multiple providers (chandra data in mind here), so by including the ivoid, it helps quickly see the source of the data. |
On Wed, Jan 19, 2022 at 07:20:50AM -0800, Abdu Zoghbi wrote:
> information might be relevant? Or more simply: What made you miss
> the ivoid?
My use case here is that some of the datasets may be available from
multiple providers (chandra data in mind here), so by including the
ivoid, it helps quickly see the source of the data.
What about adding a reference to the originating RegistryRecord
instance as a weakref in an origin attribute?
|
I don't see how a reference helps, because one can access the RegistryRecord directory if needed. I was more looking for an easy way to access or display the content without looping through the individual records. Also, to backup a little, my suggestions stem from the fact that such feature (ivoid in the table) has been used in the navo-pyvo workshops that are conducted at the AAS. As an alternative, would it be possible for example if |
"reference_url", | ||
"creator_seq", | ||
"content_type", | ||
"source_format", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't source_value
be here too? (or maybe in to_table
). I am not sure about their exact meaning, but it is another thing that breaks few cells in the tutorial notebooks.
On Fri, Jan 21, 2022 at 05:40:25AM -0800, Abdu Zoghbi wrote:
I don't see how a reference helps, because one can access the
RegistryRecord directory if needed. I was more looking for an easy
way to access or display the content without looping through the
individual records.
Hm -- do you have some sample code showing what you're trying to do?
Be that as it may, I've tentatively added an .origin attribute to
these VOSI tables in commit b111e8b. This seems a reasonable thing
to do anyway, and it's basically free. I'm happy to pull it again if
it doesn't help.
|
On Sat, Jan 22, 2022 at 07:21:23AM -0800, Abdu Zoghbi wrote:
+ "content_type",
+ "source_format",
Shouldn't `source_value` be here too? (or maybe in `to_table`). I
am not sure about their exact meaning, but it is another thing that
breaks few cells in the tutorial notebooks.
Good catch. Not sure how this got lost, but it should of course be
in there (it says, basically, what you ought to cite if you use the
data).
I'm adding it in commit 1b4633a, which also adds a unit test making
sure it won't get lost again.
|
This addresses review comment astropy#289 (review)
This points back to the parent instance. This is an attempt to address astropy#289 (comment)
An example code may look something like the following: uv_services = vo.regsearch(servicetype='image',keywords='galex', waveband='uv')
uv_services.to_table()['ivoid','short_name'] The second line fails because The solution may be to loop through the records with something like: for s in uv_services:
print(f'{s.ivoid} {s.short_name}') but my understanding is that the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are additional small suggestions that I came across testing the code as a user who doesn't know the deep details of the vo.
resource. | ||
* :py:class:`pyvo.registry.Servicetype` (``servicetype``): constrain to | ||
one of tap, ssa, sia, conesearch. This is the constraint you want | ||
to use for service discovery. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... or their local parts of the ivoid (ssap, image, scs)
pyvo/registry/rtcons.py
Outdated
|
||
>>> registry.Spatial([23, -40, 26, -39, 25, -43]) | ||
|
||
To find resources claiming to cover a MOC, pass an ASCII MOC:: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe add a link to the MOC document. Is this the one? https://www.ivoa.net/documents/MOC/
>>> registry.Spatial(SkyCoord("23d +3d")) | ||
|
||
SkyCoords also work as circle centers:: | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... where the radius is in degrees.
A spectral point or interval to cover. This must be a wavelength, | ||
a frequency, or an energy, or a pair of such quantities, | ||
in which case the argument is interpreted as an interval. | ||
All resources *overlapping* the interval are returned. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code seems to suggest that a float
or int
is assumed to be in joules. Maybe worth adding a sentence on that here.
webbrowser.open(self.access_url, 2) | ||
|
||
|
||
class Interface: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about adding a __repr__
method to Interface
?
The example in mind is an interactive session with something like:
services[0].interfaces[0].to_service()
prints: <pyvo.dal.ssa.SSAService at 0x...>
, so I know this is an SSA service, but this:
services[0].interfaces[0]
prints: <pyvo.registry.regtap.Interface at 0x...>
, which is somewhat obscure.
On Tue, Jan 25, 2022 at 07:10:37PM -0800, Abdu Zoghbi wrote:
+* :py:class:`pyvo.registry.Servicetype` (``servicetype``): constrain to
+ one of tap, ssa, sia, conesearch. This is the constraint you want
+ to use for service discovery.
... or their local parts of the ivoid (ssap, image, scs)
Well, these I'd consider backward compatibility hacks, that I'd not
mention in the interest of nudging people to exactly one way to do a
given thing; I have added language to indicate that you can pass in
full ivoids for standards we don't have short handles for, though, as
that does add functionality (though I'd guess for fairly esoteric
things for now).
+ To find resources claiming to cover a MOC, pass an ASCII MOC::
maybe add a link to the MOC document. Is this the one? https://www.ivoa.net/documents/MOC/
Did this, although I'm generally not a big fan of pointing people to
IVOA standards, which tend to be not terribly friendly to end users.
This one is rather readable, though.
I suppose we ought to support pymoc instances here, too, but I'm kind
of hoping pymoc (or something like that) will enter astropy or pyvo
soon-ish; as long as there's no "canonical" moc library, I'd rather
wait.
+ SkyCoords also work as circle centers::
... where the radius is in degrees.
Indicated as much.
+ spec : astropy.Quantity or a 2-tuple of astropy.Quantity-s
+ A spectral point or interval to cover. This must be a wavelength,
+ a frequency, or an energy, or a pair of such quantities,
+ in which case the argument is interpreted as an interval.
+ All resources *overlapping* the interval are returned.
The code seems to suggest that a `float` or `int` is assumed to be
in joules. Maybe worth adding a sentence on that here.
Right.
How about adding a `__repr__` method to `Interface`. The example in
mind is an interactive session with something like:
Fair enough. I've written a __repr__ according to the venerable rule
that if possible, __repr__ should give a literal that's fine for
re-constructing the instance, which is rather simple in this case.
The respective changes are in commit 5a6de54.
|
Also, adding a __repr__ method to regtap.Interface that returns a constructor call literal. This is in response to astropy#289 (review)
On Tue, Jan 25, 2022 at 02:45:42PM -0800, Abdu Zoghbi wrote:
```python
uv_services = vo.regsearch(servicetype='image',keywords='galex', waveband='uv')
uv_services.to_table()['ivoid','short_name']
```
The second line fails because `ivoid` is not included in the table, unlike the previous version.
Ah, I see; I had misunderstood you before, sorry.
Well... I'm not so happy about that, as ivoids are large, unbreakable
and ugly, and my idea was that to_table should give a quick and
halfway readable overview of what's there.
But then I hadn't considered that to_table had worked before by
virtue of RegistryResults inheriting from DALResults. So, the
current thing does indeed break code, and that's serious.
Given that, I'm tempted to (a) call current to_table something else,
perhaps "show" or "overview" and thus un-hide previous to_table. The
advantage here would be that RegistryResults.to_table behaves as
expectable from DALResults. The disadvantage is that I'd consider
the output essentially unreadable in interactive applications.
The alternative would be to (b) just paste the ivoid column to the
right end of the current table. The advantage would be that
to_table's return value is still halfway readable. The disadvantage
would be that if someone wants to do some magic on the full result
table, that wouldn't work.
I'm leaning towards (a). Opinions?
|
Apologies. I may have not explained the issue properly. |
On Wed, Jan 26, 2022 at 01:32:29PM -0800, Abdu Zoghbi wrote:
Apologies. I may have not explained the issue properly.
I, too, like solution (a).
Ok, the compact result display is now get_summary (commit 0fea107).
|
This is because overwriting DALResults.to_table proved a bad idea (see astropy#289 (comment))
The messenger_vocabulary fixture should have made them non-remote, but I've never actually tested that. It now turns out that astropy.data doesn't use requests and hence my nice requests mocker just didn't work. I've now replaced it by code seeding astropy.data's cache. This is a pattern we ought to use whenever code uses IVOA vocabularies. I wonder where we should document this?
On Wed, Jun 01, 2022 at 12:48:16PM -0700, Brigitta Sipőcz wrote:
However, I'm not certain that they should, it maybe a genuine bug.
Other than this, the PR should be ready to go if CI is all green.
That was a bug resulting from my mistaken assumption that
astropy.utils.data uses requests. Which it doesn't.
Commit cc2967c (which I'll push
later together with my attempts with fixing the doctests) ought to
fix this.
|
On Wed, Jun 01, 2022 at 12:23:03PM -0700, Brigitta Sipőcz wrote:
As we discussed with @tomdonaldson, many of the examples in the
docs are actually producing various errors. Some of them seem
Good thing these are now tested, thanks for setting this up. I hope
I've fixed whatever is fixable in commit
6dc5dd7. Two things, though:
(1) the example starting with ``tables = resources[4].get_tables()``
fails, and unless I misunderstand pytest's output, that's because it
throws warnings unrelated to pyVO (I've talked to VizieR about them).
Before I start digging: What's the idiomatic way to shut these
warnings up in doctests like these?
(2) there are two examples for "all-VO" querying. I suggest to
keep them switched off for testing/CI -- they hit a lot of services,
several of which will always be down, so it's unlikely they'll ever
pass. Sure: One could restrict that to a subset of known-reliable
services, but even then all that is a really expensive thing that
doesn't actually exercise registry but the DAL services on the end of
the providers, so that doesn't belong here.
On the other hand, I'd really like to show how this works in the docs
in plain code, so I'd hate to take the examples out, too.
|
@msdemlei , based on the current test failures, is it possible that the new |
For warning that are totally unrelated and we shouldn't do anything about them, you can use the following at the end of each line producing the warning: |
The issue with those that the current timeout is not doctest related, they would timeout on my python terminal if I were a user. |
The issue with untested examples is that over time they can, and most certainly will go out of sync with the code, so we should have them sparingly. |
|
||
This will ignore VOSI (infrastructure) services. | ||
""" | ||
return set(shorten_stdid(intf.standard_id) or "web" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(sorry; also, I'm busy elsewhere, so I'll address the remaining issues on Monday)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an attempt to address #289 (comment)
On Thu, Jun 02, 2022 at 09:51:15AM -0700, Brigitta Sipőcz wrote:
> On the other hand, I'd really like to show how this works in the docs
in plain code, so I'd hate to take the examples out, too.
The issue with untested examples is that over time they can, and
most certainly will go out of sync with the code, so we should have
them sparingly.
I agree on that in principle, but excluding anything that interacts
with the whole VO from regular testing is, I think, on the good side
of "sparing".
I *could* be in on thinking about adding proper, pytest-based
integration tests using mock infrastructure with pytest and somehow
linking them to the documentation ("if you change this, make sure you
also change that"). On the other hand, mocking a significant part of
the VO looks like a lot of work...
|
On Thu, Jun 02, 2022 at 09:49:04AM -0700, Brigitta Sipőcz wrote:
The issue with those that the current timeout is not doctest
related, they would timeout on my python terminal if I were a user.
So maybe summarise what you said above in a short sentence in the
docs, that the command is expected to fail as it hits so many
services, (and then skip the testing).
Well, explaining everything related to it would take a mouthful. On
the other hand, all-VO querying is of course an important use case
for why we have the Registry in the first place and why it's cool to
be able to script things.
So, I've made a very short appendix to the registry docs giving
sample code that catches the most common problems in PR#342.
There is the related question of the TAP timeouts that people will
see when they run more interesting obscore queries; we *could* add
them here, but I'd say that's a topic too far from registry as such.
|
The issue with mocking is that it may not pick up all the real issues. E.g these examples timeouted for me, suggesting that maybe it's enough to mention or show to users how to work around it. |
…#342) This is an attempt to address #289 (comment)
This PR refactors the Registry interface to enable advanced, data discovery-like discovery queries interesting in particular in interactive (notebook) usage; the RegTAP queries are now constructed under the hood by Constraint objects. The legacy constraint specification using keyword arguments is translated to the new scheme transparently (or so I hope).
This also adds constraints for space, time, and spectrum using features from the draft for RegTAP 1.2.
A bit more on the rationale for this is discussed on https://blog.g-vo.org/towards-data-discovery-in-pyvo.html, where you will also find a jupyter notebook showcasing some of the new features.
This is intended to address bugs #278, #238, and #176.