Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: use the real column names rather than the IDs that astropy generates #3153

Merged
merged 2 commits into from
Dec 19, 2024

Conversation

ManonMarchand
Copy link
Member

@ManonMarchand ManonMarchand commented Dec 17, 2024

Hello astroquery maintainers 🙂

This PR fixes #3124

The issue was not in the votable parser in astropy, but in the conversion from votable to astropy table where for vizier the option "use_names_overs_ids" should be preferred, as vizier does not write IDs for its fields but uses names instead.

The IDs that we saw in the issue report are generated by the votable parser when a field has no ID.

See MRE for this:

from astropy.io.votable.tree import VOTableFile, Resource, TableElement, Field
votable = VOTableFile()
table = TableElement(votable)
resource = Resource()
votable.resources.append(resource)
resource.tables.append(table)
table.fields.extend([Field(votable, name="filename(myname", datatype="char", arraysize="*")])
WARNING: W03: ?:?:?: W03: Implicitly generating an ID from a name 'filename(myname' -> 'filename_myname' [astropy.io.votable.xmlutil]

So switching from ID to name for the names of the columns in the conversion from votable to astropy table did the trick.

This is a bit of a breaking change, as people will start to see the real column names rather than the underscore ones. It also appeared from the narrative docs that this strange option was adding some underscores at the beginning of some column names event when no unconventional characters are present.

Edit: the underscore at the beginning is added when the column name starts with a letter... something forbidden by the standard that vizier still does 😰 (and I don't think we can change this upstream)

Edit bis: no, starting with a number is valid. I was confused by the text of the standard, but from reading the xsd at the bottom, the name should be of type xsd:token, which is defined here: https://www.w3.org/TR/xmlschema11-2/#token
Looks like the xsd:token can be 2MASS without issues 🙂

Copy link

codecov bot commented Dec 17, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 67.54%. Comparing base (04c6069) to head (04fcc57).
Report is 13 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #3153   +/-   ##
=======================================
  Coverage   67.54%   67.54%           
=======================================
  Files         233      233           
  Lines       18493    18493           
=======================================
  Hits        12491    12491           
  Misses       6002     6002           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@keflavich
Copy link
Contributor

@ManonMarchand you mean if the column started with a number, an underscore was added, right?

This PR looks good to me.

I think we should have a little discussion about the breaking nature of the PR - do we need to have an option to support the old column naming scheme? IMO, no, let's fix the bug and move on, but as a heavy user of Vizier, I have relied on its stability in many places in my own software, and a sudden column name change may require substantial refactoring.

Could you add a Note to the narrative documentation indicating what this change was, why it happened, and (roughly) when it did - i.e., astroquery 0.4.8 / late 2024? We don't do this in all cases, but given the subtlety of the change and the long-standing nature of the issue in a broadly-used package, it is warranted.

@ManonMarchand
Copy link
Member Author

ManonMarchand commented Dec 17, 2024

Yes, that's exactly what I mean.

Sure I can write a note. I'm also a bit worried about this breaking change (still a fix, though) but I'm also not in favor of an option to keep the old behavior.

@cosmoJFH
Copy link
Contributor

Just for the records, a similar change was done in the gaia module: #2967

Copy link
Member

@bsipocz bsipocz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If Adam is happy with getting this change in from the user's POV and think of it more as a bugfix than a behaviour change, then I'm also OK with merging it now.

@bsipocz bsipocz added the bug label Dec 17, 2024
docs/vizier/vizier.rst Outdated Show resolved Hide resolved
@keflavich
Copy link
Contributor

Yes, I'm happy with it - but give @ManonMarchand a chance to evaluate my suggested docs rephrasing before we merge.

@ManonMarchand ManonMarchand force-pushed the fix-use-names-over-ids branch 2 times, most recently from 59c7ac2 to e723aa2 Compare December 19, 2024 10:43
Co-authored-by: keflavich <adam.g.ginsburg@gmail.com>
@bsipocz bsipocz merged commit 3c216a5 into astropy:main Dec 19, 2024
11 checks passed
@bsipocz
Copy link
Member

bsipocz commented Dec 19, 2024

Thank you both!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Vizier query for columns with underscore in name
4 participants