Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix annotation and namespace issue (continued) #44

Merged
merged 8 commits into from
Mar 28, 2024
Merged

Conversation

jombr
Copy link
Collaborator

@jombr jombr commented Mar 17, 2024

Continue the work of #19 to fix #16

Fixes #16, and a variety of other issues hit in processing my existing grammars to extract documentation

  • Support documentation (or any other annotations) on groups and references to a named define
  • use a: as the xmlns prefix for http://relaxng.org/ns/compatibility/annotations/1.0, but don't assume that the user has already defined it, and make sure to qualify the case where they defined namespace a=... to be something else too.
  • Follow the spec for whitespace handling in a:documentation
  • Fix combining |=, &= productions in the presence of annotations.
  • Fix handling of datatag and literals with possibly-namespaced type references.

Additional changes:

grammar level datatypeLibrary solution

  • a grammar-level datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes" gets printed if xsd:* was ever referenced
  • Any datatypes != http://www.w3.org/2001/XMLSchema-datatypes written explicitly on <data> or <value>
  • Include grammar root assert removed (one that tries to verify the grammar level datatypes are the same)

Changes in addition to discussed solution

  • Capture datatypes from included grammar so the prefix can be looked up and included on explicit written custom datatypes

@djc
Copy link
Owner

djc commented Mar 17, 2024

Do you want to take over maintenance?

#41

@jombr
Copy link
Collaborator Author

jombr commented Mar 17, 2024

Do you want to take over maintenance?

#41

I'm willing to but there may be quite a bit with the code base, Continuous Integration, and Deployment to PiPy that I'm not familiar with. That and I started learning Python this year while only visiting relaxng off and on.

That being said I'm interested, enjoy learning, and the package seems pretty stable. If your comfortable with that and can be available for questions then I'll do it.

@djc
Copy link
Owner

djc commented Mar 18, 2024

I've invited you as a collaborator on this repository. Let's see how this works out? Since you're indicating that you're pretty new to things I'll withhold PyPI permissions for now -- feel free to ask questions.

@jombr
Copy link
Collaborator Author

jombr commented Mar 19, 2024

I've invited you as a collaborator on this repository. Let's see how this works out? Since you're indicating that you're pretty new to things I'll withhold PyPI permissions for now -- feel free to ask questions.

Alright! I have some reading to do.

Let me know if this PR works for you (If you think its aligned with what @puetzk had in mind).

If this PR checks out I'll probably re-base it after #45 lands.

puetzk and others added 8 commits March 19, 2024 07:54
If the user already has namespace a = 'something else', the code for
a:documentation (## comments) must include an xmlns:a='...' to
point it at http://relaxng.org/ns/compatibility/annotations/1.0
despite that redefinition

It should be an an error for an annotation element to use an undefined
namespace prefix (otherwise we produce invalid XML)

annotations.rnc should define the a: prefix it's using,
trang requires this to parse the file at all, and I can find no
justification in the compact syntax spec for pre-defining
namespace a = 'http://relaxng.org/ns/compatibility/annotations/1.0'.

But leave the default in anyway, since it should be harmless
now that it's overridden by any conflicting definition;
removing it would break backward-compatibility.
Parsing a parenthesized expression should still allow annotations in front
This seems to be a regression in 214fc52,
when the parens were lowered from particle it should have been 'primary',
not-annotated-primary, so the annotated-primary production could add to it

Serializing ref or parentRef may need to write child elements
if there are any annotations present
Also fix the strlit versions to emit correct rng serialization
* the <data> element does *not* contain the <value> element
  instead, <value type=>...</value> can carry type info
* add missing production for TOKEN strlit

This is partially a regression in 2a487b6
According to the grammar at https://www.oasis-open.org/committees/relax-ng/compact-20021121.html#nt-documentationLineContent
the content of documentation comments strips off all # and *one* space,
then takes the rest of the line

This allows indentation in documentation blocks to be preserved
ASSIGN may not be the first child of DEFINE if the annotated-primary
production prepended documentation or other annotations.

Search through the whole child list to see if there's an ASSIGN
specifying a combining mode.
Add additional tests to datatypes.rnc to exercise this,
and fix several that were incorrect (the built-in "string" and "token"
should be datatypeLibrary="", not substituted for xsd:string and xsd:token

the first type-reference seen (if any) is promoted to be the value of
<grammar datatypeLibrary=...> and thereafter omitted (it will be copied
from the ancestor during simplification)
since xsd is magically predefined:
  https://relaxng.org/compact-20021121.html

    In the initial environment used for the start symbol [...] and
    xsd is bound as a datatype prefix to http://www.w3.org/2001/XMLSchema-datatypes;

And cannot even be changed:

    It is an error if the value of datatypePrefix is xsd and
    the the value of the literal is not http://www.w3.org/2001/XMLSchema-datatypes.

A grammar-level datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes"
gets printed if xsd:* was ever referenced.
otherwise nothing is set for on `<grammar datatypeLibrary="...">`

Any other datatypes URI (that is not `http://www.w3.org/2001/XMLSchema-datatypes`)
is always written explicitly on `<data>` or `<value>`
 - thus doesn't care what's in its ancestors.

ROOT Assert just gets removed
 - Included grammars can only disagree about whether it needs to get printed
   (if either wanted to print it, it does),
   not about what value it can have
   (which might not be reconcilable)

Add ROOT's DATATYPES to self.typelibs
 - so we can get ns from datatype_library
   via custom datatype prefix in type_attrs

No longer take first typelib used for global default
 - `data` and `value` is always written explicitly (if not `xsd:*`)
   (So no global default is required)

Remove self.types since its no longer used (or needed)
@jombr jombr requested a review from djc March 19, 2024 12:56
@coveralls
Copy link

Coverage Status

coverage: 89.527% (+0.6%) from 88.963%
when pulling 12073d9 on josephbredahl:annotations
into 813bc86 on djc:main.

@djc
Copy link
Owner

djc commented Mar 20, 2024

Sorry, I won't have time to thoroughly review this PR -- happy to answer any specific questions you have, though!

@jombr
Copy link
Collaborator Author

jombr commented Mar 20, 2024

No worries, figured it could be rude if I pushed without asking for a review. I'll review it again before merging.

@jombr jombr removed the request for review from djc March 20, 2024 11:10
@djc
Copy link
Owner

djc commented Mar 20, 2024

Not rude, I'm considering you the primary maintainer from here on, though I'll still want to check before we publish to PyPI.

@jombr jombr merged commit 03dff10 into djc:main Mar 28, 2024
6 checks passed
@jombr jombr deleted the annotations branch March 28, 2024 04:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Parsing of documented groups fails
4 participants