Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] [fix #702] Validate content of <dc:language> #807

Closed
wants to merge 5 commits into from

Conversation

kalaspuffar
Copy link
Contributor

@kalaspuffar kalaspuffar commented Nov 15, 2017

With this implementation, I try to fix the issue of #702.

This current fix will generate an error if the language prefix is longer than 3 characters.
The specification says that you could have pre-registered language subtags or could use 4 letter codes but these are reserved for future use and currently not used to my knowledge.

If this test passes it will check if this language has a variant specified and if so it will generate a warning informing the user of this discrepancy.

I hope this fix will start discussion towards a complete solution.

@tofi86 tofi86 requested review from rdeltour and tofi86 November 21, 2017 21:50
@tofi86 tofi86 added bugfix type: improvement The issue suggests an improvement of an existing feature labels Nov 21, 2017
@tofi86
Copy link
Collaborator

tofi86 commented Nov 21, 2017

Hej Daniel, thanks for your contribution! I'll try to review on the weekend!
Best, Tobias

@tofi86
Copy link
Collaborator

tofi86 commented Nov 21, 2017

@kalaspuffar Daniel, is it this statement which makes the PullRequest a WorkInProgress at the moment?

This current fix will generate an error if the language prefix is longer than 3 characters.
The specification says that you could have pre-registered language subtags or could use 4 letter codes but these are reserved for future use and currently not used to my knowledge.

If so: @murata0204 and @mattgarrish as you've been involved in the discussion about the topic in the original issue, could you probably answer Daniel's question?

@tofi86 tofi86 changed the title [WIP] Solving issue #702 [WIP] [fix #702] Validate content of <dc:language> Nov 26, 2017
@tofi86 tofi86 added the status: needs review Needs to be reviewed by a team member before further processing label Nov 26, 2017
Copy link
Member

@rdeltour rdeltour left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to me that the proposed OPF_085 would raise false positives, in some rare but legit cases.

@@ -282,6 +282,8 @@ public Message getMessage(MessageId id)
map.put(MessageId.OPF_082, Severity.ERROR);
map.put(MessageId.OPF_083, Severity.ERROR);
map.put(MessageId.OPF_084, Severity.ERROR);
map.put(MessageId.OPF_085, Severity.WARNING);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think EpubCheck can issue a WARNING for language tag variants, as there are legitimate uses for the variant tag. See for instance W3C’s explainer on the language tags.

At best this can be a HINT, but then I'm not quite sure it's worth implementing if it's just informative?

{
Locale l = Locale.forLanguageTag(metadata.getValue());
if (l == null || l.getLanguage().length() > 3) {
report.message(MessageId.OPF_086, EPUBLocation.create(path), metadata.getValue());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK.
A more thorough validation would be to use IANA’s language tag registry, but this is probably overkill?

@kalaspuffar
Copy link
Contributor Author

Hi @rdeltour

I've changed the warning for variants to info.

You mention that this check should be done in the schema already. Didn't see one of those checks and my implementation only use Locale.forLanguageTag from Java 1.7 to validate that you use a valid format.

And if you have a variant defined I will inform you about that too.

Maybe this could be done "easier" or more correct by a validation rule in the schema?

@rdeltour rdeltour removed the bugfix label Nov 12, 2018
@rdeltour rdeltour added status: blocked Blocked by another issue or situation status: wontfix The issue is rejected due to limitations (of scope or dev resources) and removed status: needs review Needs to be reviewed by a team member before further processing labels Feb 24, 2021
@rdeltour
Copy link
Member

Closing this PR for now. A more complete check (based on the IANA registry) may be implemented in the future, depending on the outcome of the spec discussion at w3c/epub-specs#1509

@rdeltour rdeltour closed this Feb 24, 2021
@rdeltour rdeltour removed the status: blocked Blocked by another issue or situation label Feb 26, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status: wontfix The issue is rejected due to limitations (of scope or dev resources) type: improvement The issue suggests an improvement of an existing feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants