Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure URL fragments to named anchors also validate #363

Merged
merged 2 commits into from
Sep 15, 2019

Conversation

jroper
Copy link
Member

@jroper jroper commented Sep 10, 2019

No description provided.

Copy link
Member

@pvlugter pvlugter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks!

@pvlugter
Copy link
Member

The failure in the CI build is sbt/sbt#5049. Not sure why it's using different versions in scripted, but I assume that means that if sbt-paradox is published with sbt 1.3.0 it no longer works in sbt 1.2.8 projects?

@raboof
Copy link
Contributor

raboof commented Sep 10, 2019

sbt-paradox is published with sbt 1.3.0 it no longer works in sbt 1.2.8 projects

Looks like it. Do we think this is fine (then we should apply #364) or do we want to support 1.2.8 for a while (then we should roll back #356)? I think I'd be OK with the former.

@raboof
Copy link
Contributor

raboof commented Sep 10, 2019

Hmm, though that would also prevent projects like Akka gRPC, that themselves produce sbt plugins and might not be ready to require sbt 1.3.0, from upgrading. WDYT?

@jroper
Copy link
Member Author

jroper commented Sep 10, 2019

Can't we just change the version of sbt that we build against? There's no reason that sbt plugins need to be built against the same version of sbt that they use to build themselves. So we should just set sbtVersion to the lowest version of sbt that we're compatible with, eg 1.0.0 if we're compatible with that.

@raboof
Copy link
Contributor

raboof commented Sep 10, 2019

There's no reason that sbt plugins need to be built against the same version of sbt that they use to build themselves

You're right of course. Then I'd be fine with just sticking with sbt 1.3.0 for paradox and setting sbtVersion in any downstream projects that need to.

@pvlugter
Copy link
Member

Another fragment check that doesn't work is to the datadog docs:

https://docs.datadoghq.com/tracing/guide/trace_sampling_and_storage/?tab=java#trace-storage

Looks like the id attribute is not quoted in this case: <h2 id=trace-storage>Trace Storage</h2>

@pvlugter
Copy link
Member

pvlugter commented Sep 10, 2019

And single quotes of id/name attributes would also not work. Should the validation support unquoted, single-quoted, and double-quoted?

@jroper
Copy link
Member Author

jroper commented Sep 11, 2019

Hmm... so we can keep coming up with edge cases etc and adding support for them, or we could do it properly, by parsing the document and then looking for named anchors or ids. This would mean using HtmlUnit (I'm not aware of any other HTML parser available on the JVM - there are plenty of xml parsers of course, but many web pages, such as the datadog docs example, are not valid XML so can't be parsed by an XML parser). Thoughts?

@jroper
Copy link
Member Author

jroper commented Sep 11, 2019

I forgot about jsoup.

@pvlugter
Copy link
Member

Yes, my next thought as well. Switching over to jsoup sounds good to me.

@jroper
Copy link
Member Author

jroper commented Sep 11, 2019

Ok, I added jsoup. Also, I made a small improvement, links to the same page with different fragments only result in that page being downloaded once. Plus the code around grouping common links is simpler now.

Also, modified link validation so links to the same page with different
fragments only load/download that page once.
@pvlugter
Copy link
Member

Cool, I'll try it out again.

private def validateFragments(path: String, content: Document, fragments: List[CapturedLinkFragment], errorContext: ErrorContext): Unit = {
fragments.foreach {
case CapturedLinkFragment(Some(fragment), sources) =>
if (content.getElementById(fragment) == null && content.select(s"a[name=$fragment]").isEmpty) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wasn't sure what the best way to lookup named a tags was. Could probably lookup all a tags, and then iterate through to see if any have a name of fragment. This is simpler, though with the possibility of escaping issues, for example, it definitely won't work for fragments with ] in them, there may also be other characters that will cause a problem. People don't usually put special characters that won't work in names since they're meant to appear in URLs, and URLs with percent encoding looks ugly, so it may just not be an issue. I think if we find any actual problems with this, we can switch to something that searches for it better.

Copy link
Member

@pvlugter pvlugter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, and tried it on Cinnamon docs. There's one fragment that didn't work:

https://kafka.apache.org/documentation.html#producerconfigs

Looks like it's embedded in a handlebars script or something (didn't look closely). We'll just ignore or change that.

@jroper
Copy link
Member Author

jroper commented Sep 12, 2019

Yeah you're right, it is. To validate that we'd have to use htmlunit (which is likely to cause bigger problems because its javascript support is only partial) or selenium/webdriver (slow). I don't think it's worth it, I'd say just add an ignore (paradoxIgnorePaths).

@pvlugter pvlugter merged commit dc86f75 into lightbend:master Sep 15, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants