Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Override URL encoding when serializing results to HTML #87

Merged
merged 2 commits into from
Jun 24, 2021

Conversation

spassarop
Copy link
Collaborator

When serializing results to HTML, URLs are not being encoded when they are on HTML attributes. This could lead to mistakes when validating values. If URLs are HTML-encoded then they would work when rendered and potentially malicious payloads will fail after encoding.

A test case was added (for DOM and SAX) testing this problem.

@spassarop spassarop changed the base branch from main to 1.6.4 June 20, 2021 18:21
@spassarop spassarop merged commit 6a1c2f8 into nahsra:1.6.4 Jun 24, 2021
} catch (IOException e) {
logger.error("URI escaping failed for value: " + originalURI);
}
return "";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this method supposed to ALWAYS return an empty string? Seems strange to me.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is. On the HTMLSerializer, that function is supposed to returned the URL with escaped quotes and nothing else. I'm reusing printEscaped which does the encoding we want and internally "prints" the result in the XML that is being built. If I return a non-empty string, that string is also "printed" because the call to escapeURI is something like print(escapeURI()). I mentioned this on our emails, it may be explained better there.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add all the necessary explanation to this code as comments so it stands alone? Otherwise future reviewers will have the same concerns.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've done a separate PR with the comment. I tried to push directly to 1.6.4 and had git issues, but I'm kinda in a hurry and wanted to have this solved fast.

@nikowitt
Copy link

nikowitt commented Sep 22, 2021

But doesn't this break links having a query string like

<a href="https://server.com?param=a&param=b">Link</a>
as the server might not know how to handle the query string like
param=a&amp;param=b?

@spassarop
Copy link
Collaborator Author

@nikowitt - Browsers know how to handle it. I've made a simple test:

  1. Use AntiSamy to process <a href="http://localhost:9000?a=1&b=2">Some Link</a>. The result was <a href="http://localhost:9000?a=1&amp;b=2" rel="nofollow">Some Link</a>.
  2. Created an HTML file with only that link.
  3. Started a Python server on port 9000.
  4. Opened the file and clicked on the link, on Chrome, Firefox and Edge.

All used the link correctly:
imagen

Furthermore, when inspecting the DOM and selecting "Edit as HTML", the original source is the same:
imagen

If inspecting without editing, the URL is seen without &amp;. So the server does not know about this encoding stuff at all. Always regarding HTML rendered by browsers of course.

@nikowitt
Copy link

nikowitt commented Sep 23, 2021

I see. I have done my (failing) tests by opening the URL directly, not by using a rendered link. Thanks for pointing this out, we will do some additional testing.

@davewichers
Copy link
Collaborator

davewichers commented Dec 31, 2021

Note: This pull request for AntiSamy 1.6.4, fixes the issue that has been assigned: CVE-2021-35043.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants