-
Notifications
You must be signed in to change notification settings - Fork 92
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Override URL encoding when serializing results to HTML #87
Conversation
} catch (IOException e) { | ||
logger.error("URI escaping failed for value: " + originalURI); | ||
} | ||
return ""; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this method supposed to ALWAYS return an empty string? Seems strange to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is. On the HTMLSerializer, that function is supposed to returned the URL with escaped quotes and nothing else. I'm reusing printEscaped
which does the encoding we want and internally "prints" the result in the XML that is being built. If I return a non-empty string, that string is also "printed" because the call to escapeURI
is something like print(escapeURI())
. I mentioned this on our emails, it may be explained better there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add all the necessary explanation to this code as comments so it stands alone? Otherwise future reviewers will have the same concerns.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've done a separate PR with the comment. I tried to push directly to 1.6.4 and had git issues, but I'm kinda in a hurry and wanted to have this solved fast.
But doesn't this break links having a query string like
|
@nikowitt - Browsers know how to handle it. I've made a simple test:
Furthermore, when inspecting the DOM and selecting "Edit as HTML", the original source is the same: If inspecting without editing, the URL is seen without |
I see. I have done my (failing) tests by opening the URL directly, not by using a rendered link. Thanks for pointing this out, we will do some additional testing. |
Note: This pull request for AntiSamy 1.6.4, fixes the issue that has been assigned: CVE-2021-35043. |
When serializing results to HTML, URLs are not being encoded when they are on HTML attributes. This could lead to mistakes when validating values. If URLs are HTML-encoded then they would work when rendered and potentially malicious payloads will fail after encoding.
A test case was added (for DOM and SAX) testing this problem.