Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Review UriCompliance.LEGACY behavior with bad UTF-8 in query #12791

Merged
merged 13 commits into from
Feb 18, 2025

Conversation

gregw
Copy link
Contributor

@gregw gregw commented Feb 13, 2025

Added tests for DEFAULT, ALLOW_BAD_UTF8 and LEGACY. LEGACY expectations are set to jetty-11 results

…extraction.

Added tests for DEFAULT, ALLOW_BAD_UTF8 and LEGACY.
LEGACY expectations are set to jetty-11 results
@gregw gregw requested a review from joakime February 13, 2025 14:28
gregw and others added 2 commits February 13, 2025 16:15
…extraction.

Added TRUNCATED_UTF8_ENCODING for LEGACY mode
* Left `queryBehaviorsLegacy` alone, as expectations from Jetty 11.
* The `queryBehaviorsDefault` expectations are based Jetty 12.0.16.
* The `queryBehaviorsBadUtf8Allowed` has its expectations updated.
@joakime joakime changed the title UriCompliance.Violation.ALLOW_BAD_UTF8 in LEGACY mode Review UriCompliance.LEGACY behavior with bad UTF-8 in query Feb 13, 2025
@joakime
Copy link
Contributor

joakime commented Feb 13, 2025

@gregw I pushed the test cases I used for Jetty 11 to the branch fix/11.0.x/query-behavior-testing

See tests here ...

https://github.com/jetty/jetty.project/blob/fix/11.0.x/query-behavior-testing/jetty-server/src/test/java/org/eclipse/jetty/server/RequestTest.java#L779

In the process I also added a few more test cases that came to mind around bad behaviors on the parameter names.

// raw unicode parameter names (strange replacement logic here)
cases.add(Arguments.of("€=currency", 200, "?", "currency"));
cases.add(Arguments.of("帽子=Beret", 200, "??", "Beret"));
// invalid pct-encoded parameter name
cases.add(Arguments.of("foo%xx=abc", 400, "foo�", ""));
cases.add(Arguments.of("foo%x=abc", 400, "foo�", ""));
cases.add(Arguments.of("foo%=abc", 400, "foo�", ""));

The replacement logic with raw unicode to ? (question mark character) is an odd discovery.

@joakime
Copy link
Contributor

joakime commented Feb 13, 2025

@gregw added another commit to that Jetty 11 branch.

Split up the DEFAULT vs LEGACY expectations.
And added some UTF-16 tests (as LEGACY reports that it supports UTF-16)

See commit bed552c

@gregw gregw marked this pull request as ready for review February 14, 2025 07:54
joakime
joakime previously approved these changes Feb 17, 2025
Copy link
Contributor

@joakime joakime left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.
Just a comment niggle, and a question about case.
Otherwise, lets get this merged.

@gregw
Copy link
Contributor Author

gregw commented Feb 18, 2025

@joakime fixed niggle. Please re-review and feel free to merge if CI passes.

@joakime joakime merged commit 61f7736 into jetty-12.0.x Feb 18, 2025
10 checks passed
@joakime joakime deleted the fix/jetty-12/badUtf8Query branch February 18, 2025 21:06
@joakime
Copy link
Contributor

joakime commented Feb 18, 2025

Merged, all the way through to 12.1.x

@basil
Copy link
Contributor

basil commented Feb 20, 2025

@gregw @joakime A similar problem is that a URL such as /foo%5Cbar worked just fine in Jetty 10 but triggers a suspicious path character violation in Jetty 12, even with UriCompliance.LEGACY.

@joakime
Copy link
Contributor

joakime commented Feb 20, 2025

A similar problem is that a URL such as /foo%5Cbar worked just fine in Jetty 10 but triggers a suspicious path character violation in Jetty 12, even with UriCompliance.LEGACY.

@basil this pull-request is about the query handling, not path.
Please open a new issue about this.

@basil
Copy link
Contributor

basil commented Feb 20, 2025

this pull-request is about the query handling, not path.

Of course. That is why I wrote "a similar problem" rather than "the same problem."

Please open a new issue about this.

@joakime I agree that a new issue should be reported about this. But since I have already reported 4 regressions against Jetty 12 since last May, I would appreciate it if you or someone else could open the issue about this one.

@joakime
Copy link
Contributor

joakime commented Feb 20, 2025

Please open a new issue about this.

@joakime I agree that a new issue should be reported about this. But since I have already reported 4 regressions against Jetty 12 since last May, I would appreciate it if you or someone else could open the issue about this one.

There's not been an issue filed about this yet.

https://github.com/jetty/jetty.project/issues?q=is%3Aissue%20SUSPICIOUS_PATH_CHARACTERS

The %5C is a violation of both UriCompliance.Violation.SUSPICIOUS_PATH_CHARACTERS and the Servlet spec rules.

You'll need to allow both that violation, and the setting to bypass the servlet spec rules.

See https://github.com/jetty/jetty-examples/tree/12.0.x/embedded/ee10-servlet-ambiguous-paths

@basil
Copy link
Contributor

basil commented Feb 20, 2025

There's not been an issue filed about this yet.

@joakime Of course. But since you asked me to open a new issue about this, and since I have already opened 4 issues about Jetty 12 regressions in the recent past, I have asked you (or someone else) to file a ticket about this particular issue as a courtesy.

You'll need to allow both that violation, and the setting to bypass the servlet spec rules.

This worked fine in Jetty 9/10, so I do not see a reason why I should use a custom UriCompliance. The Javadoc for UriCompliance.LEGACY states that it should be compatible with Jetty 9 behavior as-is.

@joakime
Copy link
Contributor

joakime commented Feb 20, 2025

You'll need to allow both that violation, and the setting to bypass the servlet spec rules.

This worked fine in Jetty 9/10, so I do not see a reason why I should use a custom UriCompliance. The Javadoc for UriCompliance.LEGACY states that it should be compatible with Jetty 9 behavior as-is.

This is in the category of the many servlet hacks over the years to allow these kinds of bad / ambiguous / suspicious paths.

Using LEGACY also doesn't decode like in Jetty 9/10/11. (a broken servlet hack)

All of those hacks have been ruled as "not working, and will never work without breaking backward compatibility with the servlet API itself, stop with all of the hacks in all versions of Servlet and enforce the rules listed."

If you rely on EXACT behavior as Jetty 9/10/11 then you will have to stick with Jetty 9/10/11 which doesn't implement the servlet rules outlined, and contains all of the hacks throughout the Jetty servlet implementation to attempt to work around these issues.

Jetty 12 removed all of those servlet hacks, as they all lead to bugs, unintended consequences, and in some cases security issues.

The best UriCompliance.LEGACY on Jetty 12 could do is to just allow those characters through the HTTP parser, but the use of the Servlet API to access that information will produce different results than the same API in Jetty 9/10/11. And you will still need to declare in your webapp that you want to bypass those servlet rules.

@basil
Copy link
Contributor

basil commented Feb 20, 2025

@joakime UriCompliance.LEGACY.with("custom", UriCompliance.Violation.SUSPICIOUS_PATH_CHARACTERS) resolves this regression for me (without needing to use context.getServletHandler().setDecodeAmbiguousURIs(true), as you claimed above). But I see no reason why UriCompliance.LEGACY should not honor its Javadoc by adding UriCompliance.Violation.SUSPICIOUS_PATH_CHARACTERS by default. That would resolve the root cause of the regression and make the UriCompliance.LEGACY.with("custom", UriCompliance.Violation.SUSPICIOUS_PATH_CHARACTERS) workaround unnecessary.

@joakime
Copy link
Contributor

joakime commented Feb 20, 2025

But I see no reason why UriCompliance.LEGACY should not honor its Javadoc by adding UriCompliance.Violation.SUSPICIOUS_PATH_CHARACTERS by default. That would resolve the root cause of the regression and make the UriCompliance.LEGACY.with("custom", UriCompliance.Violation.SUSPICIOUS_PATH_CHARACTERS) workaround unnecessary.

The decoding part is why.

Input Path Jetty 9/10/11 Servlet API Jetty 12 Servlet API
/aa%5Cbb /aa%5Cbb /aa%5Cbb
/aa%00bb 400 response 400 response
/aa%5Fbb /aa_bb /aa%5Fbb
/aa%2Fbb /aa/bb /aa%2Fbb
/aa%5C2Fbb /aa%2Fbb /aa%5C2Fbb

For a proper LEGACY behavior, something new would have to be created.
Not just using using SUSPICIOUS_PATH_CHARACTERS.

@joakime
Copy link
Contributor

joakime commented Feb 20, 2025

I've opened #12809

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: ✅ Done
Development

Successfully merging this pull request may close these issues.

3 participants