HTTP URIs and Servlet API methods

Mark Thomas edited this page Oct 1, 2021 · 5 revisions

Introduction

The processing of HTTP URIs in a Servlet container is dependent on the order in which processing occurs. The following sections are presented in processing order.

Overview

From RFC 7230 we have:

http-URI = "http:" "//" authority path-abempty [ "?" query ] [ "#" fragment ]
https-URI = "https:" "//" authority path-abempty [ "?" query ] [ "#" fragment ]

with each of those elements defined by RFC 3986.

Fragments

Per RFC 7230, section 5.1, any [ "#" fragment ] is ignored because it is for client-side processing only.

Query string

getQueryString() returns null when there is no query string and the empty string when the query string is empty (a "?" with nothing after it).

getRequestURI()

The return value of getRequestURI() is the URI at this point in the processing. That is, any query string and fragment have been removed, but the URI is otherwise unchanged.
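The fragment, query string and getRequestURI() rules above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: RequestTarget and its parse method are hypothetical names, not part of the Servlet API or any container, and the input is assumed to be an origin-form request target such as "/context/servlet/path?a=b".

```java
/** Hypothetical helper, not part of the Servlet API. */
final class RequestTarget {
    /** The target with any query string and fragment removed. */
    final String requestUri;
    /** null when there is no '?', "" when nothing follows the '?'. */
    final String queryString;

    private RequestTarget(String requestUri, String queryString) {
        this.requestUri = requestUri;
        this.queryString = queryString;
    }

    static RequestTarget parse(String target) {
        // The fragment is for client-side processing only, so discard it first.
        int hash = target.indexOf('#');
        String withoutFragment = hash < 0 ? target : target.substring(0, hash);

        // Everything after the first '?' is the query string.
        int question = withoutFragment.indexOf('?');
        if (question < 0) {
            return new RequestTarget(withoutFragment, null);
        }
        return new RequestTarget(
                withoutFragment.substring(0, question),
                withoutFragment.substring(question + 1));
    }
}
```

With this sketch, RequestTarget.parse("/context/servlet/path?") yields a requestUri of "/context/servlet/path" and an empty (not null) queryString, while the same target without the trailing "?" yields a null queryString.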

Path parameters

See Path Parameters & RFCs

Neither RFC 7230 nor RFC 2616 makes any mention of path parameters. There is a reference to them in RFC 3986, section 3.3, but no formal definition. RFC 2396 has a slightly more formal definition:

segment       = *pchar *( ";" param )

It also states:

Each path segment may include a sequence of parameters, indicated by the semicolon ";" character. The parameters are not significant to the parsing of relative references.
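To make the RFC 2396 grammar concrete, the following sketch splits a single path segment into its name and its parameters. SegmentParser is a hypothetical name used only for illustration, and the sketch assumes the segment has already been isolated (it contains no "/") and has not been %nn decoded.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper illustrating segment = *pchar *( ";" param ). */
final class SegmentParser {
    /** Returns the part of the segment before the first ';'. */
    static String name(String segment) {
        int semi = segment.indexOf(';');
        return semi < 0 ? segment : segment.substring(0, semi);
    }

    /** Returns each ';'-delimited parameter that follows the name. */
    static List<String> params(String segment) {
        int semi = segment.indexOf(';');
        if (semi < 0) {
            return Collections.emptyList();
        }
        // A limit of -1 keeps trailing empty parameters, as in "..;".
        return Arrays.asList(segment.substring(semi + 1).split(";", -1));
    }
}
```

For the segment "context;c=d" this gives the name "context" and the single parameter "c=d". For "..;" it gives the name ".." and one empty parameter, which is the case behind the /..;/ issues discussed below.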

Early versions of the Servlet spec referenced RFC 2396. Up to and including the current specification, URL rewriting using a path parameter is explicitly defined in the specification document as the lowest common denominator of session tracking.

There have been various security vulnerabilities reported related to path parameter handling, often path traversal attacks using some form of /..;/, where different Servlet container and reverse proxy combinations handle the segment differently, resulting in unexpected behaviour.
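The mismatch can be illustrated with a small sketch. The behaviours assigned to the "proxy" and the "container" here are assumptions chosen to show the problem, not a description of any particular product: the proxy is assumed to treat ";" as opaque when normalising dot-segments, while the container is assumed to strip path parameters before normalising. TraversalDemo and its methods are hypothetical names.

```java
import java.net.URI;

/** Hypothetical demo of two components disagreeing about "/..;/". */
final class TraversalDemo {
    /** Removes every ";param" suffix from every segment of the path. */
    static String stripPathParameters(String path) {
        return path.replaceAll(";[^/]*", "");
    }

    /** Assumed proxy behaviour: normalise dot-segments, leave ';' alone. */
    static String proxyView(String path) {
        return URI.create(path).normalize().getRawPath();
    }

    /** Assumed container behaviour: strip path parameters, then normalise. */
    static String containerView(String path) {
        return URI.create(stripPathParameters(path)).normalize().getRawPath();
    }
}
```

Under these assumptions, proxyView("/public/..;/admin") is still "/public/..;/admin", so a rule that only allows paths under /public would pass the request, whereas containerView("/public/..;/admin") is "/admin".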

We need to explicitly define path parameter handling in the context of a Servlet container so that users of the API have a consistent experience and implementors of reverse proxies targeting Servlet containers are able to implement those reverse proxies with a clear understanding of how the container will behave.

Given the Servlet specification's original reliance on RFC 2396 and the text from that RFC regarding lack of significance with relative references, I would like to propose the following:

  • Parse the URI to extract any session ID passed as a path parameter.
  • Ignore all other path parameters.
  • If there is demand, and I don't think there is, we could implement issue #67, but I am currently leaning towards WONTFIX for that issue.

This means that path parameters would appear in getRequestURI() but not in getContextPath(), getServletPath() or getPathInfo().
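A minimal sketch of this proposal follows. It assumes the session ID is carried in a path parameter named jsessionid (matched case-sensitively here), that the input is the value getRequestURI() would return, and that the resulting path is what the container would then use to derive getContextPath(), getServletPath() and getPathInfo(). StrippedPath is a hypothetical name, not an API of any container.

```java
/** Hypothetical result of stripping all path parameters from a request URI. */
final class StrippedPath {
    /** The path with every path parameter removed. */
    final String path;
    /** The session ID taken from a jsessionid parameter, or null if absent. */
    final String sessionId;

    private StrippedPath(String path, String sessionId) {
        this.path = path;
        this.sessionId = sessionId;
    }

    static StrippedPath from(String requestUri) {
        String sessionId = null;
        StringBuilder path = new StringBuilder();
        // A limit of -1 preserves empty segments, including a trailing "/".
        String[] segments = requestUri.split("/", -1);
        for (int i = 0; i < segments.length; i++) {
            String[] parts = segments[i].split(";", -1);
            // parts[0] is the segment name; the rest are path parameters.
            for (int j = 1; j < parts.length; j++) {
                if (parts[j].startsWith("jsessionid=")) {
                    sessionId = parts[j].substring("jsessionid=".length());
                }
                // All other path parameters are ignored.
            }
            if (i > 0) {
                path.append('/');
            }
            path.append(parts[0]);
        }
        return new StrippedPath(path.toString(), sessionId);
    }
}
```

For "/context;c=d/servlet/path;jsessionid=ABC123" the sketch produces the path "/context/servlet/path" and the session ID "ABC123", while getRequestURI() would still report the original string.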

An alternative would be:

  • Parse the URI to extract (and remove from the URI) any session ID passed as a path parameter.
  • Leave all other path parameters as is. Context paths and Servlet paths that included path parameters would not match (unless the context path or servlet mapping included the path parameter), resulting in 404s. Path parameters in the pathInfo would be included in the call to getPathInfo() and the app would need to parse them if required.

| Option | Strip out path parameters | Retain path parameters |
| --- | --- | --- |
| jsessionid | Only appears in getRequestURI() | Only appears in getRequestURI() |
| Application path parameters | Parse from getRequestURI() | Parse getRequestURI() or getContextPath()/getServletPath()/getPathInfo() as appropriate |
| Security concerns | Potential problems with reverse proxies with segments like /..;/ | No issues with reverse proxies as HTTP considers the segment (including path parameters) to be opaque |
| RFCs | Not consistent with current RFCs for URI and HTTP | Consistent with current RFCs for URI and HTTP |
| RFCs | Because 3986 says any reserved character can be used to delimit a path parameter, removing all parameters could be tricky | Not a concern as nothing needs to be removed |
| Backwards compatibility | Would break any app that was parsing path parameters from anywhere other than getRequestURI() | Would break any app that used path parameters but expected them not to be present in getContextPath(), getServletPath() or getPathInfo() |
| %nn decoding | Simplifies as only %2f needs careful handling in path | %nn encoding of any reserved character needs careful handling |
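
The "retain path parameters" alternative can be sketched in the same spirit. Only the session ID parameter is removed, and every other path parameter stays in the path used for mapping. As before, the parameter name jsessionid and the class name RetainedPath are assumptions made for illustration.

```java
/** Hypothetical result of removing only the session ID path parameter. */
final class RetainedPath {
    /** The path with jsessionid removed and other path parameters kept. */
    final String path;
    /** The session ID taken from a jsessionid parameter, or null if absent. */
    final String sessionId;

    private RetainedPath(String path, String sessionId) {
        this.path = path;
        this.sessionId = sessionId;
    }

    static RetainedPath from(String requestUri) {
        String sessionId = null;
        StringBuilder path = new StringBuilder();
        String[] segments = requestUri.split("/", -1);
        for (int i = 0; i < segments.length; i++) {
            String[] parts = segments[i].split(";", -1);
            StringBuilder kept = new StringBuilder(parts[0]);
            for (int j = 1; j < parts.length; j++) {
                if (parts[j].startsWith("jsessionid=")) {
                    sessionId = parts[j].substring("jsessionid=".length());
                } else {
                    // Any other parameter remains part of the segment.
                    kept.append(';').append(parts[j]);
                }
            }
            if (i > 0) {
                path.append('/');
            }
            path.append(kept);
        }
        return new RetainedPath(path.toString(), sessionId);
    }
}
```

For "/context;c=d/servlet/path;jsessionid=ABC123" this yields the path "/context;c=d/servlet/path". A context deployed at "/context" would no longer match that path, which is the 404 behaviour described for this option.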

Examples

| URI | getRequestURI() | getContextPath() | getServletPath() | getPathInfo() | getQueryString() | Notes |
| --- | --- | --- | --- | --- | --- | --- |
| "/context/servlet/path?a=b" | "/context/servlet/path" | "/context" | "/servlet" | "/path" | "a=b" | A simple case |
| "/context/servlet/path?a=b#fragment" | "/context/servlet/path" | "/context" | "/servlet" | "/path" | "a=b" | Fragments are ignored |
| "/context/servlet/path" | "/context/servlet/path" | "/context" | "/servlet" | "/path" | null | No query string |
| "/context/servlet/path?" | "/context/servlet/path" | "/context" | "/servlet" | "/path" | "" | Empty query string |
| "/context;c=d/servlet/path?a=b" | "/context;c=d/servlet/path" | "/context" | "/servlet" | "/path" | "a=b" | Assumes option 1 for path parameters (removed once the return value for getRequestURI() has been determined) |