Java Servlets and URI Parameters

What’s a URI Parameter? Well, it’s not the values after the question mark in the URI below:

/some/path?key=value

Those are HTML form values (the query string). URI Parameters are rarely used and poorly understood by servlet containers. Here’s an example URI:

/some/path;param

To be pedantic (and this post is full of pedantry!) I should probably call them something like ‘URI Path Segment Parameters’, but for the sake of brevity I’ll continue to say URI Parameter for the rest of this document.

The HTTP 1.1 specification [1] never refers to URI parameters specifically but it does say:

   For definitive information on
   URL syntax and semantics, see "Uniform Resource Identifiers (URI):
   Generic Syntax and Semantics," RFC 2396 [42] (which replaces RFCs
   1738 [4] and RFC 1808 [11]). This specification adopts the
   definitions of "URI-reference", "absoluteURI", "relativeURI", "port",
   "host","abs_path", "rel_path", and "authority" from that
   specification.

RFC 2396 [2] has this to say about the abs_path definition:

abs_path         = "/"  path_segments
...
path_segments = segment *( "/" segment )
segment          = *pchar *( ";" param )
param             = *pchar

pchar              = unreserved | escaped |
                         ":" | "@" | "&" | "=" | "+" | "$" | ","

The path may consist of a sequence of path segments separated by a
single slash "/" character.  Within a path segment, the characters
"/", ";", "=", and "?" are reserved.  Each path segment may include a
sequence of parameters, indicated by the semicolon ";" character.
The parameters are not significant to the parsing of relative
references.

Note how parameters are permitted on each segment of the path.
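To make the grammar concrete, here is a minimal sketch of a parser that splits an abs_path into segments and then splits each segment into its name and raw parameters. The class and method names (PathSegments, Segment, parse) are my own inventions for illustration, and the code does no percent-decoding or validation of pchar.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PathSegments {

    /** One path segment: its name plus any raw parameters that followed it. */
    public static final class Segment {
        public final String name;
        public final List<String> params;

        Segment(String name, List<String> params) {
            this.name = name;
            this.params = params;
        }

        @Override
        public String toString() {
            return name + params;
        }
    }

    /** Splits an abs_path on "/" and then splits each segment on ";". */
    public static List<Segment> parse(String absPath) {
        List<Segment> segments = new ArrayList<>();
        // Drop the leading "/" so split() does not yield an empty first segment.
        String rest = absPath.startsWith("/") ? absPath.substring(1) : absPath;
        for (String raw : rest.split("/", -1)) {
            // A limit of -1 keeps empty trailing parameters such as "c;".
            String[] parts = raw.split(";", -1);
            List<String> params =
                    new ArrayList<>(Arrays.asList(parts).subList(1, parts.length));
            segments.add(new Segment(parts[0], params));
        }
        return segments;
    }

    public static void main(String[] args) {
        for (Segment s : parse("/a,b/c;d/e.f;g/h")) {
            System.out.println(s);
        }
    }
}
```

For the path /a,b/c;d/e.f;g/h this yields four segments, with parameter d attached to c and parameter g attached to e.f.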

As an aside, RFC 2396 has been obsoleted by RFC 3986 [3], which seems to widen the definition of a URI parameter:

sub-delims  = "!" / "$" / "&" / "'" / "(" / ")"
                    / "*" / "+" / "," / ";" / "="
segment     = *pchar
pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"
...
Aside from dot-segments in hierarchical paths, a path segment is
considered opaque by the generic syntax.  URI producing applications
often use the reserved characters allowed in a segment to delimit
scheme-specific or dereference-handler-specific subcomponents.  For
example, the semicolon (";") and equals ("=") reserved characters are
often used to delimit parameters and parameter values applicable to
that segment.  The comma (",") reserved character is often used for
similar purposes.  For example, one URI producer might use a segment
such as "name;v=1.1" to indicate a reference to version 1.1 of
"name", whereas another might use a segment such as "name,1.1" to
indicate the same.  Parameter types may be defined by scheme-specific
semantics, but in most cases the syntax of a parameter is specific to
the implementation of the URI's dereferencing algorithm.

So a parameter is now anything that follows a sub-delims character, not just a semi-colon. However, since HTTP 1.1 depends on RFC 2396, I don’t think the above is directly relevant to this discussion, and I don’t expect containers to support this syntax.
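The "name;v=1.1" convention quoted from RFC 3986 can be sketched as follows. This is only one possible reading, assuming ";" separates parameters and "=" separates a key from its value; the class and method names (SegmentParams, name, params) are hypothetical and not part of any specification or library.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SegmentParams {

    /** Returns the part of the segment before the first ";". */
    public static String name(String segment) {
        int semi = segment.indexOf(';');
        return semi < 0 ? segment : segment.substring(0, semi);
    }

    /** Reads ";key=value" pairs; a parameter without "=" maps to an empty value. */
    public static Map<String, String> params(String segment) {
        Map<String, String> result = new LinkedHashMap<>();
        String[] parts = segment.split(";", -1);
        // parts[0] is the segment name, so parameters start at index 1.
        for (int i = 1; i < parts.length; i++) {
            if (parts[i].isEmpty()) {
                continue; // skip empty parameters such as the one in "name;"
            }
            int eq = parts[i].indexOf('=');
            if (eq < 0) {
                result.put(parts[i], "");
            } else {
                result.put(parts[i].substring(0, eq), parts[i].substring(eq + 1));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(name("name;v=1.1") + " " + params("name;v=1.1"));
    }
}
```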

What does the Servlet Specification [4] have to say about URI Parameters? Very little; just the following:

Path parameters that are part of a GET request (as defined by HTTP 1.1) are not
exposed by these APIs. They must be parsed from the String values returned by
the getRequestURI method or the getPathInfo method.

Now this statement raises a couple of questions:

  • What is a ‘path parameter’? HTTP 1.1 never uses this term; I’m inferring that it’s a URI parameter.
  • Why does it state they only apply to GET requests?

So I’m unclear whether the above statement is meant to apply to URI Parameters or not. The one inference I will draw from it is that all data passed in the request URI should be retrievable from getRequestURI(), and the path portion (following the portion of the path mapped to the servlet path) of the request URI should be retrievable from getPathInfo().

The javadocs for these methods also imply this:

java.lang.String getRequestURI()

Returns the part of this request’s URL from the protocol name up to the query string in the first line of the HTTP request. The web container does not decode this String.

java.lang.String getPathInfo()

Returns any extra path information associated with the URL the client sent when it made this request. The extra path information follows the servlet path but precedes the query string and will start with a “/” character.

This method returns null if there was no extra path information.

Same as the value of the CGI variable PATH_INFO.

Returns:
a String, decoded by the web container, specifying extra path information that comes after the servlet path but before the query string in the request URL; or null if the URL does not have any extra path information

Servlet containers don’t handle URI Parameters the way I expect

I wrote a little servlet (mapped to /*) to test how various servlet containers handle URI Parameters. I only tested the ones I had to hand; there are plenty more I didn’t test. The servlet issues a temporary redirect to the following path and then displays the values returned from the HttpServletRequest interface:

a,b/c;d/e.f;g/h?i=j+k&l=m

This is the output I expected:

getServerInfo:  Container
getPathInfo:    /a,b/c;d/e.f;g/h
getQueryString: i=j+k&l=m
getRequestURI:  /servlet-uri-handling/a,b/c;d/e.f;g/h
getRequestURL:  http://localhost:8080/servlet-uri-handling/a,b/c;d/e.f;g/h

Let’s take a look at the actual results…

getServerInfo:  Apache Tomcat/7.0.12
getPathInfo:    /a,b/c/e.f/h
getQueryString: i=j+k&l=m
getRequestURI:  /servlet-uri-handling/a,b/c;d/e.f;g/h
getRequestURL:  http://localhost:8080/servlet-uri-handling/a,b/c;d/e.f;g/h

getServerInfo:  jetty/6.1.26
getPathInfo:    /a,b/c
getQueryString: i=j+k&l=m
getRequestURI:  /servlet-uri-handling/a,b/c;d/e.f;g/h
getRequestURL:  http://localhost:8080/servlet-uri-handling/a,b/c;d/e.f;g/h

getServerInfo:  GlassFish Server Open Source Edition 3.0.1
getPathInfo:    /a,b/c
getQueryString: i=j+k&l=m
getRequestURI:  /servlet-uri-handling/a,b/c;d/e.f;g/h
getRequestURL:  http://localhost:8080/servlet-uri-handling/a,b/c;d/e.f;g/h

getServerInfo:  WebLogic Server 10.3.4.0
getPathInfo:    /a,b/c
getQueryString: i=j+k&l=m
getRequestURI:  /servlet-uri-handling/a,b/c;d/e.f;g/h
getRequestURL:  http://localhost:7001/servlet-uri-handling/a,b/c;d/e.f;g/h

getServerInfo:  Oracle Containers for J2EE 10g (10.1.3.5.0)
getPathInfo:    /a,b/c
getQueryString: i=j+k&l=m
getRequestURI:  /servlet-uri-handling/a,b/c
getRequestURL:  http://localhost:8888/servlet-uri-handling/a,b/c

Some observations:

  • As expected, none of the containers follow the wider definition in RFC 3986 and treat any of the sub-delimiters as parameter markers; they only treat the semi-colon as the parameter marker.
  • All containers do recognize the presence of URI parameters.
  • All containers discard any kind of URI parameter data when returning the value of getPathInfo(). I’m not sure why they do this; I can’t see anything in the Servlet specification telling them to.
  • Tomcat is the only container that understands that each segment of the path can have parameters. All the others assume the parameters can only appear on the last segment, so they drop everything after the first semi-colon.
  • All containers bar OC4J will return the full path information (including context path, up to the query string) from getRequestURI().
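The two getPathInfo() behaviours seen in the results can be mimicked with a few lines of string handling. This is only an illustration that reproduces the observed output; it is not taken from any container’s source, and the class and method names (PathInfoBehaviours, stripPerSegment, truncateAtFirst) are made up.

```java
public class PathInfoBehaviours {

    /** Tomcat-like: remove ";param" from every segment but keep all segments. */
    public static String stripPerSegment(String path) {
        StringBuilder out = new StringBuilder();
        String[] segs = path.split("/", -1);
        for (int i = 0; i < segs.length; i++) {
            if (i > 0) {
                out.append('/');
            }
            int semi = segs[i].indexOf(';');
            out.append(semi < 0 ? segs[i] : segs[i].substring(0, semi));
        }
        return out.toString();
    }

    /** Like the other containers tested: cut the whole path at the first ";". */
    public static String truncateAtFirst(String path) {
        int semi = path.indexOf(';');
        return semi < 0 ? path : path.substring(0, semi);
    }

    public static void main(String[] args) {
        String path = "/a,b/c;d/e.f;g/h";
        System.out.println(stripPerSegment(path)); // /a,b/c/e.f/h
        System.out.println(truncateAtFirst(path)); // /a,b/c
    }
}
```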

Conclusion

If you need to handle URI parameters, don’t use getPathInfo(); use getRequestURI() and parse out the information you need yourself, remembering to URL-decode where needed. If you’re tied to OC4J then you’re out of luck: there doesn’t seem to be a way to handle URI parameters.
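Here is a rough sketch of that approach, written against plain strings so it can be tried outside a container. It assumes the servlet is mapped to /* (so the servlet path is empty) and that the context path is a prefix of the request URI; RequestUriParser and its methods are hypothetical names. Note that it splits on ";" before decoding, so an encoded %3B inside a parameter is not mistaken for a delimiter.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.ArrayList;
import java.util.List;

public class RequestUriParser {

    /** Removes the context path prefix, leaving the raw (still encoded) path. */
    public static String rawPathInfo(String requestUri, String contextPath) {
        return requestUri.startsWith(contextPath)
                ? requestUri.substring(contextPath.length())
                : requestUri;
    }

    /** Percent-decodes one piece of a path; "+" stays literal, unlike in form data. */
    public static String decode(String piece) {
        try {
            // URLDecoder turns "+" into a space, so protect it first.
            return URLDecoder.decode(piece.replace("+", "%2B"), "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    /** Returns the decoded parameters attached to the segment at the given index. */
    public static List<String> paramsOf(String rawPath, int segmentIndex) {
        String rest = rawPath.startsWith("/") ? rawPath.substring(1) : rawPath;
        String[] segments = rest.split("/", -1);
        String[] parts = segments[segmentIndex].split(";", -1);
        List<String> params = new ArrayList<>();
        for (int i = 1; i < parts.length; i++) {
            params.add(decode(parts[i])); // decode only after splitting on ";"
        }
        return params;
    }

    public static void main(String[] args) {
        String raw = rawPathInfo("/servlet-uri-handling/a,b/c;d/e.f;g/h",
                                 "/servlet-uri-handling");
        System.out.println(raw + " -> " + paramsOf(raw, 1));
    }
}
```

In a servlet, the two arguments to rawPathInfo would come from request.getRequestURI() and request.getContextPath().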

References
