Why trailing slashes on URIs are important

Origin of the trailing slash

In Unix, a trailing slash on a pathname identifies the path as pointing to a folder (aka directory). If a pathname does not have a trailing slash then it points to a file. A folder is a ‘collection’ of files.

The syntax of URIs is derived from the syntax of Unix filenames, and the concept of using trailing slashes to identify ‘collection’ resources was carried over. However, on the Web the strong delineation between folders and files does not exist; frequently a ‘collection’ resource appears similar in structure and content to a normal resource (sometimes referred to as a ‘subordinate’ resource).

As a consequence, much confusion has arisen about the purpose and importance of trailing slashes on collection resources. It is common for users to forget the trailing slash on a resource, and common for web servers to assist users who make this mistake by automatically redirecting them to the URI with the trailing slash.

In fact this practice is so pervasive that the vast majority of users treat a resource URI with a trailing slash and the same URI without one as synonyms: two URIs that point to the same resource, either of which is fine to use. However, this understanding is not quite correct.

It is more correct to understand that the resource without the trailing slash does not exist at all. But instead of being unhelpful and reporting a 404 Not Found status, web-servers almost always apply Postel’s Law and tell the user where the resource they are looking for is actually located, via a permanent redirect.

Read on to learn why it is important to have the correct understanding when designing RESTful APIs.

Relative URIs

The main reason trailing slashes are so important is that they are critical to relative URIs in your API functioning correctly. There are many reasons why you should want to use relative URIs, but let’s look at just a few:

  • If your API is available over both HTTP and HTTPS, then which protocol should you specify in your fully qualified URI? You’ll need to make sure all the links in a generated response match the protocol used to request the resource, otherwise problems with mixed content may be encountered.
  • In a multi-tiered architecture, where the application server generating a resource may be at some remove from the public endpoint that the consumer of an API accesses, determining the correct public URI to generate in a resource can become cumbersome. For example, if you have a Java Servlet running on Apache Tomcat, fronted by Apache HTTPD acting as a reverse proxy (mod_proxy directives), it takes a good bit of configuration for the Servlet to be able to determine the public request URL, and even then the Servlet needs to be aware that it is being fronted by mod_proxy, which is not good for encapsulation.
  • If for any reason the location at which your API resides changes (web-site re-branding, moving the API to a distinct hostname for scaling/security reasons, etc.), then all of the resources exposed by the API must be updated to use the new fully qualified URIs.
  • If your API does not allow for relative URIs properly, then it will, in turn, prevent users of your API using relative URIs in any content they store in your service. More on this below.

Everywhere the fully qualified URI is repeated, it must be changed if the URI ever needs to change. With relative URIs this problem is mitigated: the location of a resource is expressed relative to the location of the current resource, and the client is able to turn that relative location into an absolute location using URI resolution.

To be blunt, using fully qualified URIs, when a relative URI would suffice, is a violation of the Don’t Repeat Yourself (DRY) principle. By needlessly repeating the fully qualified URI, you create more work for yourself, if the fully qualified URI has to change for any reason. You can also create more work for the customers of your API, by potentially preventing them using relative URIs.

Doing it wrong

Let’s look at an example that neglects to use trailing slashes and attempts to use relative URIs, and as a consequence gets things wrong.

Assume we’ve noticed that there just aren’t enough blog engines in the world, and so we’ve created the greatest blog engine ever (GBEE), and we want to expose an API to it, so that the people who make those ‘sharing’ widgets have yet another API that they need to integrate with.

Here’s a rough idea of the API:

GET https://{blog-name}.gbee.io/blog/posts       # Retrieve the list of posts
GET https://{blog-name}.gbee.io/blog/posts/{id}  # Retrieve an individual post
GET https://{blog-name}.gbee.io/blog/images/{id} # Retrieve an image linked to in a post's content.

Here’s a sample of retrieving the list of blog posts:

GET /blog/posts HTTP/1.1
Host: some-blog.gbee.io

produces a listing like the following:

HTTP/1.1 200 OK
Content-Type: application/json

{
 "posts": [
  {
   "links": [{"href":"holiday"}],
   "tags": ["vacation", "mexico"],
   "summary": "
Had a great trip to Mexico recently,
here's a picture of where we stayed:
<img src="../images/hotel.jpg">
"
  },
  ...
 ]
}

We can see that the relative URI of the first blog post is holiday, so what is the correct absolute URI of the blog post?

You might expect it to be http://some-blog.gbee.io/blog/posts/holiday, but that’s not what the document above says. It actually says the absolute location is http://some-blog.gbee.io/blog/holiday. To understand why, you need to understand the algorithm for transforming relative URIs into absolute URIs.

The first step is to establish the base URI of a resource, in this case the base URI is the URI of the requested resource, i.e.: http://some-blog.gbee.io/blog/posts

The next step is to merge the base URI with the relative URI. RFC 3986 describes this process; the relevant statement is:

return a string consisting of the reference’s path component appended to all but the last segment of the base URI’s path (i.e., excluding any characters after the right-most “/” in the base URI path, or excluding the entire base URI path if it does not contain any “/” characters).

Therefore we must exclude any characters after the right-most “/” in http://some-blog.gbee.io/blog/posts, which gives: http://some-blog.gbee.io/blog/. Finally the relative URI (holiday) is appended to this URI, thus giving http://some-blog.gbee.io/blog/holiday.

If a client attempts to retrieve http://some-blog.gbee.io/blog/holiday they will get a 404 Not Found error, because the resource doesn’t actually exist at that location.
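The merge step quoted above can be sketched in a few lines of Python. This is only an illustration of the rule for path components, under the assumption that the base URI has a host (as in the blog example); the function name merge_paths is my own, and a complete resolver would also have to remove dot segments such as “..” afterwards.

```python
def merge_paths(base_path: str, ref_path: str) -> str:
    """Merge a relative-path reference with a base path (RFC 3986, section 5.2.3).

    Assumes the base URI has an authority (host) component.
    """
    if base_path == "":
        # Base URI with an authority and an empty path: the result is rooted at "/".
        return "/" + ref_path
    # Keep everything up to and including the right-most "/", then append the reference.
    return base_path[: base_path.rfind("/") + 1] + ref_path


print(merge_paths("/blog/posts", "holiday"))   # /blog/holiday
print(merge_paths("/blog/posts/", "holiday"))  # /blog/posts/holiday
```

The only difference between the two calls is the trailing slash on the base path, and that alone decides whether the last segment (posts) survives the merge.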

If it is not already clear, placing a trailing slash on a collection resource is not optional. It is critical to relative URIs being resolved correctly. RFC 3986 is one of the foundational specifications of the web and as the example above demonstrates, collection resources are expected to have a trailing slash. It’s not just a stylistic preference, or something that provides an SEO optimization. It’s intrinsic to the syntax of URIs and therefore, important to get right.

I’d speculate that a lack of understanding of how relative URIs are transformed into absolute URIs is a major factor in the all too common occurrence of web APIs not naming collection resources correctly, and consequently using fully qualified URIs throughout the API unnecessarily. I think developers encounter problems trying to get relative URIs working properly (through their lack of understanding of the mechanics) and then err on the side of caution and switch to using fully qualified URIs throughout.

The above ‘broken’ API also exhibits a commonly seen problem where relative URIs seem to be working in one case, but not in another. If we were to retrieve the content of the blog post it would look like this:

GET /blog/posts/holiday HTTP/1.1
Host: some-blog.gbee.io

HTTP/1.1 200 OK
Content-Type: text/html

<p>Had a great trip to Mexico recently,
here's a picture of where we stayed:
<img src="../images/hotel.jpg">
</p>

Nothing unusual about this content, just regular HTML, and the relative URI of the image link looks correct:

Base URI
http://some-blog.gbee.io/blog/posts/holiday
Relative URI
../images/hotel.jpg
Absolute URI
http://some-blog.gbee.io/blog/images/hotel.jpg

So the problem doesn’t lie here; again it lies with the /blog/posts resource. When the <img> tag is evaluated relative to /blog/posts, the wrong location is produced:

Base URI
http://some-blog.gbee.io/blog/posts
Relative URI
../images/hotel.jpg
Absolute URI
http://some-blog.gbee.io/images/hotel.jpg

You can easily imagine many developers scratching their heads trying to figure out why the image location is not being calculated correctly for the /blog/posts resource, when it is working fine for the /blog/posts/holiday resource. The fact that the base URI is that of the document containing the reference (the list of posts), not of the blog post itself, is easily missed.
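The two resolutions can be reproduced with Python’s standard urljoin function, which follows the RFC 3986 resolution rules. This is just a quick check of the example URIs above, not part of the GBEE API; the variable names are my own.

```python
from urllib.parse import urljoin

IMAGE_REF = "../images/hotel.jpg"

# Same relative reference, two different base URIs.
from_post = urljoin("http://some-blog.gbee.io/blog/posts/holiday", IMAGE_REF)
from_listing = urljoin("http://some-blog.gbee.io/blog/posts", IMAGE_REF)

print(from_post)     # http://some-blog.gbee.io/blog/images/hotel.jpg
print(from_listing)  # http://some-blog.gbee.io/images/hotel.jpg
```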

Preventing API users doing things right

It is also worth appreciating that the author of the blog post may have the reasonable expectation that they can use relative URIs, since they are an intrinsic part of the Web. Users will expect that the blog engine will fully support the use of relative URIs. By choosing not to put a trailing slash on the /blog/posts resource, the blogging engine has failed to meet this expectation. The post author can reasonably view the blog engine as defective in this regard.

Doing it right

All of the problems outlined above can be addressed by placing a trailing slash at the end of the blog posts resource, so its URI becomes:

http://some-blog.gbee.io/blog/posts/

Now the holiday relative URI resolves correctly:

Base URI
http://some-blog.gbee.io/blog/posts/
Relative URI
holiday
Absolute URI
http://some-blog.gbee.io/blog/posts/holiday

The relative path for the image in the blog post also resolves correctly when resolved relative to the /blog/posts/ resource:

Base URI
http://some-blog.gbee.io/blog/posts/
Relative URI
../images/hotel.jpg
Absolute URI
http://some-blog.gbee.io/blog/images/hotel.jpg
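As a sanity check, here is a small sketch that resolves both relative URIs against the collection URI with its trailing slash, again using Python’s urljoin. The COLLECTION constant and the dictionary are only there for illustration.

```python
from urllib.parse import urljoin

COLLECTION = "http://some-blog.gbee.io/blog/posts/"  # note the trailing slash

# Resolve each relative reference found in the listing against the collection URI.
resolved = {ref: urljoin(COLLECTION, ref) for ref in ("holiday", "../images/hotel.jpg")}

for ref, absolute in resolved.items():
    print(f"{ref} -> {absolute}")
```

Both references now land where the post author intended: the post under /blog/posts/ and the image under /blog/images/.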

Getting Query URIs right

Another mistake that I have sometimes seen is to get the naming of a collection resource correct, but to get the URI for queries on the collection wrong, e.g.:

http://some-blog.gbee.io/blog/posts/              # retrieve all posts
http://some-blog.gbee.io/blog/posts?tags=vacation # retrieve posts tagged with 'vacation'

The problem here, once again, is that relative URIs returned in the http://some-blog.gbee.io/blog/posts?tags=vacation resource will be resolved relative to http://some-blog.gbee.io/blog/ rather than http://some-blog.gbee.io/blog/posts/, because the trailing slash is missing.

The correct form of the URI would be:

http://some-blog.gbee.io/blog/posts/?tags=vacation
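The effect of the query form on resolution can be checked the same way. In this sketch (variable names are mine) the query string of the base URI plays no part in resolving a relative path; only the path in front of it matters.

```python
from urllib.parse import urljoin

# The base URI's query string is discarded; only its path is used for merging.
without_slash = urljoin("http://some-blog.gbee.io/blog/posts?tags=vacation", "holiday")
with_slash = urljoin("http://some-blog.gbee.io/blog/posts/?tags=vacation", "holiday")

print(without_slash)  # http://some-blog.gbee.io/blog/holiday
print(with_slash)     # http://some-blog.gbee.io/blog/posts/holiday
```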

The HTML <base> element

Some resource formats specify ways to override the base URI of a resource. For example HTML has the <base> element. XML has the xml:base extension.

These mechanisms exist to provide a way for a HTML or XML document to render correctly when using embedded hyper-links that use relative URIs, regardless of whether the hosting document is correctly named with a trailing slash or not (or if the URI of the hosting document cannot be determined).

I only mention these for completeness. I would not recommend their use unless required to work around an existing API that does not name collection resources correctly.

Rules of Thumb

URIs are hierarchical in nature. The path component of a URI is particularly hierarchical. Levels in the hierarchy are delimited by the slash (“/”) character. Paths to the right of a slash are subordinate to paths to the left of a slash:

/a/b # b is subordinate to a
/c/d/e # e is subordinate to d, d is subordinate to c

Any path which occurs to the left of a slash is a collection resource, any path which occurs to the right is a subordinate. Note that a resource can be both a collection resource and a subordinate resource (as shown by the /c/d/e example, d is both a collection resource and a subordinate resource).

If a resource is a collection resource (even if it is subordinate to another resource), then its URI must have a trailing slash. If you visualize the path hierarchy of your API as a tree, then only the leaf nodes in the tree should lack a trailing slash.

  • Only leaf resources (resources with no subordinates) should lack a trailing slash in their URI.
  • If a request is received without a trailing slash, then do a permanent redirect to the URI with the trailing slash.
  • Use relative paths wherever possible (DRY principle).
  • Remember to include the trailing slash in query URIs.
  • Don’t do anything that would prevent consumers of your API using relative URIs.
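The redirect rule in the list above might be sketched as follows. This is a framework-neutral illustration, not a recipe for any particular server: the COLLECTIONS set and the function name are hypothetical, and a real API would derive the set of collection resources from its own routing table.

```python
from urllib.parse import urlsplit

# Hypothetical set of collection resources, written without their trailing slash.
COLLECTIONS = {"/blog/posts", "/blog/images"}


def trailing_slash_redirect(request_target: str):
    """Return (301, location) if the target names a collection without its slash, else None."""
    parts = urlsplit(request_target)
    if parts.path in COLLECTIONS:
        location = parts.path + "/"
        if parts.query:
            # Preserve the query so that /blog/posts?tags=x becomes /blog/posts/?tags=x.
            location += "?" + parts.query
        return 301, location
    return None


print(trailing_slash_redirect("/blog/posts?tags=vacation"))
print(trailing_slash_redirect("/blog/posts/holiday"))
```

Leaf resources such as /blog/posts/holiday are left alone, and requests that already carry the trailing slash never match the set, so they are not redirected.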

Java Servlets and URI Parameters

What’s a URI Parameter? Well, it’s not the values after the question mark in the URI below:

/some/path?key=value

Those are query string values (typically HTML Form values). URI Parameters are rarely used, and are also poorly understood by servlet containers. Here’s an example URI:

/some/path;param

To be pedantic (and this post is full of pedantry!) I should probably call them something like ‘URI Path Segment Parameters’, but for the sake of brevity I’ll continue to say URI Parameter for the rest of this document.

The HTTP 1.1 specification [1] never refers to URI parameters specifically but it does say:

   For definitive information on
   URL syntax and semantics, see "Uniform Resource Identifiers (URI):
   Generic Syntax and Semantics," RFC 2396 [42] (which replaces RFCs
   1738 [4] and RFC 1808 [11]). This specification adopts the
   definitions of "URI-reference", "absoluteURI", "relativeURI", "port",
   "host","abs_path", "rel_path", and "authority" from that
   specification.

RFC 2396 [2] has this to say about the abs_path definition:

abs_path      = "/"  path_segments
...
path_segments = segment *( "/" segment )
segment       = *pchar *( ";" param )
param         = *pchar

pchar         = unreserved | escaped |
                ":" | "@" | "&" | "=" | "+" | "$" | ","

The path may consist of a sequence of path segments separated by a
single slash "/" character.  Within a path segment, the characters
"/", ";", "=", and "?" are reserved.  Each path segment may include a
sequence of parameters, indicated by the semicolon ";" character.
The parameters are not significant to the parsing of relative
references.

Note how parameters are permitted on each segment of the path.

As an aside RFC 2396 has been obsoleted by RFC 3986 [3], which seems to widen the definition of a URI parameter:

sub-delims  = "!" / "$" / "&" / "'" / "(" / ")"
            / "*" / "+" / "," / ";" / "="
segment     = *pchar
pchar       = unreserved / pct-encoded / sub-delims / ":" / "@"
...
Aside from dot-segments in hierarchical paths, a path segment is
considered opaque by the generic syntax.  URI producing applications
often use the reserved characters allowed in a segment to delimit
scheme-specific or dereference-handler-specific subcomponents.  For
example, the semicolon (";") and equals ("=") reserved characters are
often used to delimit parameters and parameter values applicable to
that segment.  The comma (",") reserved character is often used for
similar purposes.  For example, one URI producer might use a segment
such as "name;v=1.1" to indicate a reference to version 1.1 of
"name", whereas another might use a segment such as "name,1.1" to
indicate the same.  Parameter types may be defined by scheme-specific
semantics, but in most cases the syntax of a parameter is specific to
the implementation of the URI's dereferencing algorithm.

So a parameter is now anything that follows a sub-delims character, not just a semi-colon. However since HTTP 1.1 depends on RFC 2396 I don’t think the above is directly relevant to this discussion, and I don’t expect containers to support this syntax.

What does the Servlet Specification [4] have to say about URI Parameters? Very little, if anything, just the following:

Path parameters that are part of a GET request (as defined by HTTP 1.1) are not
exposed by these APIs. They must be parsed from the String values returned by
the getRequestURI method or the getPathInfo method.

Now this statement raises a couple of questions:

  • What is a ‘path parameter’? HTTP 1.1 never uses this term; I’m inferring that it’s a URI parameter.
  • Why does it state they only apply to GET requests?

So I’m unclear whether the above statement is meant to apply to URI Parameters or not. The one inference I will draw from it is that all data passed in the request URI should be retrievable from getRequestURI(), and the path portion (following the portion of the path mapped to the servlet path) of the request URI should be retrievable from getPathInfo().

The javadocs for these methods also imply this:

java.lang.String getRequestURI()

Returns the part of this request’s URL from the protocol name up to the query string in the first line of the HTTP request. The web container does not decode this String.

java.lang.String getPathInfo()

Returns any extra path information associated with the URL the client sent when it made this request. The extra path information follows the servlet path but precedes the query string and will start with a “/” character.

This method returns null if there was no extra path information.

Same as the value of the CGI variable PATH_INFO.

Returns:
a String, decoded by the web container, specifying extra path information that comes after the servlet path but before the query string in the request URL; or null if the URL does not have any extra path information

Servlet containers don’t handle URI Parameters the way I expect

I wrote a little servlet (mapped to /*) to test how various servlet containers (just the ones I had to hand, there’s plenty more I didn’t test) handle URI Parameters. The servlet issues a temporary redirect to the following path and then displays the values returned from the HttpServletRequest interface:

a,b/c;d/e.f;g/h?i=j+k&l=m

This is the output I expected:

getServerInfo:  Container
getPathInfo:    a,b/c;d/e.f;g/h
getQueryString: i=j+k&l=m
getRequestURI:  /servlet-uri-handling/a,b/c;d/e.f;g/h
getRequestURL:  http://localhost:8080/servlet-uri-handling/a,b/c;d/e.f;g/h

Let’s take a look at the actual results…

getServerInfo:  Apache Tomcat/7.0.12
getPathInfo:    /a,b/c/e.f/h
getQueryString: i=j+k&l=m
getRequestURI:  /servlet-uri-handling/a,b/c;d/e.f;g/h
getRequestURL:  http://localhost:8080/servlet-uri-handling/a,b/c;d/e.f;g/h

getServerInfo:  jetty/6.1.26
getPathInfo:    /a,b/c
getQueryString: i=j+k&l=m
getRequestURI:  /servlet-uri-handling/a,b/c;d/e.f;g/h
getRequestURL:  http://localhost:8080/servlet-uri-handling/a,b/c;d/e.f;g/h

getServerInfo:  GlassFish Server Open Source Edition 3.0.1
getPathInfo:    /a,b/c
getQueryString: i=j+k&l=m
getRequestURI:  /servlet-uri-handling/a,b/c;d/e.f;g/h
getRequestURL:  http://localhost:8080/servlet-uri-handling/a,b/c;d/e.f;g/h

getServerInfo:  WebLogic Server 10.3.4.0
getPathInfo:    /a,b/c
getQueryString: i=j+k&l=m
getRequestURI:  /servlet-uri-handling/a,b/c;d/e.f;g/h
getRequestURL:  http://localhost:7001/servlet-uri-handling/a,b/c;d/e.f;g/h

getServerInfo:  Oracle Containers for J2EE 10g (10.1.3.5.0)
getPathInfo:    /a,b/c
getQueryString: i=j+k&l=m
getRequestURI:  /servlet-uri-handling/a,b/c
getRequestURL:  http://localhost:8888/servlet-uri-handling/a,b/c

Some observations:

  • As expected, none of the containers follow the wider definition in RFC 3986 and treat every sub-delimiter as a parameter marker; they only treat the semi-colon as the parameter marker.
  • All containers do recognize the presence of URI parameters
  • All containers discard any kind of URI parameter data when returning the value of getPathInfo(). Not sure why they do this, I can’t see anything in the Servlet specification telling them to do this.
  • Tomcat is the only container that understands that each segment of the path can have parameters. All the others assume the parameters can only appear on the last segment.
  • All containers bar OC4J will return the full path information (including context path, up to the query string) from getRequestURI().

Conclusion

If you need to handle URI parameters, don’t use getPathInfo(); use getRequestURI() and parse out the information you need yourself, remembering to URL decode if needed. If you’re tied to OC4J then you’re out of luck, as there doesn’t seem to be a way to handle URI parameters.
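The parsing itself is not much work. Here is a minimal sketch of the idea, written in Python for brevity rather than as servlet code; the function name is my own, and it assumes the input is the undecoded path, i.e. the kind of string getRequestURI() returns (no query string). Only the semi-colon is treated as a parameter marker, as in RFC 2396.

```python
from urllib.parse import unquote


def parse_path_parameters(raw_path: str):
    """Split an undecoded path into (segment, [parameters]) pairs."""
    if raw_path.startswith("/"):
        raw_path = raw_path[1:]  # drop the leading slash so there is no empty first segment
    result = []
    for raw_segment in raw_path.split("/"):
        name, *params = raw_segment.split(";")
        # Decode only after splitting, so an encoded ";" or "/" is not mistaken for a delimiter.
        result.append((unquote(name), [unquote(p) for p in params]))
    return result


for segment, params in parse_path_parameters("/servlet-uri-handling/a,b/c;d/e.f;g/h"):
    print(segment, params)
```

Splitting before decoding is the important design choice here: it keeps percent-encoded delimiters inside a segment from being treated as structure.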

References

REST and Oracle APEX Listener

Update (November 2011): This functionality is available in Oracle Application Express Listener 1.1 and later. Note that the syntax of bind variables has been changed to match that used in other Oracle products: {person} becomes :person. The examples below have been changed to reflect the updated syntax.

Oracle Application Express Listener Early Adopter Release 1.1 introduces a feature called Resource Templates that enables quick and easy configuration of RESTful web services for exposing and manipulating data stored in an Oracle Database via HTTP.

Hello REST World

Resource Templates are a mechanism to bind a URI Template to an SQL Query or a PL/SQL Block. For example a Resource Template could bind the URI Template: hello?who={person} to the following SQL Query:

select 'Hello ' || :person || ' from Oracle Application Express Listener' greeting from dual

Note how the {person} parameter in the URI Template is bound to a similarly named variable (:person) in the SQL query. Suppose our Application Express Listener is deployed at: http://example.com/apex/ then a HTTP request like the following:

GET /apex/hello?who=World HTTP/1.1
Host: example.com

will generate a HTTP response like the following:

HTTP/1.1 200 OK
ETag: "vovmx3pCoUq/DcdTY/hXusiq1QU="
Content-Type: application/json

{
 "greeting": "Hello World from Oracle Application Express Listener"
}

  • When the path: hello?who=World is received by the Application Express Listener, it is matched against our Resource Template’s URI Template.
  • The concrete value World is used as the value for the :person parameter.
  • The results of the Resource Template’s SQL Query are transformed into a JSON document.
  • The JSON document consists of a JSON object. Each column in the query result set is mapped to a property of the JSON object.
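To make the matching step concrete, here is a rough sketch of how a URI Template with {name} placeholders could be matched against a request path to produce bind values. It is not how the Application Express Listener is implemented; the function names and the choice to let a placeholder match any run of characters other than “/”, “?”, “&” and “#” are my own assumptions.

```python
import re


def template_to_regex(template: str):
    """Turn a template such as 'hello?who={person}' into a compiled regular expression."""
    pattern = ""
    pos = 0
    for placeholder in re.finditer(r"\{(\w+)\}", template):
        # Literal text before the placeholder must match exactly.
        pattern += re.escape(template[pos:placeholder.start()])
        # The placeholder becomes a named group (assumed character set, see above).
        pattern += "(?P<%s>[^/?&#]*)" % placeholder.group(1)
        pos = placeholder.end()
    pattern += re.escape(template[pos:])
    return re.compile(pattern)


def bind(template: str, path: str):
    """Return a dict of bind values, or None if the path does not match the template."""
    match = template_to_regex(template).fullmatch(path)
    return match.groupdict() if match else None


print(bind("hello?who={person}", "hello?who=World"))
print(bind("hello?who={person}", "goodbye"))
```

The resulting dictionary corresponds to the bind variables (here :person) that the SQL query refers to.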

Entity Tags and Conditional Operations

Note the ETag header in the HTTP response of the previous example. An ETag is a version identifier for a resource, in other words whenever the content of a resource changes, the value of its ETag header also changes. Resource Templates can generate an ETag header as follows:

  • By generating a secure hash of the resource’s contents. This is the default mechanism, but requires buffering the content of the resource to generate the hash
  • Sometimes the resource or its constituent parts will already contain information to uniquely identify a version of a resource, so alternatively a SQL query may be defined that generates a unique version identifier. The result set of the SQL query is securely hashed and used as the ETag value. This option does not require buffering of the resource, but care needs to be taken to ensure the SQL query does guarantee a different result set for each resource version
  • No ETag. Naturally there is an option to suppress generation of an ETag header

Resource Templates have an automatic capability for handling HTTP Conditional Operations where an If-None-Match or If-Match header is included with a request. To follow on from the example in the previous section, a HTTP request like the following:

GET /apex/hello?who=World HTTP/1.1
Host: example.com
If-None-Match: "vovmx3pCoUq/DcdTY/hXusiq1QU="

will generate a HTTP response like the following:

HTTP/1.1 304 Not Modified

  • Again, when the path: hello?who=World is received by the Application Express Listener, it is matched against our Resource Template’s URI Template.
  • The results of the Resource Template’s SQL Query are transformed into a JSON document.
  • A secure hash of the JSON document is generated and is compared to the value of the If-None-Match header.
  • Since the values are identical, the Application Express Listener generates a 304 Not Modified response and does not include the resource’s content in the response.
  • So in the case where a resource representation has not changed since the last time a client retrieved it, the overhead of transmitting the resource’s contents from server to client is avoided.
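The conditional GET flow described above can be sketched as follows. This is an illustration only: the function names are mine, the use of SHA-256 with Base64 encoding is an assumption (the text only says ‘secure hash’), and for simplicity the sketch compares a single entity tag rather than handling a list of tags or “*” in If-None-Match.

```python
import base64
import hashlib


def make_etag(body: bytes) -> str:
    """Derive a quoted entity tag from a hash of the response body (assumed: SHA-256)."""
    digest = hashlib.sha256(body).digest()
    return '"' + base64.b64encode(digest).decode("ascii") + '"'


def respond(body: bytes, if_none_match=None):
    """Return (status, headers, body) for a GET, honouring a single If-None-Match value."""
    etag = make_etag(body)
    if if_none_match == etag:
        # The client already holds this version, so the content is not sent again.
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag, "Content-Type": "application/json"}, body


body = b'{"greeting": "Hello World from Oracle Application Express Listener"}'
status, headers, _ = respond(body)
print(status, headers["ETag"])
print(respond(body, if_none_match=headers["ETag"])[0])
```

Note that this approach has to buffer the whole body before it can compute the tag, which is the cost mentioned earlier for the default hashing mechanism.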

Media Resources

Resource Templates can be used to generate any type of resource, not just the default of JSON documents. Resource Templates support a special type of SQL query called a Media Resource Query. Media Resource Queries consist of an SQL Query that generates a single result row with two columns in the row. The first column denotes the resource’s MIME type, the second column contains the resource’s content. For example, if we have a Resource Template with a URI Template of: customers/locations/{state} bound to the following query:

select 'application/vnd.google-earth.kml+xml', xmlquery('
<k:kml xmlns:k="http://www.opengis.net/kml/2.2">
 <k:Folder>
  <k:name>{$state}</k:name>
{ for $c in ora:view("oe","customers")/ROW
let $loc := $c/CUST_GEO_LOCATION/SDO_POINT
where $c/CUST_ADDRESS/STATE_PROVINCE = $state
return
   <k:Placemark>
    <k:name>{concat($c/CUST_FIRST_NAME," ",$c/CUST_LAST_NAME)}</k:name>
     <k:Point>
      <k:coordinates>{$loc/X/text()},{$loc/Y/text()},0</k:coordinates>
     </k:Point>
    </k:Placemark>
}
</k:Folder>
</k:kml>' passing upper(:state) as "state" returning content) from dual

Then a HTTP request like:

GET /apex/customers/locations/ny HTTP/1.1
Host: example.com

will generate a HTTP response like:

HTTP/1.1 200 OK
ETag: "P4eISUyr2BZwtN1VnQHYyV556wU="
Content-Type: application/vnd.google-earth.kml+xml

<kml xmlns="http://www.opengis.net/kml/2.2">
 <Folder>
  <name>NY</name>
  <Placemark>
   <name>Blake Seignier</name>
   <Point>
    <coordinates>-76.14607,43.106533,0</coordinates>
   </Point>
  </Placemark>
  ...
 </Folder>
</kml>

  • In this example we leverage the powerful XMLQuery SQL function to generate a standards compliant KML file, thus quickly making our customer data available to third party GIS applications.

And There’s More…

The above gives a brief flavour of what Resource Templates can accomplish, but there’s lots more functionality. Here are a few highlights:

  • HTTP Headers can be bound to Queries/PLSQL blocks in a similar manner to URI Template parameters, allowing responses to vary depending on HTTP headers received (e.g. localizing the response based on the Accept-Language header)
  • Similar support for HTML Forms enables form fields to be bound to Query/PLSQL parameters, making processing of HTML forms straightforward, including forms containing file upload fields
  • Extraction of HTML5 Microdata from HTML resources to enable production of an alternate JSON representation of the microdata

Try it out, Learn More

You can download the Early Adopter Release here.

The latest version of Oracle Application Express Listener can be downloaded here. The Developer Guide provides all the information needed to get started with Resource Templates.

Your Feedback

This is an Early Adopter release; it is not a finished product. As ever, we encourage you to try out Oracle Application Express Listener and tell us what you think in the Oracle Application Express Listener forum. We look forward to your feedback.