Why trailing slashes on URIs are important

Origin of the trailing slash

In Unix, a trailing slash on a pathname identifies the path as pointing to a folder (aka directory). If a pathname does not have a trailing slash then it points to a file. A folder is a ‘collection’ of files.

The syntax of URIs is derived from the syntax of Unix filenames, and the concept of using trailing slashes to identify ‘collection’ resources was carried over. However, on the Web the strong delineation between folders and files does not exist: frequently a ‘collection’ resource appears similar in structure and content to a normal resource (sometimes referred to as a ‘subordinate’ resource).

As a consequence, much confusion has arisen about the purpose and importance of trailing slashes on collection resources. It is common for users to forget the trailing slash on a resource, and common for web servers to assist users who make this mistake by automatically redirecting them to the URI with the trailing slash.

In fact, this practice is so pervasive that the vast majority of users treat a URI with a trailing slash and the same URI without one as synonyms: two URIs that point to the same resource, either of which is fine to use. However, this understanding is not quite correct.

It is more correct to understand that the resource without the trailing slash does not exist at all. But instead of being unhelpful and reporting a 404 Not Found status, web-servers almost always apply Postel’s Law and tell the user where the resource they are looking for is actually located, via a permanent redirect.

Read on to learn why it is important to have the correct understanding when designing RESTful APIs.

Relative URIs

The main reason trailing slashes are so important is that they are critical to relative URIs in your API functioning correctly. There are many reasons why you should want to use relative URIs, but let’s look at just a few:

  • If your API is available over both HTTP and HTTPS, then which protocol should you specify in your fully qualified URI? You’ll need to make sure all the links in a generated response match the protocol used to request the resource, otherwise problems with mixed content may be encountered.
  • In a multi-tiered architecture, where the application server generating a resource may be at some remove from the public endpoint that the consumer of an API accesses, determining the correct public URI to generate in a resource can become cumbersome. For example if you have a Java Servlet running on Apache Tomcat, fronted by Apache HTTPD and reverse proxy (mod_proxy directives), it takes a good bit of configuration for the Servlet to be able to determine the public request URL, and even then the Servlet needs to be aware that it is being fronted by mod_proxy, which is not good for encapsulation.
  • If for any reason the location at which your API resides changes (web-site re-branding, moving API to a distinct hostname for scaling/security reasons, etc. etc.), then all of the resources exposed by the API must be updated to use the new fully qualified URIs.
  • If your API does not allow for relative URIs properly, then it will, in turn, prevent users of your API using relative URIs in any content they store in your service. More on this below.

Everywhere the fully qualified URI is repeated, it must be changed if the URI ever needs to change. With relative URIs this problem is mitigated: the location of a resource is expressed relative to the location of the current resource, and the client is able to turn that relative location into an absolute location using URI resolution.

To be blunt, using fully qualified URIs when a relative URI would suffice is a violation of the Don’t Repeat Yourself (DRY) principle. By needlessly repeating the fully qualified URI, you create more work for yourself if the fully qualified URI has to change for any reason. You can also create more work for the customers of your API by potentially preventing them from using relative URIs.
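
To make the point concrete, here is a minimal sketch using Python's standard urllib.parse.urljoin. The example.com and example.net endpoints are hypothetical stand-ins for an API that has been moved or is served over more than one protocol; the relative link itself never changes.

```python
from urllib.parse import urljoin

# Hypothetical public endpoints for the same API; only the base differs.
bases = [
    "http://api.example.com/blog/posts/",
    "https://api.example.com/blog/posts/",
    "https://api.example.net/v2/blog/posts/",
]

# The relative link stored in the response body stays the same everywhere.
relative_link = "holiday"

# The client resolves the link against whichever URI it actually requested.
resolved = [urljoin(base, relative_link) for base in bases]
for uri in resolved:
    print(uri)
```

The server never has to know which of the three endpoints the client used, because the client does the resolution.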

Doing it wrong

Let’s look at an example that neglects to use trailing slashes and attempts to use relative URIs, and as a consequence gets things wrong.

Assume we’ve noticed that there just aren’t enough blog engines in the world, and so we’ve created the greatest blog engine ever (GBEE), and we want to expose an API to it, so that the people who make those ‘sharing’ widgets have yet another API that they need to integrate with.

Here’s a rough idea of the API:

GET https://{blog-name}.gbee.io/blog/posts       # Retrieve the list of posts
GET https://{blog-name}.gbee.io/blog/posts/{id}  # Retrieve an individual post
GET https://{blog-name}.gbee.io/blog/images/{id} # Retrieve an image linked to in a post's content.

Here’s a sample of retrieving the list of blog posts:

GET /blog/posts HTTP/1.1
Host: some-blog.gbee.io

produces a listing like the following:

HTTP/1.1 200 OK
Content-Type: application/json

{
 "posts": [
  {
   "links": [{"href":"holiday"}],
   "tags": ["vacation", "mexico"],
   "summary": "Had a great trip to Mexico recently,\nhere's a picture of where we stayed:\n<img src=\"../images/hotel.jpg\">"
  },
  ...
 ]
}

We can see that the relative URI of the first blog post is holiday, so what is the correct absolute URI of the blog post?

You might expect it to be http://some-blog.gbee.io/blog/posts/holiday, but that’s not what the document above says. It actually says the absolute location is http://some-blog.gbee.io/blog/holiday. To understand why, you need to understand the algorithm for transforming relative URIs into absolute URIs.

The first step is to establish the base URI of a resource. In this case the base URI is the URI of the requested resource, i.e.: http://some-blog.gbee.io/blog/posts

The next step is to merge the base URI with the relative URI. RFC 3986 describes this process; the relevant statement is:

return a string consisting of the reference’s path component appended to all but the last segment of the base URI’s path (i.e., excluding any characters after the right-most “/” in the base URI path, or excluding the entire base URI path if it does not contain any “/” characters).

Therefore we must exclude any characters after the right-most “/” in http://some-blog.gbee.io/blog/posts, which gives: http://some-blog.gbee.io/blog/. Finally the relative URI (holiday) is appended to this URI, thus giving http://some-blog.gbee.io/blog/holiday.
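
The merge step quoted above can be sketched in a few lines of Python. The merge_paths helper is my own illustrative name, not part of any library, and it only handles simple relative paths (no "." or ".." segments, and a base URI that has a path); the standard library's urljoin is used as a cross-check.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit


def merge_paths(base_uri: str, relative_path: str) -> str:
    """Merge a simple relative path with a base URI (hypothetical helper)."""
    base = urlsplit(base_uri)
    # Keep everything up to and including the right-most "/" of the base path,
    # i.e. drop the last segment. If there is no "/", the prefix is empty.
    prefix = base.path[: base.path.rfind("/") + 1]
    # The base URI's query and fragment are not carried over.
    return urlunsplit((base.scheme, base.netloc, prefix + relative_path, "", ""))


base_uri = "http://some-blog.gbee.io/blog/posts"
merged = merge_paths(base_uri, "holiday")
print(merged)                        # hand-rolled merge
print(urljoin(base_uri, "holiday"))  # standard library resolution
```

Both calls agree: the "posts" segment is discarded, because without a trailing slash it is just the last segment of the base path.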

If a client attempts to retrieve http://some-blog.gbee.io/blog/holiday they will get a 404 Not Found error, because the resource doesn’t actually exist at that location.

If it is not already clear, placing a trailing slash on a collection resource is not optional. It is critical to relative URIs being resolved correctly. RFC 3986 is one of the foundational specifications of the web and as the example above demonstrates, collection resources are expected to have a trailing slash. It’s not just a stylistic preference, or something that provides an SEO optimization. It’s intrinsic to the syntax of URIs and therefore, important to get right.

I’d speculate that a lack of understanding of how relative URIs are transformed into absolute URIs is a major factor in the all too common occurrence of web APIs not naming collection resources correctly, and consequently using fully qualified URIs throughout the API unnecessarily. I think developers encounter problems trying to get relative URIs working properly (through their lack of understanding of the mechanics) and then err on the side of caution and switch to using fully qualified URIs throughout.

The above ‘broken’ API also exhibits a commonly seen problem where relative URIs seem to be working in one case, but not in another. If we were to retrieve the content of the blog post it would look like this:

GET /blog/posts/holiday HTTP/1.1
Host: some-blog.gbee.io

HTTP/1.1 200 OK
Content-Type: text/html

<p>Had a great trip to Mexico recently,
here's a picture of where we stayed:
<img src="../images/hotel.jpg">
</p>

Nothing unusual about this content, just regular HTML, and the relative URI of the image link looks correct:

Base URI
http://some-blog.gbee.io/blog/posts/holiday
Relative URI
../images/hotel.jpg
Absolute URI
http://some-blog.gbee.io/blog/images/hotel.jpg

So the problem doesn’t lie here. Again, it lies with the /blog/posts resource. When the <img> tag is evaluated relative to /blog/posts, the wrong location is produced:

Base URI
http://some-blog.gbee.io/blog/posts
Relative URI
../images/hotel.jpg
Absolute URI
http://some-blog.gbee.io/images/hotel.jpg

You can easily imagine many developers scratching their heads trying to figure out why the image location is not being calculated correctly for the /blog/posts resource, when it is working fine for the /blog/posts/holiday resource. The fact that the base URI is that of the document containing the reference (the list of posts), not of the blog post itself, is easily missed.
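
The two cases can be reproduced side by side with Python's urllib.parse.urljoin. This is only a sketch of what a client does when it resolves the same <img> reference from two different documents.

```python
from urllib.parse import urljoin

image_ref = "../images/hotel.jpg"

# Resolved from the individual post: "holiday" is dropped, then ".." removes "posts/".
from_post = urljoin("http://some-blog.gbee.io/blog/posts/holiday", image_ref)

# Resolved from the listing without a trailing slash: "posts" is dropped,
# then ".." removes "blog/" as well.
from_listing = urljoin("http://some-blog.gbee.io/blog/posts", image_ref)

print(from_post)
print(from_listing)
```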

Preventing API users doing things right

It is also worth appreciating that the author of the blog post may have the reasonable expectation that they can use relative URIs, since they are an intrinsic part of the Web. Users will expect that the blog engine will fully support the use of relative URIs. By choosing not to put a trailing slash on the /blog/posts resource, the blogging engine has failed to meet this expectation. The post author can reasonably view the blog engine as defective in this regard.

Doing it right

All of the problems outlined above can be addressed by placing a trailing slash at the end of the blog posts resource, so its URI becomes:


http://some-blog.gbee.io/blog/posts/

Now the holiday relative URI resolves correctly:

Base URI
http://some-blog.gbee.io/blog/posts/
Relative URI
holiday
Absolute URI
http://some-blog.gbee.io/blog/posts/holiday

The relative path for the image in the blog post also resolves correctly when resolved relative to the /blog/posts/ resource:

Base URI
http://some-blog.gbee.io/blog/posts/
Relative URI
../images/hotel.jpg
Absolute URI
http://some-blog.gbee.io/blog/images/hotel.jpg
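
As a quick check, here is the same resolution done with Python's urllib.parse.urljoin against the corrected collection URI. It is a sketch of the client-side behaviour, nothing more.

```python
from urllib.parse import urljoin

# The collection URI now ends with a slash, so no segment is discarded.
base_uri = "http://some-blog.gbee.io/blog/posts/"

post_uri = urljoin(base_uri, "holiday")
image_uri = urljoin(base_uri, "../images/hotel.jpg")

print(post_uri)
print(image_uri)
```

With the trailing slash in place, the listing and the individual post resolve the image reference to the same location.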

Getting Query URIs right

Another mistake that I have sometimes seen is to get the naming of a collection resource correct, but to get the URI for queries on the collection wrong, e.g.:

http://some-blog.gbee.io/blog/posts/              # retrieve all posts
http://some-blog.gbee.io/blog/posts?tags=vacation # retrieve posts tagged with 'vacation'

The problem here, once again, is that relative URIs returned in the http://some-blog.gbee.io/blog/posts?tags=vacation resource will be resolved relative to http://some-blog.gbee.io/blog/ rather than http://some-blog.gbee.io/blog/posts/, because the trailing slash is missing.

The correct form of the URI would be:


http://some-blog.gbee.io/blog/posts/?tags=vacation
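
The query string plays no part in the merge; only the path of the base URI matters. A short sketch with Python's urllib.parse.urljoin shows the difference between the two query URIs:

```python
from urllib.parse import urljoin

# Query on the collection, without and with the trailing slash before "?".
without_slash = urljoin("http://some-blog.gbee.io/blog/posts?tags=vacation", "holiday")
with_slash = urljoin("http://some-blog.gbee.io/blog/posts/?tags=vacation", "holiday")

print(without_slash)
print(with_slash)
```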

The HTML <base> element

Some resource formats specify ways to override the base URI of a resource. For example HTML has the <base> element. XML has the xml:base extension.

These mechanisms exist to provide a way for an HTML or XML document to render correctly when using embedded hyperlinks that use relative URIs, regardless of whether the hosting document is correctly named with a trailing slash or not (or if the URI of the hosting document cannot be determined).

I only mention these for completeness. I would not recommend their use unless required to work around an existing API that does not name collection resources correctly.
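
For the curious, here is a rough sketch of how a client might honour a <base> element, using Python's standard html.parser module. The LinkResolver class and the sample markup are hypothetical, and the sketch simplifies things: it lets any <base> it encounters override the base URI, and it only looks at <img> tags.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkResolver(HTMLParser):
    """Collect absolute image URIs, honouring a <base href> if present."""

    def __init__(self, document_uri):
        super().__init__()
        self.base_uri = document_uri  # default base: the document's own URI
        self.images = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "base" and attributes.get("href"):
            # The <base> href is itself resolved against the document URI.
            self.base_uri = urljoin(self.base_uri, attributes["href"])
        elif tag == "img" and attributes.get("src"):
            self.images.append(urljoin(self.base_uri, attributes["src"]))


# Hypothetical listing served from the slash-less URI, patched with <base>.
html = '<base href="http://some-blog.gbee.io/blog/posts/"><img src="../images/hotel.jpg">'
resolver = LinkResolver("http://some-blog.gbee.io/blog/posts")
resolver.feed(html)
print(resolver.images)
```

The image resolves correctly here only because the <base> element papers over the missing trailing slash, which is exactly the kind of workaround I would rather avoid.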

Rules of Thumb

URIs are hierarchical in nature. The path component of a URI is particularly hierarchical. Levels in the hierarchy are delimited by the slash (“/”) character. Paths to the right of a slash are subordinate to paths to the left of a slash:

/a/b # b is subordinate to a
/c/d/e # e is subordinate to d, d is subordinate to c

Any path which occurs to the left of a slash is a collection resource, any path which occurs to the right is a subordinate. Note that a resource can be both a collection resource and a subordinate resource (as shown by the /c/d/e example, d is both a collection resource and a subordinate resource).

If a resource is a collection resource (even if it is subordinate to another resource), then its URI must have a trailing slash. If you visualize the path hierarchy of your API as a tree, then only the leaf nodes in the tree should lack a trailing slash.

  • Only leaf resources (resources with no subordinates) should lack a trailing slash in their URI.
  • If a request is received without a trailing slash, then do a permanent redirect to the URI with the trailing slash.
  • Use relative paths wherever possible (DRY principle).
  • Remember to include the trailing slash in query URIs.
  • Don’t do anything that would prevent consumers of your API using relative URIs.
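
The redirect rule can be sketched as a small framework-agnostic function. Everything here is an assumption for illustration: the COLLECTIONS set, the redirect_location name, and the idea that the server knows its collection paths up front. A real server would send the returned value in the Location header of a 301 response.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical collection paths exposed by the API, stored without the slash.
COLLECTIONS = {"/blog", "/blog/posts", "/blog/images"}


def redirect_location(request_uri):
    """Return the URI to redirect to permanently, or None if none is needed."""
    parts = urlsplit(request_uri)
    if parts.path in COLLECTIONS:
        # Append the slash to the path only, leaving any query string in place.
        return urlunsplit(parts._replace(path=parts.path + "/"))
    return None


print(redirect_location("http://some-blog.gbee.io/blog/posts?tags=vacation"))
print(redirect_location("http://some-blog.gbee.io/blog/posts/holiday"))
```

Note that the slash goes before the query string, in line with the rule about query URIs above.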