Farewell wordpress.com, time for a new blog

After 100 posts and approaching 8 years, it’s time for me to close up this little old wordpress.com blog. For a long time now I’ve been itching to self-host my blog, and I’ve finally found the time to pull the trigger. From now on you can catch me on http://blog.cdivilly.com.


Why trailing slashes on URIs are important

Origin of the trailing slash

In Unix, a trailing slash on a pathname identifies the path as pointing to a folder (aka directory). If a pathname does not have a trailing slash then it points to a file. A folder is a ‘collection’ of files.

The syntax of URIs is derived from the syntax of Unix filenames, and the concept of using trailing slashes to identify ‘collection’ resources was carried over. However, on the Web the strong delineation between folders and files does not exist: frequently a ‘collection’ resource appears similar in structure and content to a normal resource (sometimes referred to as a ‘subordinate’ resource).

As a consequence, much confusion has arisen about the purpose and importance of trailing slashes on collection resources. It is common for users to forget the trailing slash on a resource, and common for web servers to assist users who make this mistake by automatically redirecting them to the URI with the trailing slash.

In fact this practice is so pervasive that, for the vast majority of users, a resource URI with or without a trailing slash is treated as a synonym: they are considered two URIs that point to the same resource, and using either one is fine. However, this understanding is not quite correct.

It is more correct to understand that the resource without the trailing slash does not exist at all. But instead of being unhelpful and reporting a 404 Not Found status, web-servers almost always apply Postel’s Law and tell the user where the resource they are looking for is actually located, via a permanent redirect.

Read on to learn why it is important to have the correct understanding when designing RESTful APIs.

Relative URIs

The main reason trailing slashes are so important is that they are critical to relative URIs in your API functioning correctly. There are many reasons why you should want to use relative URIs; let’s look at just a few:

  • If your API is available over both HTTP and HTTPS, then which protocol should you specify in your fully qualified URIs? You’ll need to make sure all the links in a generated response match the protocol used to request the resource, otherwise you may run into mixed-content problems.
  • In a multi-tiered architecture, where the application server generating a resource may be some distance removed from the public endpoint that the consumer of an API accesses, determining the correct public URI to put in a resource can become cumbersome. For example, if you have a Java Servlet running on Apache Tomcat, fronted by Apache HTTPD acting as a reverse proxy (mod_proxy directives), it takes a good bit of configuration for the Servlet to be able to determine the public request URL, and even then the Servlet needs to be aware that it is being fronted by mod_proxy, which is not good for encapsulation.
  • If for any reason the location at which your API resides changes (website re-branding, moving the API to a distinct hostname for scaling or security reasons, etc.), then every resource exposed by the API must be updated to use the new fully qualified URIs.
  • If your API does not support relative URIs properly, then it will, in turn, prevent users of your API from using relative URIs in any content they store in your service. More on this below.

Everywhere the fully qualified URI is repeated, it must be changed if the URI ever needs to change. Relative URIs mitigate this problem: the location of a resource is expressed relative to the location of the current resource, and the client turns that relative location into an absolute location using URI resolution.
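To make the HTTP/HTTPS and relocation points concrete, here is a small sketch using the JDK’s java.net.URI class: the same relative reference, resolved by the client against whichever URI it actually requested, automatically picks up that request’s scheme and host. The class name RelativeLinks and the api.example.com host are hypothetical, chosen only for illustration.

```java
import java.net.URI;

/** Resolves one relative reference against bases that differ only in scheme, host and prefix. */
public class RelativeLinks {
    // The relative reference an API might emit in a response body (illustrative value).
    static final String LINK = "holiday";

    /** Returns the absolute URI a client would compute for LINK, given the URI it requested. */
    static URI absoluteFor(String requestUri) {
        return URI.create(requestUri).resolve(LINK);
    }

    public static void main(String[] args) {
        System.out.println(absoluteFor("http://some-blog.gbee.io/blog/posts/"));     // http://some-blog.gbee.io/blog/posts/holiday
        System.out.println(absoluteFor("https://some-blog.gbee.io/blog/posts/"));    // https://some-blog.gbee.io/blog/posts/holiday
        System.out.println(absoluteFor("https://api.example.com/gbee/blog/posts/")); // https://api.example.com/gbee/blog/posts/holiday
    }
}
```

The server emits the same string in every case; only the client’s base URI differs, so nothing in the response has to know the public scheme or hostname.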

To be blunt, using fully qualified URIs when a relative URI would suffice is a violation of the Don’t Repeat Yourself (DRY) principle. By needlessly repeating the fully qualified URI, you create more work for yourself if it ever has to change. You can also create more work for the customers of your API, by potentially preventing them from using relative URIs.

Doing it wrong

Let’s look at an example that neglects to use trailing slashes and attempts to use relative URIs, and as a consequence gets things wrong.

Assume we’ve noticed that there just aren’t enough blog engines in the world, so we’ve created the greatest blog engine ever (GBEE), and we want to expose an API for it, so that the people who make those ‘sharing’ widgets have yet another API to integrate with.

Here’s a rough idea of the API:

GET https://{blog-name}.gbee.io/blog/posts       # Retrieve the list of posts
GET https://{blog-name}.gbee.io/blog/posts/{id}  # Retrieve an individual post
GET https://{blog-name}.gbee.io/blog/images/{id} # Retrieve an image linked to in a post's content.

Here’s a sample of retrieving the list of blog posts:

GET /blog/posts HTTP/1.1
Host: some-blog.gbee.io

produces a listing like the following:

HTTP/1.1 200 OK
Content-Type: application/json

{
 "posts": [
  {
   "links": [{"href":"holiday"}],
   "tags": ["vacation", "mexico"],
   "summary": "Had a great trip to Mexico recently,\nhere's a picture of where we stayed:\n<img src=\"../images/hotel.jpg\">"
  },
  ...
 ]
}

We can see that the relative URI of the first blog post is holiday, so what is the correct absolute URI of the blog post?

You might expect it to be http://some-blog.gbee.io/blog/posts/holiday, but that’s not what the document above says. It actually says the absolute location is http://some-blog.gbee.io/blog/holiday. To understand why, you need to understand the algorithm for transforming relative URIs into absolute URIs.

The first step is to establish the base URI of a resource, in this case the base URI is the URI of the requested resource, i.e.: http://some-blog.gbee.io/blog/posts

The next step is to merge the base URI with the relative URI. RFC 3986 describes this process (section 5.2.3, “Merge Paths”); the relevant statement is:

return a string consisting of the reference’s path component appended to all but the last segment of the base URI’s path (i.e., excluding any characters after the right-most “/” in the base URI path, or excluding the entire base URI path if it does not contain any “/” characters).

Therefore we must exclude any characters after the right-most “/” in http://some-blog.gbee.io/blog/posts, which gives http://some-blog.gbee.io/blog/. Finally, the relative URI (holiday) is appended to this URI, giving http://some-blog.gbee.io/blog/holiday.
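The merge step can be sketched in a few lines of Java. This is a hand-written illustration of RFC 3986 section 5.2.3 only, assuming the base URI has a non-empty path (true of every URI in this post); it deliberately omits the later removal of “.” and “..” segments (section 5.2.4) that a real resolver also performs. The class and method names are hypothetical.

```java
/** Hand-written illustration of the "merge paths" step (RFC 3986, section 5.2.3). */
public class MergePaths {
    /**
     * Appends a relative-path reference to all but the last segment of the base path.
     * Assumes a non-empty base path; dot-segment removal is not performed here.
     */
    static String merge(String basePath, String referencePath) {
        int lastSlash = basePath.lastIndexOf('/');
        // Keep everything up to and including the right-most "/", or nothing if there is no "/".
        String prefix = lastSlash >= 0 ? basePath.substring(0, lastSlash + 1) : "";
        return prefix + referencePath;
    }

    public static void main(String[] args) {
        System.out.println(merge("/blog/posts", "holiday"));  // /blog/holiday
        System.out.println(merge("/blog/posts/", "holiday")); // /blog/posts/holiday
    }
}
```

The only difference between the two calls is the trailing slash on the base path, and that alone decides whether posts survives the merge.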

If a client attempts to retrieve http://some-blog.gbee.io/blog/holiday they will get a 404 Not Found error, because the resource doesn’t actually exist at that location.

If it is not already clear, placing a trailing slash on a collection resource is not optional. It is critical to relative URIs being resolved correctly. RFC 3986 is one of the foundational specifications of the web and as the example above demonstrates, collection resources are expected to have a trailing slash. It’s not just a stylistic preference, or something that provides an SEO optimization. It’s intrinsic to the syntax of URIs and therefore, important to get right.

I’d speculate that a lack of understanding of how relative URIs are transformed into absolute URIs is a major factor in the all-too-common occurrence of web APIs not naming collection resources correctly, and consequently using fully qualified URIs throughout the API unnecessarily. I think developers encounter problems trying to get relative URIs working properly (through their lack of understanding of the mechanics) and then err on the side of caution and switch to using fully qualified URIs throughout.

The above ‘broken’ API also illustrates a commonly seen problem, where relative URIs seem to work in one case but not in another. If we were to retrieve the content of the blog post, it would look like this:

GET /blog/posts/holiday HTTP/1.1
Host: some-blog.gbee.io

HTTP/1.1 200 OK
Content-Type: text/html

<p>Had a great trip to Mexico recently,
here's a picture of where we stayed:
<img src="../images/hotel.jpg">
</p>

There’s nothing unusual about this content; it’s just regular HTML, and the relative URI of the image link resolves correctly:

Base URI
http://some-blog.gbee.io/blog/posts/holiday
Relative URI
../images/hotel.jpg
Absolute URI
http://some-blog.gbee.io/blog/images/hotel.jpg

So the problem doesn’t lie here; again, it lies with the /blog/posts resource. When the <img> tag in the post summary is evaluated relative to /blog/posts, the wrong location is produced:

Base URI
http://some-blog.gbee.io/blog/posts
Relative URI
../images/hotel.jpg
Absolute URI
http://some-blog.gbee.io/images/hotel.jpg

You can easily imagine many developers scratching their heads trying to figure out why the image location is not being calculated correctly for the /blog/posts resource, when it works fine for the /blog/posts/holiday resource. The fact that the base URI is that of the requested document (the list of posts), not of the blog post itself, is easily missed.
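A quick way to check these resolutions outside a browser is java.net.URI.resolve, which merges the paths and then normalizes away “.” and “..” segments. Note that java.net.URI is specified against RFC 2396 rather than RFC 3986, though the two agree for inputs like these. The class name ImageLinks is hypothetical; the sketch resolves the image reference against both base URIs discussed above.

```java
import java.net.URI;

/** Resolves the post's image reference against the two base URIs from the example. */
public class ImageLinks {
    static final String IMAGE = "../images/hotel.jpg";

    /** Merges IMAGE with the base path, then normalizes "." and ".." segments. */
    static String resolve(String base) {
        return URI.create(base).resolve(IMAGE).toString();
    }

    public static void main(String[] args) {
        // Base is the post itself: resolves where the author intended.
        System.out.println(resolve("http://some-blog.gbee.io/blog/posts/holiday")); // http://some-blog.gbee.io/blog/images/hotel.jpg
        // Base is the listing without a trailing slash: the blog/ segment is lost.
        System.out.println(resolve("http://some-blog.gbee.io/blog/posts"));         // http://some-blog.gbee.io/images/hotel.jpg
    }
}
```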

Preventing API users doing things right

It is also worth appreciating that the author of the blog post may reasonably expect to be able to use relative URIs, since they are an intrinsic part of the Web. Users will expect the blog engine to fully support relative URIs. By choosing not to put a trailing slash on the /blog/posts resource, the blog engine has failed to meet this expectation, and the post author can reasonably view it as defective in this regard.

Doing it right

All of the problems outlined above can be addressed by placing a trailing slash at the end of the blog posts resource, so its URI becomes:

http://some-blog.gbee.io/blog/posts/

Now the holiday relative URI resolves correctly:

Base URI
http://some-blog.gbee.io/blog/posts/
Relative URI
holiday
Absolute URI
http://some-blog.gbee.io/blog/posts/holiday

The relative path for the image in the blog post also resolves correctly when resolved relative to the /blog/posts/ resource:

Base URI
http://some-blog.gbee.io/blog/posts/
Relative URI
../images/hotel.jpg
Absolute URI
http://some-blog.gbee.io/blog/images/hotel.jpg

Getting Query URIs right

Another mistake that I have sometimes seen is to name a collection resource correctly, but to get the URI for queries on the collection wrong, e.g.:

http://some-blog.gbee.io/blog/posts/              # retrieve all posts
http://some-blog.gbee.io/blog/posts?tags=vacation # retrieve posts tagged with 'vacation'

The problem here, once again, is that relative URIs returned in the http://some-blog.gbee.io/blog/posts?tags=vacation resource will be resolved relative to http://some-blog.gbee.io/blog/ rather than http://some-blog.gbee.io/blog/posts/, because the trailing slash is missing from the path.

The correct form of the URI would be:

http://some-blog.gbee.io/blog/posts/?tags=vacation
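The query string is not part of the path, so it plays no role in the merge; only the slash before the “?” matters. A short sketch with java.net.URI (class name hypothetical) shows the holiday reference resolved against both query URIs:

```java
import java.net.URI;

/** Resolves a relative reference against query URIs with and without the trailing slash. */
public class QueryBases {
    static String resolve(String base, String reference) {
        // The base's query is dropped; only its path takes part in the merge.
        return URI.create(base).resolve(reference).toString();
    }

    public static void main(String[] args) {
        System.out.println(resolve("http://some-blog.gbee.io/blog/posts?tags=vacation", "holiday"));  // http://some-blog.gbee.io/blog/holiday
        System.out.println(resolve("http://some-blog.gbee.io/blog/posts/?tags=vacation", "holiday")); // http://some-blog.gbee.io/blog/posts/holiday
    }
}
```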

The HTML <base> element

Some resource formats specify ways to override the base URI of a resource. For example, HTML has the <base> element, and XML has the xml:base attribute.

These mechanisms exist to provide a way for an HTML or XML document to render correctly when using embedded hyperlinks that use relative URIs, regardless of whether the hosting document is correctly named with a trailing slash (or whether the URI of the hosting document can be determined at all).

I only mention these for completeness; I would not recommend their use unless required to work around an existing API that does not name collection resources correctly.

Rules of Thumb

URIs are hierarchical in nature, and the path component of a URI is particularly so. Levels in the hierarchy are delimited by the slash (“/”) character. Paths to the right of a slash are subordinate to paths to the left of a slash:

/a/b # b is subordinate to a
/c/d/e # e is subordinate to d, d is subordinate to c

Any path which occurs to the left of a slash is a collection resource, any path which occurs to the right is a subordinate. Note that a resource can be both a collection resource and a subordinate resource (as shown by the /c/d/e example, d is both a collection resource and a subordinate resource).

If a resource is a collection resource (even if it is subordinate to another resource), then its URI must have a trailing slash. If you visualize the path hierarchy of your API as a tree, then only the leaf nodes in the tree should lack a trailing slash.

  • Only leaf resources (resources with no subordinates) should lack a trailing slash in their URI.
  • If a request is received without a trailing slash, then do a permanent redirect to the URI with the trailing slash.
  • Use relative paths wherever possible (DRY principle).
  • Remember to include the trailing slash in query URIs.
  • Don’t do anything that would prevent consumers of your API using relative URIs.
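As a sketch of the redirect rule in the list above, here is a hypothetical helper (not part of any framework) that decides whether a request path should be permanently redirected to its trailing-slash form. It assumes the application can enumerate its collection paths, written with their trailing slash, and it keeps any query string so query URIs end up in the /blog/posts/?tags=vacation form. Set.of requires Java 9 or later.

```java
import java.util.Set;

/** Hypothetical helper applying the "redirect to the trailing-slash URI" rule. */
public class TrailingSlashRedirect {
    /**
     * Returns the Location for a permanent redirect, or null if no redirect is needed.
     * collections holds every collection path, written with its trailing slash (assumed known).
     */
    static String redirectLocation(String path, String query, Set<String> collections) {
        if (path.endsWith("/") || !collections.contains(path + "/")) {
            return null; // already a collection URI, or a leaf resource
        }
        // Preserve the query so /blog/posts?tags=x becomes /blog/posts/?tags=x
        return query == null ? path + "/" : path + "/?" + query;
    }

    public static void main(String[] args) {
        Set<String> collections = Set.of("/blog/", "/blog/posts/", "/blog/images/");
        System.out.println(redirectLocation("/blog/posts", null, collections));            // /blog/posts/
        System.out.println(redirectLocation("/blog/posts", "tags=vacation", collections)); // /blog/posts/?tags=vacation
        System.out.println(redirectLocation("/blog/posts/holiday", null, collections));    // null
    }
}
```

A servlet filter or similar hook could call this and, for a non-null result, respond with 301 Moved Permanently and the returned value in the Location header.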

Disable Firefox redirecting to localhost.com

Filed under ‘Annoying things I keep forgetting how to fix…’:

Out of the box, Firefox has a feature designed to help people who mistype URLs in the address bar. It’s described in detail here, but briefly: if a URL fails to resolve, Firefox tries a couple of permutations of the URL to find what you might really have intended, appending .com and/or prefixing www. to the host name portion of the URL to see if they resolve.

This feature was introduced way back in the early versions of Firefox, and has been annoying developers ever since. The feature means that often-times when a server running on localhost fails to respond, Firefox decides to try localhost.com and/or www.localhost.com.

Now, once you’ve fixed the problem with the server, you hit the reload button in Firefox and it still doesn’t work, but this time it’s because Firefox has changed your URL to point at this bogus localhost.com. Annoying!!!

To disable this ‘feature’ do the following:

  • Type about:config in the browser bar
  • Type browser.fixup.alternate.enabled in the search box that appears
  • Right-click the browser.fixup.alternate.enabled entry that appears in the filtered list below and choose Toggle to set the value to false
  • Problem fixed, breathe out.

Cisco AnyConnect Client and the “Unable to process response from…” Error Message

If you get an error message saying “Unable to process response from…” from the Cisco AnyConnect VPN client on Ubuntu, then the underlying problem is that the client is attempting to use the https_proxy environment variable to determine which HTTP/S proxy to use.

I’m not sure exactly why this causes a problem, but the way to work around it is to unset the https_proxy variable just for the process launching the VPN client. For example, I’ve added a script named vpnui to my ~/bin with the following contents:

#!/bin/sh
unset https_proxy
/opt/cisco/vpn/bin/vpnui

Then I changed the Applications|Internet|Cisco AnyConnect VPN Client menu shortcut to point to this script instead, and voila, problem solved.

Oracle auto generated columns and insert … returning into

Everyone knows Oracle doesn’t have auto-generated columns, so what am I talking about? I really mean columns whose value is generated by a trigger, the most common example being to simulate the auto-generated id column functionality found in many other databases. In Oracle, instead of defining a column as having an auto-generated numeric value, you have to define a sequence and then define an insert trigger that selects the next sequence value and uses it as the inserted row’s id value, something like the following:

create table generated_ids (
 id   number constraint pk primary key,
 name varchar2(255) not null
)
/
create sequence ids_seq start with 1;
create or replace trigger  bi_generated_ids
    before insert on generated_ids
    for each row
begin
  -- Use the supplied id if there is one, otherwise the next sequence value
  :new.id := nvl(:new.id,ids_seq.nextval);
end;
/
show errors

This works fine, you can insert into the table in the manner shown below, and the trigger will generate a value for the id column:

insert into generated_ids (name) values ('whatever');

Very often however, you want to know the value of the id column that was chosen by the trigger. You could do something like this:

declare
 l_id number;
begin
 insert into generated_ids (name) values ('whatever');
 select id into l_id from generated_ids where name = 'whatever';
 dbms_output.put_line(l_id);
end;
/

The above will work only if there is a unique constraint on the name column; without one it won’t work reliably, because there could be more than one row with the value ‘whatever’. In any case, having to perform two statements (the insert and then the select) seems very heavyweight for such a common need; there must be a better way.

The better way to do this is to use the returning into clause with the insert statement, like so:

declare
 l_id number;
begin
 insert into generated_ids (name) values ('whatever') returning id into l_id;
 dbms_output.put_line(l_id);
end;
/

That’s a bit less verbose and more robust at the same time. It’s worth emphasising that this technique is not limited to auto-generated ids: it can be applied to any column value that may be generated or modified by a trigger.

Anatomy of a PermGen Memory Leak

What is the PermGen?

The Permanent Generation (PermGen) Heap is a heap in the JVM dedicated to storing the JVM’s internal representation of Java Classes (and also interned String instances).

In the Oracle JVM this heap has a fixed maximum size; if an attempt is made to load more classes than will fit in this area, then an OutOfMemoryError is thrown, even when there is still plenty of space on the other heaps.

You can increase the PermGen heap size with the -XX:MaxPermSize flag, but typically increasing the heap size just puts off the inevitable instead of addressing the root cause.

A memory leak, in Java? Yup, it happens. When something holds onto a reference to an object instance, that instance cannot be garbage collected. If enough errant references are held, or the graph of references reachable from an errant reference is large enough, the GC eventually cannot free enough memory and the heap is exhausted.

Every Java object holds a reference to its own java.lang.Class object, and every java.lang.Class in turn holds a reference to the class loader that loaded it. Each class loader holds a reference to every class that it has loaded. That is potentially a very big graph of references.

Here’s the point: if an object instantiated by one class loader holds a reference to an object in another class loader, then the latter object cannot be garbage collected until that reference is given up, which means its class object cannot be garbage collected, which means its class loader cannot be garbage collected, which means none of the classes loaded by that class loader can be garbage collected.

So you can see even one errant reference can cause a lot of memory wastage, particularly if the target of the reference happens to have been instantiated by a class loader that instantiated a lot of classes.

The Tomcat wiki enumerates just some of the ways this can happen, and this great presentation by Mark Thomas goes into more detail on the topic. In fact, the problems are so prevalent that many application containers already include logic to try to address these leaks themselves.

Finding the errant reference

So you’ve got a PermGen error and you want to find the root cause; let’s work through a Tomcat-specific example to get a flavour for how to go about tackling this kind of problem.

Every web application in Tomcat gets its own class loader instance, to prevent one application interfering with another. However, all of these class loaders share common ancestor class loaders, ultimately the bootstrap class loader, which is responsible for loading the core JRE classes.
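To see this hierarchy for yourself, the following sketch walks the parent chain starting from the class loader of a given class. The Java API represents the bootstrap class loader as null, so a JRE class such as java.lang.String reports no loaders at all, while an application class reports its own loader followed by its ancestors. The class name LoaderChain is hypothetical; run from inside a Tomcat web application, the same walk would start at that application’s own class loader.

```java
import java.util.ArrayList;
import java.util.List;

/** Lists the class loaders from the one that loaded a class up to (not including) the bootstrap loader. */
public class LoaderChain {
    static List<ClassLoader> chain(Class<?> c) {
        List<ClassLoader> loaders = new ArrayList<>();
        // getClassLoader() and getParent() both return null for the bootstrap loader.
        for (ClassLoader l = c.getClassLoader(); l != null; l = l.getParent()) {
            loaders.add(l);
        }
        return loaders;
    }

    public static void main(String[] args) {
        System.out.println("String: " + chain(String.class)); // String: []
        System.out.println("LoaderChain: " + chain(LoaderChain.class));
    }
}
```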

If an object instantiated by the bootstrap class loader were to hold a reference to an object instantiated by the webapp’s class loader, that would lead to the scenario described above: the webapp’s class loader would remain referenced and therefore ineligible for garbage collection. Eventually, after the nth redeploy of the web application, a java.lang.OutOfMemoryError (PermGen space) will occur, as the class loader for each deployment of the application remains in memory until the heap is exhausted.

We need to hunt down these web application class loader instances, and figure out what is holding on to them.

Java Visual VM

This needle-in-a-haystack hunt used to be quite painful, but since JDK 6 Update 7 the JDK has shipped with a great tool that makes things much easier: VisualVM.

VisualVM is a graphical tool that can connect to any JVM, allow you to take a dump of the JVM’s heap and then let you navigate that heap.

In Tomcat, the class loader for a web application is a class named org.apache.catalina.loader.WebappClassLoader. There should be exactly one instance of this class in the heap for each deployed web application (including the Tomcat manager application). If there are more, then we have a leak. Let’s use VisualVM to check this out:

Start Visual VM by typing:

${JAVA_HOME}/bin/jvisualvm

You’ll see a screen similar to the following:

In the left-hand pane, a list of current JVM processes is shown; right-click Tomcat in the list and choose ‘Heap Dump’.

You’ll now see something similar to the following:

Click the ‘OQL Console’ button. This displays a console that allows you to query the heap dump. We want to find all instances of org.apache.catalina.loader.WebappClassLoader.
Enter the following in the Query Editor pane:

select x from org.apache.catalina.loader.WebappClassLoader x

Press Execute and you will see something similar to the following:

In this case, VisualVM found two instances of the web application classloader, one for the application itself and one for the Tomcat manager application.

Use the Tomcat manager application to restart the web application (http://localhost:8080/manager/html), and take another heap dump of the Tomcat process.
Navigate back to the OQL Console and enter the same query again:

Notice how there are now 3 instances. We have a leak: one of these 3 instances should have been garbage collected but wasn’t. Which one is it?

In Tomcat there is an easy way to tell, active class loaders have a field named: started which is set to true. Click through each class loader instance until you find the one whose started field is false.

Nearly there: we’ve found the class loader instance that has leaked, and now we need to determine what object is holding a reference to it. Lots of objects will be holding references to the class loader, and in turn each of those objects will be referenced by many other objects, but ultimately one or a very few objects will form the root of this graph of references, and those are the objects we are interested in.

So in the bottom pane, right click on the this reference and select ‘Show Nearest GC Root’. You’ll see something like the following:

Right click on the instance, and select ‘Show Instance’

We can see that this is an instance of the sun.awt.AppContext type. You can see that the contextClassLoader field in AppContext is holding a reference to the WebappClassLoader. This is the errant reference that is causing the memory leak.

The sun.awt.AppContext class is not one I’m familiar with, so it’s time to figure out who or what is instantiating it.

Restart Tomcat in debug mode; I use:

export JPDA_SUSPEND=y
${TOMCAT_HOME}/bin/catalina.sh jpda

Now we need to remote debug the class loading sequence, I’m going to use Eclipse to do this. We need to set a class load breakpoint on sun.awt.AppContext:

  • Use the Open Type Prompt (Shift+Control+T) to locate the sun.awt.AppContext type.
  • Right click on the class name in the Outline pane, and choose ‘Toggle Class Load Breakpoint’

Now we need to trigger the class loading sequence. Connect your debugger to the Tomcat instance, and the debugger should stop at the point where sun.awt.AppContext class is loaded:

Aha, it’s been instantiated by the JavaBeans framework, which in this instance is being used by the Oracle Universal Connection Pool (UCP). We can also notice that the contextClassLoader is a final field and it looks like AppContext is a singleton, so we can infer that this field is set once and once only during the instantiation of AppContext.

So we have established that it’s code within our application that is provoking the leak. Now we need to figure out a cure.

At application startup, we need to ensure that WebappClassLoader is not the current context class loader when sun.awt.AppContext is initialized. Something like the following should work:

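 //Somewhere in application startup, e.g. ServletContextListener.contextInitialized()
 //LOG is assumed to be a java.util.logging.Logger field of the enclosing class
 try {
  final ClassLoader active = Thread.currentThread().getContextClassLoader();
  try {
   //Find the top-most class loader reachable from the active one
   //(the bootstrap loader itself is represented by null, so the walk stops just below it)
   ClassLoader root = active;
   while (root != null && root.getParent() != null) {
    root = root.getParent();
   }
   //Temporarily make that class loader the context class loader,
   //so AppContext captures it instead of the WebappClassLoader
   Thread.currentThread().setContextClassLoader(root);
   //Force the AppContext singleton to be created and initialized
   sun.awt.AppContext.getAppContext();
  } finally {
   //Restore the original context class loader
   Thread.currentThread().setContextClassLoader(active);
  }
 } catch (Throwable t) {
  //Carry on if we get an error; the leak is not fatal to start-up
  LOG.warning("Failed to address PermGen leak: " + t);
 }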

I added this code to my servlet context listener, causing it to execute during application start-up and it had the desired effect, curing this particular memory leak.

Summary

In JEE applications, the root cause of PermGen out-of-memory errors usually lies in the application itself (or a library used by the application), often compounded by classes in the JRE library holding references to the web application class loader or to objects instantiated by it.

The process for finding the root cause of a leak is to use a heap analyzer like VisualVM to find the uncollected web application class loader instance, and then find the root GC object that is directly or indirectly holding onto the class loader. When you find this object, use your debugger to discover how it is being instantiated, and then devise a way to modify its behaviour so that it does not hold on to the class loader reference forever.

Weblogic & Basic Auth

By default, Weblogic will attempt to authenticate any HTTP Basic credentials, even if the URI being accessed does not fall within a statically declared web.xml security constraint. That doesn’t seem like a reasonable default to me, but there is a way to change this behaviour: the enforce-valid-basic-auth-credentials setting.

To set the enforce-valid-basic-auth-credentials flag, perform the following steps:

  1. Add the <enforce-valid-basic-auth-credentials> element to config.xml within the <security-configuration> element.
    ...
    <enforce-valid-basic-auth-credentials>false</enforce-valid-basic-auth-credentials>
    </security-configuration>
    ...
  2. Start or restart all of the servers in the domain.

It’s a shame there isn’t an equivalent setting in the weblogic.xml deployment descriptor.

How to fix Syntax Highlighting in Trac 0.11

Trac 0.11 and later use Pygments to do syntax highlighting, if Pygments is installed. However, even after installing Pygments, I could not get syntax highlighting to work.

While trying to figure this out I noticed the following error in my Firebug Console:

jQuery.loadStyleSheet is not a function

plus a couple of 404 errors (can’t remember the exact urls).

This reminded me of a warning I’d seen on the Trac website:

Important note: Please use either version 1.6, 2.4 or later of mod_wsgi. Versions prior to 2.4 in the 2.X branch have problems with some Apache configurations that use WSGI file wrapper extension. This extension is used in Trac to serve up attachments and static media files such as style sheets. If you are affected by this problem attachments will appear to be empty and formatting of HTML pages will appear not to work due to style sheet files not loading properly.

So I quickly upgraded to the latest mod_wsgi and restarted Apache, and lo, no more errors, and syntax highlighting works properly.

Music|Albums|All Songs|Shuffle

This has been bugging me for ages. My preferred way to listen to my music collection is in Shuffle mode on my iPod Touch. I do this by choosing Music|Albums|All Songs|Shuffle. Within my music collection are lots of songs that I have more than one version of (live versions, soundtrack versions, compilation versions, etc.), and it’s been driving me crazy that Shuffle was playing each version in sequence. Why would I want to listen to the same song twice? That ain’t random!

I guess I’m slow, but today I finally understood why this happens. This Shuffle menu option does not choose songs randomly; instead it progresses through my collection in alphabetical order, just starting at a random point in the collection.

To truly shuffle the play order you need to tap the album artwork and then tap the shuffle icon that appears on the right hand side. Maybe this icon got switched off at some point, but I don’t recall doing that, I had to go hunting in the manual today to find out it even existed.

IMHO it’s a bug that choosing Music|Albums|All Songs|Shuffle doesn’t force this shuffle icon on.