Saturday, March 20, 2010

Rails caching and cache busting in PHP

Ever wondered how to use browser caching to speed up your page loads?

I was working on a Rails project recently, and noticed something interesting in the documentation:

Using asset timestamps

By default, Rails appends asset's timestamps to all asset paths[1]. This allows you to set a cache-expiration date for the asset far into the future, but still be able to instantly invalidate it by simply updating the file (and hence updating the timestamp, which then updates the URL as the timestamp is part of that, which in turn busts the cache).

It's the responsibility of the web server you use to set the far-future expiration date on cache assets that you need to take advantage of this feature. Here's an example for Apache:

# Asset Expiration
ExpiresActive On
<FilesMatch "\.(ico|gif|jpe?g|png|js|css)$">
  ExpiresDefault "access plus 1 year"
</FilesMatch>

As I explained on Stack Overflow (more on that in a moment):

If you look at the source for a Rails page, you'll see what they mean: the path to a stylesheet might be "/stylesheets/scaffold.css?1268228124", where the number at the end is the timestamp of when the file was last updated.

So it should work like this:

1. The browser says 'give me this page'
2. The server says 'here, and by the way, this stylesheet called scaffold.css?1268228124 can be cached for a year - it's not gonna change.'
3. On reloads, the browser says 'I'm not asking for that CSS file, because my local copy is still good.'
4. A month later, you edit and save the file, which changes its timestamp. The page no longer links to scaffold.css?1268228124, because the number at the end is different.
5. When the browser sees that, it says 'I've never seen that file! Give me a copy, please.' The cache is 'busted.'
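To see steps 4 and 5 in miniature, here's a small PHP sketch (PHP because that's where we're headed next). The helper name `timestampedUrl()` is hypothetical, and a temporary file stands in for scaffold.css. Setting its modification time with `touch()` to simulate an edit a month later produces a different URL, and a URL the browser has never seen is what forces a fresh download.

```php
<?php
// Hypothetical helper: build a cache-busting URL from a file's modification time.
// $webPath is the path the browser sees; $diskPath is where the file lives on disk.
function timestampedUrl($webPath, $diskPath) {
    clearstatcache(true, $diskPath); // PHP caches stat() results; force a fresh mtime
    return $webPath . '?' . filemtime($diskPath);
}

// A throwaway file standing in for scaffold.css
$disk = tempnam(sys_get_temp_dir(), 'css');

touch($disk, 1268228124);              // last saved at this timestamp
$before = timestampedUrl('/stylesheets/scaffold.css', $disk);

touch($disk, 1268228124 + 30 * 86400); // "edit and save" 30 days later
$after = timestampedUrl('/stylesheets/scaffold.css', $disk);

echo $before, "\n"; // /stylesheets/scaffold.css?1268228124
echo $after, "\n";  // /stylesheets/scaffold.css?1270820124

unlink($disk);
```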

Bringing it to PHP


Clever! Now how can we borrow that idea in a PHP app?

The first step, of course, is to set the server to tell browsers 'cache these files.' The example config above worked for me[2].

The second step is to append timestamps to the URLs of your files. Here's a first-pass attempt at that:

<?php $root = $_SERVER['DOCUMENT_ROOT']; ?>
<link rel="stylesheet" type="text/css" href="/includes/main.css?<?php echo filemtime($root . '/includes/main.css'); ?>" title="default" />

That basically works - the timestamp is appended to the file name. But it's not nearly as streamlined as the Rails way, for a couple of reasons.

1) You have to type the file name twice - don't repeat yourself!
2) Come to think of it, all your stylesheet links are going to be the same format. Why keep typing in the boilerplate stuff?

In Rails, you'd just do this:

<%= stylesheet_link_tag 'main' %>

Slick! Helper tags like these take a lot of the drudgery out of HTML when you're using Rails.

A loose approximation in PHP could be generalized to handle different file types. For example, you might write a function like this:

function cachedFile($type, $name, $attr = null) {
    // Convention: all stylesheets and scripts live in /includes under the document root
    $root = $_SERVER['DOCUMENT_ROOT'];
    switch ($type) {
        case 'css':
            $output = '<link rel="stylesheet" type="text/css" href="/includes/';
            $output .= $name;
            // The file's last-modified time becomes the cache-busting query string
            $output .= '?' . filemtime($root . '/includes/' . $name) . '" ';
            if ($attr) {
                // Extra attributes, e.g. title="Default", are added as-is
                $output .= $attr . ' ';
            }
            $output .= '/>';
            $output .= "\n";
            echo $output;
            break;
        case 'js':
            // Same idea for scripts; $attr is ignored here
            $output = '<script type="text/javascript" src="/includes/';
            $output .= $name;
            $output .= '?' . filemtime($root . '/includes/' . $name) . '"';
            $output .= '></script>';
            $output .= "\n";
            echo $output;
            break;
    }
}

...which could then be used like this:

cachedFile('css','jquery-ui-1.7.1.custom.css');
cachedFile('css','main.css','title="Default"');
cachedFile('js','jquery-1.4.min.js');

Notice that this function assumes something - that your JavaScript files and stylesheets will always be in a particular folder. That's part of Rails' "convention over configuration" mentality: if you always do something the same way, you only have to specify it once.

Now, there's still room for improvement. For example, the type could be extracted from the filename, so that's one less argument to pass in. And more file types could be added. But this function already accomplishes several good things:

1) It lets the browser cache your files, and busts the cache when necessary
2) It cuts down on code repetition
3) Naming the function cachedFile makes its purpose obvious
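Here's one possible take on the first improvement: a hypothetical `cachedTag()` that reads the extension with PHP's `pathinfo()` instead of taking a type argument. Two other changes are my own assumptions for the sake of the sketch: it returns the tag instead of echoing it, and it accepts the document root as an optional parameter so it can be tried outside a web server. It still assumes everything lives in /includes.

```php
<?php
// Hypothetical variant of cachedFile(): infers the tag from the file extension
// and returns the HTML instead of echoing it. Still assumes files live in /includes.
function cachedTag($name, $attr = '', $root = null) {
    $ext = strtolower(pathinfo($name, PATHINFO_EXTENSION));
    if (!in_array($ext, ['css', 'js'], true)) {
        // Fail loudly rather than emit a tag the browser can't use
        throw new InvalidArgumentException("No tag rule for '$name'");
    }
    if ($root === null) {
        $root = $_SERVER['DOCUMENT_ROOT'];
    }
    $path = '/includes/' . $name;
    $url = $path . '?' . filemtime($root . $path); // mtime busts the cache
    $extra = ($attr !== '') ? ' ' . $attr : '';

    if ($ext === 'css') {
        return '<link rel="stylesheet" type="text/css" href="' . $url . '"' . $extra . ' />' . "\n";
    }
    return '<script type="text/javascript" src="' . $url . '"' . $extra . '></script>' . "\n";
}

// Demo: a throwaway document root with one stylesheet and one script
$root = sys_get_temp_dir() . '/cachedtag_demo';
if (!is_dir($root . '/includes')) {
    mkdir($root . '/includes', 0777, true);
}
touch($root . '/includes/main.css', 1268228124);
touch($root . '/includes/app.js', 1268228200);

echo cachedTag('main.css', 'title="Default"', $root);
// <link rel="stylesheet" type="text/css" href="/includes/main.css?1268228124" title="Default" />
echo cachedTag('app.js', '', $root);
// <script type="text/javascript" src="/includes/app.js?1268228200"></script>
```

In a template you'd write `<?php echo cachedTag('main.css', 'title="Default"'); ?>`. Throwing on an unknown extension is a design choice: silently printing nothing would make a typo in a file name easy to miss.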

Now - how can you verify that this is working? I had the same question myself.

As Andy on Stack Overflow pointed out, you can load your page in Firefox, use the Firebug add-on, and look in the "Net" panel as you load the page. For any file that's cached, you should see a status message of 304 Not Modified. For anything that's pulled from the server, you should see 200 OK. (If you navigate to the page instead of reloading it, a cached file may not be requested at all, which also means the cache is doing its job.)

Try it:

1) Load the page to request everything once
2) Reload it to verify that things are being cached
3) Make a trivial change to a cached file, so that its timestamp will change
4) Reload the page again and verify that it was requested
5) Reload the page one last time and verify that it's cached again
6) Set up an elaborate Rube Goldberg machine to pat yourself on the back

(Step 6 is optional.)
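Firebug shows what the browser does; to check what the server sends, PHP's `get_headers()` can fetch a file's response headers. Here's a rough sketch with a hypothetical helper, `expiresInDays()`, and a placeholder URL. It only confirms that Apache is adding a far-future `Expires` header, not that the browser is honoring it, so it complements the steps above rather than replacing them.

```php
<?php
// Hypothetical helper: how many days until the Expires header in $headers?
// $headers is shaped like the associative array from get_headers($url, true).
// Returns null if there is no usable Expires header.
function expiresInDays(array $headers, $now = null) {
    $now = ($now === null) ? time() : $now;
    foreach ($headers as $name => $value) {
        // Status lines have numeric keys; header names can vary in case
        if (is_string($name) && strcasecmp($name, 'Expires') === 0) {
            // After a redirect a header can repeat as an array; use the last one
            $value = is_array($value) ? end($value) : $value;
            $expires = strtotime($value);
            return ($expires === false) ? null : (int) floor(($expires - $now) / 86400);
        }
    }
    return null;
}

// Against a live server (placeholder URL):
// $headers = get_headers('http://localhost/includes/main.css', true);
// var_dump(expiresInDays($headers)); // roughly 365 with "access plus 1 year"

// Offline example with a hand-written header array
$sample = [
    0 => 'HTTP/1.1 200 OK',
    'Content-Type' => 'text/css',
    'Expires' => 'Mon, 21 Mar 2011 12:00:00 GMT',
];
$now = strtotime('Sun, 21 Mar 2010 12:00:00 GMT');
echo expiresInDays($sample, $now), "\n"; // 365
```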

Great ideas are worth borrowing


One reason that Rails has become so popular is that it codifies a lot of clever ideas and best practices into easy-to-use shortcuts. You can make a whole app with Rails without ever realizing that it's pulling the trick shown here on your behalf.

You don't have to use Rails, but if you see a great idea, it's always worth asking: "can I borrow this?"

Now go bust some caches!

[1] There's a danger here: notice that the Rails docs say all asset paths. If you set Apache to tell the browser to cache all images, stylesheets and scripts for a year, and you only use a cache-busting strategy for some of those things, then your visitors won't see updated versions of the others unless they clear their own browser cache manually or do a hard refresh with Ctrl+F5.

[2] I put this information into Apache's main config file, httpd.conf. If you're using a web host, they probably don't give you access to that, but they may have configured Apache to look for .htaccess files in your project folders. If so, you can set caching rules there.