Thursday, March 25, 2010

Cache busting in PHP: Part 2

In my previous post, I showed how to borrow a technique from Ruby on Rails for busting the browser cache for a particular file.

If you haven't read that, please check it out and come back. It's OK, I'll wait here. I've got a snack.

Improving on cachedFile()


Back? OK, well I've made some improvements to cachedFile() and thought I'd share them. Here are the new capabilities:

1) The function now extracts the file type from the extension
2) It handles images, and specifies their dimensions for faster and smoother rendering by the browser
3) It caches all the information it calculates about a file for faster performance on subsequent requests

#1 and #2 are pretty straightforward. You can use the function like this:

cachedFile('foo.png');
cachedFile('subdirectory/bar.png','class="buz"');

...and it outputs something like this:

<img src="/images/foo.png?1241452378" width="16" height="15" />
<img src="/images/subdirectory/bar.png?1241452378" width="20" height="17" class="buz" />

I put a cache in your cache so you can cache while you cache


But what about #3? What's this caching business? How can we add caching to a caching function?

Let's back up a bit.

First off, cachedFile() was a bit of a misnomer. This function is really for BUSTING a cache. Here's a recap of what we did last time:

1) We configured our web server to tell the browser "you can cache these types of files for a whole year - don't ask for them again."
2) We made sure that the browser saw each filename as the combination of the ACTUAL filename, like 'foo.png', and the file's time stamp, resulting in 'foo.png?1241452378' (or something like that). Those numbers represent the last time the file was changed; they're the same time stamp you see on any file on your computer.
3) Since the time stamp is pulled from the file automatically, updating the file updates the time stamp. That tricks the browser into thinking it has never seen the file before, so it requests it again.

The end result: the browser asks for a file once, then never again (at least for a year) - until the moment you change the file. As soon as it's updated, the browser asks for a new copy; until then, it uses the one it cached.
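As a minimal sketch of steps 2 and 3 above, the snippet below appends a file's modification time to its URL and shows the URL changing when the file does. The helper name bustedUrl() is hypothetical (it is not part of cachedFile()), and the temporary file and fixed time stamps are only there to make the demonstration repeatable.

```php
<?php
// Hypothetical helper, for illustration only: append the file's
// modification time so the browser sees a "new" URL after each change.
function bustedUrl($root, $path) {
    return $path . '?' . filemtime($root . $path);
}

// Demonstration with a throwaway file in the system temp directory.
$root = sys_get_temp_dir();
$path = '/bust_demo_' . uniqid() . '.css';
file_put_contents($root . $path, 'body { color: red; }');

touch($root . $path, 1241452378);   // pretend the file was last changed here
clearstatcache();                   // PHP remembers stat results within a request
$before = bustedUrl($root, $path);

touch($root . $path, 1241452999);   // "edit" the file: its time stamp moves
clearstatcache();
$after = bustedUrl($root, $path);

unlink($root . $path);              // clean up the throwaway file

echo $before, "\n";   // ends in ?1241452378
echo $after, "\n";    // ends in ?1241452999
```

The file name is the same both times, but as far as the browser is concerned these are two different URLs, so the second one misses its cache and gets fetched fresh.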

So, instead of cachedFile(), we could have called the function browserCacheBuster(). (But we won't, because I think that sounds cheesy.)

Now, this is all great, but the server is doing a bit of work for each file. As before, each time you ask our function for a file, it has to go and look up the time stamp. In addition, my new features mean that for image files, it has to work out the width and height of the image.

This is all very fast in human terms, but how will it scale? What if you're using cachedFile() to spit out the same image tag several hundred times on the same page?

In that case, it might be nice to remember what you calculated last time. "Foo.png? Oh yeah, I remember him. I wrote down his dimensions and time stamp right here. No need to calculate them again."[1]

Memoization


To make this happen, we're going to use a design pattern called memoization. It works like this:

1) Before you calculate a result or pull it from the database, see if you've already got that result stored in a cache
2) If not, figure out your result and store it in your cache. If so, skip this step.
3) Now you've verified that you've got it in your cache, so return it from there.

For a given input, the first time the function runs, it will check the cache, find nothing, calculate a result from the input, store the result in cache, and return. Every time after that, it will just check the cache, find a result for that input, and return.
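Here is a minimal sketch of those three steps in isolation. The functions slowSquare() and memoizedSquare() and the call counter are hypothetical stand-ins for any expensive calculation; they are not part of cachedFile().

```php
<?php
// Counts how many times the "expensive" work really happens.
$GLOBAL_slowSquare_calls = 0;

// Hypothetical stand-in for an expensive calculation.
function slowSquare($n) {
    global $GLOBAL_slowSquare_calls;
    $GLOBAL_slowSquare_calls++;
    return $n * $n;
}

function memoizedSquare($n) {
    static $cache = array();   // persists between calls to this function

    // Steps 1 and 2: check the cache; calculate and store only on a miss.
    if (!isset($cache[$n])) {
        $cache[$n] = slowSquare($n);
    }
    // Step 3: the result is now known to be in the cache.
    return $cache[$n];
}

echo memoizedSquare(12), "\n";        // calculates, prints 144
echo memoizedSquare(12), "\n";        // cache hit, prints 144
echo $GLOBAL_slowSquare_calls, "\n";  // prints 1
```

A static variable is just one place to keep the cache; a global works the same way. One caveat: isset() treats a stored null as a miss, so a function that can legitimately return null would need array_key_exists() instead.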

Does it matter?


But is there any point in doing this? Are we prematurely optimizing? Maybe. Let's see how much performance gain this really gets us.

I did a little not-very-scientific testing: added some caching to cachedFile(), called it from a loop a few hundred times, and timed the results using PHP's microtime(). I tried this with js, css, and image files, and did five or ten iterations of each.

Not a great sample size, but here's what I found: for .js files, having a cache made the function 2.72 times faster. For .css files, it made it 3.18 times faster. But for image files, having a cache made the function 119.63 times faster!
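If you want to try a comparison like this yourself, a rough harness might look like the sketch below. It is not the exact script I used: timeCalls(), slowWork() and cachedWork() are hypothetical names, and the md5() workload is only a stand-in for the real file lookups.

```php
<?php
// timeCalls() is a hypothetical helper: run a callback $iterations times
// and return the elapsed wall-clock time in seconds.
function timeCalls($callback, $iterations) {
    $start = microtime(true);        // true = seconds as a float
    for ($i = 0; $i < $iterations; $i++) {
        call_user_func($callback);
    }
    return microtime(true) - $start;
}

// Stand-in for an expensive calculation (not the real cachedFile()).
function slowWork() {
    return md5(str_repeat('x', 100000));
}

// The same calculation behind a one-entry cache.
function cachedWork() {
    static $result = null;
    if ($result === null) {
        $result = slowWork();
    }
    return $result;
}

$uncached = timeCalls('slowWork', 300);
$cached   = timeCalls('cachedWork', 300);

printf("uncached: %.6f s\n", $uncached);
printf("cached:   %.6f s\n", $cached);
```

The numbers will differ from machine to machine and from run to run, so repeat it a few times before believing any ratio.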

Clearly, computing those image dimensions is a bit expensive for the server, and we don't want to do it more than necessary.[2] Caching cuts the workload considerably.
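The dimension lookup in question is PHP's getimagesize(), which has to open the file and read the image header. The sketch below shows what it hands back; the 1x1 PNG written to a temporary file is a placeholder invented for this demonstration, not one of the site's images.

```php
<?php
// Write a tiny 1x1 PNG to a temporary file so there is something to measure.
$png = base64_decode(
    'iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mNkYPhfDwAChwGA60e6kgAAAABJRU5ErkJggg=='
);
$file = tempnam(sys_get_temp_dir(), 'img');
file_put_contents($file, $png);

$imgsize = getimagesize($file);   // reads the image header from disk
unlink($file);                    // clean up the temporary file

echo $imgsize[0], "\n";   // width in pixels
echo $imgsize[1], "\n";   // height in pixels
echo $imgsize[3], "\n";   // ready-made attribute string: width="1" height="1"
```

Index 3 is the handy one here, since it can be dropped straight into an img tag.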

Enough talk - code time


OK, let's see how our function looks with these changes. (The cache is stored in a global variable so it will persist between function calls. To offset this minor sin, I have labeled it clearly and awkwardly to prevent accidental meddling from elsewhere.)

$GLOBAL_cachedFile_cache = array();
function cachedFile($name, $attr=null){
 global $GLOBAL_cachedFile_cache;
 //Cache on name AND attributes, since both end up in the generated tag
 $key = $name . '|' . $attr;
 if (!isset($GLOBAL_cachedFile_cache[$key])){
  $root = $_SERVER['DOCUMENT_ROOT'];
  //Lowercase the extension so 'FOO.PNG' is handled like 'foo.png'
  $filetype = strtolower(pathinfo($name, PATHINFO_EXTENSION));

  /* Configuration options */
  $imgpath = '/images/';
  $csspath = '/stylesheets/';
  $jspath = '/scripts/';

  switch ($filetype){
   case 'css':
    //The URL must use the same path we read the time stamp from
    $output = '<link rel="stylesheet" type="text/css" href="' . $csspath;
    $output .= $name;
    $output .= '?' . filemtime($root . $csspath . $name) . '" ';
    if($attr){
     $output .= $attr . ' ';
    }
    $output .= '/>' . "\n";
    break;
   case 'js':
    $output = '<script type="text/javascript" src="' . $jspath;
    $output .= $name;
    $output .= '?' . filemtime($root . $jspath . $name) . '"';
    if($attr){
     $output .= ' ' . $attr;
    }
    //Close the opening tag before the closing tag
    $output .= '></script>' . "\n";
    break;
   case 'jpg':
   case 'gif':
   case 'png':
    //This code will get run in any of the three cases above
    $output = '<img src="' . $imgpath . $name;
    $output .= '?' . filemtime($root . $imgpath . $name) . '"';
    $imgsize = getimagesize($root . $imgpath . $name);
    //Index 3 is a ready-made 'width="..." height="..."' string
    $output .= ' ' . $imgsize[3];
    if($attr){
     $output .= ' ' . $attr;
    }
    $output .= ' />';
    break;
   default:
    //Unknown file type: output nothing rather than an undefined variable
    $output = '';
  }
  $GLOBAL_cachedFile_cache[$key] = $output;
 }
 echo $GLOBAL_cachedFile_cache[$key];
}

Magnanimousness


What's that? Want to use this code somewhere? Well, sure. No, you don't have to thank me, or license it, or anything. Just name your kid after me or send me a solid gold pickle.

Humility


And of course, perhaps I did something very stupid here. Well, that's what comments are for.


[1] You might worry that this will create problems. After all, if we cache the time stamp, won't we miss the fact that the file has been updated and defeat our purpose? No worries: the cache only lasts as long as the page script is running. So if you update a file while a page is being generated, that one response may carry the old time stamp. On the next reload, it will have the new one.

[2] In fact, it would be reasonable not to do it at all. Many factors affect how fast a site is and how fast it feels, and how quickly pages render is certainly one of them. Specifying image dimensions is meant to help with that, but it costs server processing time. You'll have to decide what works best for your site.